Perfect optimization experience
Nov. 25th, 2017 07:47 pm
Measure performance, find the main bottleneck, fix it, repeat. Simple. Even simpler: instead of "find the main bottleneck", find the top hotspot; instead of "fix the bottleneck", rewrite the hotspot function to make it faster. That is the textbook example; in real life you rarely get many iterations of it.
In the past 2 weeks, for the first time in my 10+ years of full-time s/w optimization career, I made 4 iterations of this simple process one after another, and fast. Usually every subsequent step becomes harder and the returns get lower, so step 3 or 4 already takes more than a week.
So I got baseline software from a customer that had never been optimized before (most of it is a straight port from Matlab to C++). The optimization target is ~20x, and my estimate was that it would take 2 months.
Switched from an old gcc to the newest Intel compiler, selected the right flags, got ~2x speedup. Found a hotspot, optimized it ~20x, got 2x speedup overall. Found the next hotspot, optimized it ~10x, got ~1.8x speedup. Found the next hotspot, optimized it ~8x, got ~1.6x speedup. Found the next hotspot, optimized it ~6x, got ~1.2x speedup. In just 2 weeks the total speedup is now ~13x, and I think I can make 2-3 more iterations. ROI will drop as usual, but in the end I'll get almost 20x speedup vs the original code.
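To make the arithmetic explicit, here is a small sketch in Python. It multiplies the per-step overall speedups and uses Amdahl's law to back out roughly what share of the runtime each hotspot must have taken before it was rewritten. The function name `hotspot_fraction` is made up for illustration, and the rounded "~" figures from above are treated as exact, so the results are only approximate.

```python
from math import prod


def hotspot_fraction(overall: float, local: float) -> float:
    """Invert Amdahl's law, S = 1 / ((1 - p) + p / s), to get p:
    the share of runtime spent in the hotspot before it was optimized,
    given the overall speedup S and the hotspot's own speedup s."""
    return (1 - 1 / overall) / (1 - 1 / local)


# (overall speedup, hotspot speedup) per iteration, rounded figures from the post
steps = [(2.0, 20.0), (1.8, 10.0), (1.6, 8.0), (1.2, 6.0)]
compiler_speedup = 2.0  # compiler switch plus flags

# Overall speedups of consecutive steps multiply
total = compiler_speedup * prod(overall for overall, _ in steps)
fractions = [hotspot_fraction(overall, local) for overall, local in steps]

print(f"cumulative speedup: {total:.1f}x")
for (overall, local), p in zip(steps, fractions):
    print(f"hotspot sped up {local:.0f}x, program {overall:.1f}x, "
          f"hotspot share of runtime before the fix: {p:.0%}")
```

With these rounded inputs the product comes out at about 13.8x, in line with the ~13x above. The implied hotspot shares fall from roughly half of the runtime in the first iterations to about a fifth in the last one, which is the dropping ROI in numbers.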
It helps to have a low baseline :)