Software performance: a consultant's perspective.
After ~13 years of working full time in a software performance consulting role, I think I now have enough statistics on the different types of software optimization projects.
~20% of the projects are the most challenging: a customer comes to me because their software is already well optimized, but they have found a performance regression on a new platform, or with a new middleware, OS kernel, compiler, etc. There are some common patterns among these projects, but each one is usually unique. The fix is usually at the system level, the CPU u-arch level, or a combination of the two. It is almost never in the software or the algorithms.
~30% are interesting. The software is reasonably optimized, the team understands its current performance and constraints, but there is a regression or an unexpected slowdown, or they just need it to run ~20% faster. In these cases a couple of simple tricks with system and u-arch optimizations usually do the job. There is no need to dig deeply into the application's business logic either. These projects are my usual trade: not too fancy, but fine. Specialisation pays off. Sometimes I also find something to learn on these projects, but not very often.
Now, the remaining 50%. This is what this post is about. As always, the project team claims that they are very serious about performance, that they tried to optimize the software themselves and got a nice 2x improvement, but they need 2x more, so they summoned an expert. After a short triage, I usually find several of these symptoms, typically 2-5 per project:
1. They don't have a performance metric to optimize for.
2. They have the wrong performance metric to optimize for.
3. They developed their own profiler inside their application, and when I test it, it produces wrong results.
4. They developed their own profiler inside their application, and when I test it, it produces reasonably correct results. (That is not too bad; it is just that most of the time there is no reason not to use a normal profiler.)
5. They are using a profiler that is configured or built in such a way that it disturbs the application to the point where the performance data becomes meaningless.
6. Because of 3-5, their real hotspots are very different from what they think their hotspots are. So they optimized the wrong things.
7. They spend a lot of effort on inventing a faster O(x) algorithm for what they think are their hotspots.
8. They optimized the right hotspots, but those were never on the critical path for their performance metric. (That can be good for power saving, though.)
9. Their real hotspots are debug prints (in the release build).
10. Their real hotspots are the timer reads in debug prints (in the release build); see the sketch after this list.
11. (Not very common, but it happens.) They call a shell script, or trigger a call to one, maybe periodically, from a hotspot.
12. They don't have the infrastructure to test under full/real load. Customer deployments are their only performance testing testbeds!
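Symptoms 9 and 10 deserve a concrete illustration. Below is a minimal C++ sketch (all names are hypothetical, not taken from any real project, and it assumes a POSIX clock_gettime) of how this typically happens: the logging helper checks the log level only after it has read the timestamp and formatted the message, so a release build that never prints a single debug line still burns time in the clock read and snprintf inside the hottest loop.

```cpp
#include <stdio.h>
#include <stddef.h>
#include <time.h>

enum LogLevel { LOG_DEBUG = 0, LOG_INFO = 1 };
static LogLevel g_level = LOG_INFO;  // release builds run at INFO, so debug output is "off"

static void log_debug(const char* fmt, double value) {
    // The timestamp read and the formatting happen before the level check,
    // so every call pays for them even when nothing is printed.
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);          // timer read: unconditional
    char buf[256];
    snprintf(buf, sizeof(buf), fmt, value);       // formatting: unconditional
    if (g_level <= LOG_DEBUG) {                   // only the actual output is skipped
        fprintf(stderr, "[%ld.%09ld] %s\n",
                (long)ts.tv_sec, ts.tv_nsec, buf);
    }
}

double process(const double* data, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i) {
        sum += data[i];
        log_debug("partial sum %f", sum);         // silent in release, but still a hotspot
    }
    return sum;
}
```

The fix is as small as the mistake: check the log level (ideally so the check compiles away in release builds) before touching the clock or formatting anything.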
The list could go on for another dozen silly things. But here is the important part that these projects have in common: I never have to go after u-arch specific optimizations. Often I have to fix several obvious system configuration issues. But then I actually look at the code, and I always notice one thing: they usually really do care about performance! They often hire people who always code as if at a whiteboard, getting close to the best conceivable runtime in every function they write. They minimise locked code and use lockless data structures. But then come the integration, and the third-party libraries, and just more complexity, and all their performance work at the function/algorithm level somehow just does not help much for the system as a whole.
Sometimes their motivation for calling me is: "we even called a s/w performance consultant and he could not help; our performance is already perfect." I can often see that coming from the pride they show when talking about the O(N) in every function, except for the functions where it is O(log N) or O(1)...
OK, rant mode off. I'll go back to hackerrank to practice optimizing the kind of superficial problems that I never encounter as a performance consultant...