[personal profile] izard
I recently had a conversation with avlasov in one of my earlier posts. It is worth moving to a separate post.
It is very technical, so it goes under the cut.

avlasov notes:
It is strange that nobody has yet made a static translator guided by statistics gathered at run time. At least I have never come across one.
In a broader context, I am thinking about static optimizers driven by verification plus run-time stats, which should be very effective in many cases. PGO is an old and mature thing, but it is for HPC and similar things only. Like OpenMP, it is not useful for object/declarative stuff. I mean not only managed runtimes, but object-oriented, scripting and functional languages. They often suffer from a lack of performance, and that has long been an argument against using them in industry.
Traditional optimizations (C/Fortran style) are not useful for them, and we can make pretty good progress with JIT optimizations. But those are limited too: JIT developers are restricted to the optimization and analysis that can be done at run time.
Mixing PGO with object/functional optimizations, on the other hand, is a perfect way to get 'faster than C' code :). Or at least performance similar to C, which removes the biggest reason not to use them.

My answer was:
Your statement that OpenMP and especially PGO are applicable to HPC only is plain wrong! I would not have answered so harshly if this were not a very common misconception among good s/w engineers.
Here are my arguments:
1. PGO is common sense for C/C++ nowadays. It was first developed at HP Labs in ~96; later Intel produced a great implementation for Itanium and then for x86. Now MSFT is close, and GCC's implementation is taking shape.
But you are right that this is a novelty in managed runtimes, and it has been discussed among runtime vendors for ages (I've been to a few Harmony meetings where this was discussed).
2. PGO is very mature indeed, and it includes quite a number of small optimizations that use a dynamic profile to organize code so as to minimize pipeline stalls and use the register file and caches most efficiently. This approach is good for any statically compiled language, be it OO or functional like OCaml. The only type of statically compiled application it won't work with is one with a large code base where the code coverage profile is flat, random and small. There are close to no such apps in existence. I've seen one very custom OLAP system, but it did not quite work, and I doubt they ever brought it to production.
3. From my experience as an Application Engineer (80% of the job was optimizing 3rd-party ISVs' code for performance on Intel h/w), PGO helped for all the app types I've tried it on, giving from 5 to 25% improvement in performance, with a mean of about 10%.
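
To make the "organize code using a dynamic profile" idea from point 2 more concrete, here is a minimal sketch of one such optimization, basic-block layout. It is a simplified greedy scheme, not what any particular compiler does: the block names, the edge counts and the `layout_blocks` helper are all made up for illustration. Given edge counts from a training run, it places each block right after its hottest predecessor where possible, so the common path runs straight through and the rare error path ends up out of the way.

```python
from typing import Dict, List, Tuple

# Hypothetical profile format: (source block, destination block) -> times taken.
EdgeCounts = Dict[Tuple[str, str], int]


def layout_blocks(entry: str, blocks: List[str], edge_counts: EdgeCounts) -> List[str]:
    """Order basic blocks so that the hottest edges become fall-throughs."""
    placed: List[str] = []
    remaining = set(blocks)
    current = entry
    while remaining:
        placed.append(current)
        remaining.discard(current)
        # Not-yet-placed successors of the current block, with their counts.
        candidates = [
            (count, dst)
            for (src, dst), count in edge_counts.items()
            if src == current and dst in remaining
        ]
        if candidates:
            # Follow the hottest outgoing edge so it needs no taken jump.
            current = max(candidates)[1]
        elif remaining:
            # Dead end: continue with the first unplaced block in source order.
            current = next(b for b in blocks if b in remaining)
    return placed


# Made-up profile: the error branch is taken 3 times out of 1000.
blocks = ["entry", "error", "work", "exit"]
edge_counts = {
    ("entry", "error"): 3,
    ("entry", "work"): 997,
    ("error", "exit"): 3,
    ("work", "exit"): 997,
}
hot_layout = layout_blocks("entry", blocks, edge_counts)
print(hot_layout)  # -> ['entry', 'work', 'exit', 'error']
```

In source order the cold `error` block sits in the middle of the hot path; with the profile it moves to the end, which is the kind of small win that adds up across a large code base.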

About OpenMP: I also held this opinion for quite a long time, but my later experience showed that, besides HPC, OpenMP can be used as a quick threading prototyping tool when you are developing a C/C++ application (it is not only for loops any more, if you check the latest standards). (For C++, though, Intel Threading Building Blocks is even better for prototyping. Its architecture is superb, though the implementation is not complete yet, and I think it never will be, because Intel does not do a very good job of marketing its tools to the community. So in the end I think the guys from Boost will take this architecture and create some more popular but monstrous beast.)

A weak form of PGO is used in the BEA VM, IBM VM, Harmony VM and HotSpot (since rather recently), in the sense that JIT optimization options are chosen depending on the profile.
You are right to point out that e.g. the Python and Ruby VMs are not as mature as Java's. So it is just too early to discuss PGO in this context; it makes sense first to bring them closer to the Java VM performance-wise, e.g. by adding the simplest profile-guided JIT.
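
As a rough illustration of what "the simplest profile-guided JIT" could mean, here is a toy sketch. It is not how HotSpot or any other VM is implemented, and `ProfiledFunction`, `HOT_THRESHOLD` and the two `double` variants are hypothetical names. The wrapper runs a generic implementation while recording argument types, and once the function is hot it switches to a version specialized for the most common type, guarded by a type check that falls back to the generic path.

```python
from collections import Counter
from typing import Callable, Dict, Optional

HOT_THRESHOLD = 3  # hypothetical: profiled calls before the profile is acted on


class ProfiledFunction:
    """Runs a generic implementation while profiling, then specializes."""

    def __init__(self, generic: Callable, specialized: Dict[type, Callable]):
        self.generic = generic
        self.specialized = specialized  # argument type -> fast version
        self.type_profile: Counter = Counter()  # observed argument types
        self.fast_type: Optional[type] = None
        self.fast: Optional[Callable] = None

    def __call__(self, x):
        # Guard: take the fast path only while the profiled assumption holds.
        if self.fast is not None and type(x) is self.fast_type:
            return self.fast(x)
        self.type_profile[type(x)] += 1
        if self.fast is None and sum(self.type_profile.values()) >= HOT_THRESHOLD:
            self._specialize()
        return self.generic(x)

    def _specialize(self) -> None:
        # Pick the version that matches the most frequently seen argument type.
        hot_type, _ = self.type_profile.most_common(1)[0]
        if hot_type in self.specialized:
            self.fast_type = hot_type
            self.fast = self.specialized[hot_type]


def generic_double(x):
    return x + x  # works for ints, strings, lists, ...


def int_double(x):
    return x << 1  # valid for ints only


double = ProfiledFunction(generic_double, {int: int_double})
results = [double(v) for v in (1, 2, 3, 4, "ab")]
print(results, double.fast_type)
```

The first three calls go through the generic version and build the profile, the fourth takes the int-specialized path, and the string argument fails the guard and falls back to the generic version.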

Here is a full discussion.