Magical platinum bullet #2
Jun. 27th, 2008 09:41 pm
That's really the main caveat that ruins the assumption of "60% scalability". However, there is a well-known approach to extracting parallelism in spite of data and control dependencies: out-of-order (OOO) superscalar pipelined execution.
That means that if I take a cycle-accurate emulator, e.g. PTLSim, which imposes 99.9% overhead, manage to knock the overhead down to 80% by somewhat reducing accuracy, and parallelize it for "60% scalability", then we are done.
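To make those numbers concrete, here is a rough back-of-envelope sketch (the percentages are the ones quoted above; the interpretation of "overhead" as a fraction of total runtime and "scalability" as per-core efficiency is my assumption):

```python
# 99.9% overhead means only 0.1% of time does useful work: ~1000x slowdown.
overhead_initial = 0.999
slowdown_initial = 1 / (1 - overhead_initial)

# Trading some accuracy to get overhead down to 80% leaves a 5x slowdown.
overhead_reduced = 0.80
slowdown_reduced = 1 / (1 - overhead_reduced)

# "60% scalability on 16 cores": each core delivers 60% of ideal speedup.
cores = 16
scalability = 0.60
parallel_speedup = cores * scalability  # 9.6x

# Net speed relative to native execution under these assumptions:
net = parallel_speedup / slowdown_reduced  # faster than native if > 1.0
print(slowdown_reduced, parallel_speedup, net)
```

Under these assumptions the parallelized, less-accurate simulator would actually run faster than native code, which is why the whole scheme hinges on the scalability claim.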
Extracting as much IPC as possible from an x86 code stream has already been done by Intel engineers, and a cycle-accurate simulator should be parallelizable, e.g. by running the code that simulates different pipeline stages and execution units on different cores. Unfortunately, the theoretical IPC limit of 4.0, and the practical one of just about 2.0, makes "60% scalability on 16 cores" absolutely impossible.
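The arithmetic behind that claim, sketched out (numbers are the ones quoted above; as the P.S. notes, the argument itself is not airtight):

```python
# If the simulator is parallelized by giving each in-flight instruction
# (or execution unit) its own core, at most ~IPC cores have useful work
# per simulated cycle.
ipc_theoretical = 4.0   # x86 issue-width ceiling assumed above
ipc_practical = 2.0     # sustained IPC on typical code

# "60% scalability on 16 cores" would require this speedup:
required_speedup = 16 * 0.60  # 9.6x

# With only ~2 instructions in flight on average, the available
# parallelism falls short of the claim by this factor:
shortfall = required_speedup / ipc_practical
print(required_speedup, shortfall)
```

In other words, even the theoretical limit of 4 in-flight instructions covers well under half of the 9.6x speedup the claim requires.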
P.S. The statement above about the IPC limit also has a major logical flaw in it :)