Jan. 4th, 2013

izard: (Default)
Here in Germany the holidays are long over: I went back to the office on the 2nd, and today is the 3rd working day of the new year.

And now I have an interesting puzzle to solve from a customer. They gave me a micro benchmark that runs in 500 cycles on one system and takes 1000 cycles on another (yes, the numbers are that nice and round!). The systems are very similar: same frequency, with no power management, turbo, or SpeedStep. I run the benchmark 1000 times to warm up, then 10,000,000 times while measuring cycles. The wall-clock time difference is also 2x, matching the TSC difference for individual runs.
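For readers who want to reproduce this kind of measurement, here is a minimal sketch of a warm-up-then-measure loop around the TSC. It is not the customer's benchmark: `kernel_under_test`, `read_ticks`, `measure`, and `tick_stats` are names made up for illustration, and the kernel body is just a placeholder. On non-x86 hosts the sketch falls back to nanoseconds from the monotonic clock, so the unit is "ticks" rather than cycles.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
/* Read the x86 time-stamp counter. */
static inline uint64_t read_ticks(void) { return __rdtsc(); }
#else
/* Fallback for non-x86 hosts: nanoseconds from the monotonic clock. */
static inline uint64_t read_ticks(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}
#endif

/* Hypothetical stand-in for the benchmark body (assumption, not the real code). */
static void kernel_under_test(void) {
    static volatile uint64_t sink;
    for (int i = 0; i < 64; i++) sink += (uint64_t)i;
}

typedef struct {
    uint64_t min_ticks;   /* fastest single run */
    uint64_t max_ticks;   /* slowest single run */
    double   mean_ticks;  /* average over all measured runs */
} tick_stats;

/* Run fn `warmup` times unmeasured, then `runs` times, timing each run. */
static tick_stats measure(void (*fn)(void), uint64_t warmup, uint64_t runs) {
    assert(fn != NULL);
    tick_stats s = { UINT64_MAX, 0, 0.0 };
    uint64_t total = 0;

    for (uint64_t i = 0; i < warmup; i++) fn();

    for (uint64_t i = 0; i < runs; i++) {
        uint64_t t0 = read_ticks();
        fn();
        uint64_t dt = read_ticks() - t0;
        if (dt < s.min_ticks) s.min_ticks = dt;
        if (dt > s.max_ticks) s.max_ticks = dt;
        total += dt;
    }
    s.mean_ticks = runs ? (double)total / (double)runs : 0.0;
    return s;
}
```

Calling `measure(kernel_under_test, 1000, 10000000)` mirrors the warm-up and run counts above. Each sample also includes the cost of reading the counter itself, so for a kernel this short the minimum is usually a more stable number to compare across systems than the mean.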

As usual, the first thing I did was run it under VTune (well, actually I first ported it from the RTOS to Linux; the port did not change the performance numbers). I expected VTune to show the same number of instructions retired on both systems, and 2x the cycles on the slower one. Then I planned to look at the places where CPI got worse and find the root cause.
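The plan rests on simple arithmetic: cycles = instructions × CPI, so with equal instruction counts a 2x cycle difference has to show up as a 2x CPI difference somewhere. A rough sketch of that comparison follows. The `region_sample` layout and the helpers `cpi` and `worst_region` are hypothetical, and the per-region counts would have to come from a profiler export; nothing here is VTune's own API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const char *name;        /* hypothetical region label */
    uint64_t instructions;   /* instructions retired (assumed equal on both systems) */
    uint64_t cycles_fast;    /* cycles on the faster system */
    uint64_t cycles_slow;    /* cycles on the slower system */
} region_sample;

/* Cycles per instruction; returns 0.0 when no instructions were retired. */
static double cpi(uint64_t cycles, uint64_t instructions) {
    return instructions ? (double)cycles / (double)instructions : 0.0;
}

/* Index of the region whose CPI grew the most (slow / fast), or -1 if none. */
static int worst_region(const region_sample *r, size_t n) {
    assert(r != NULL || n == 0);
    int worst = -1;
    double worst_ratio = 0.0;
    for (size_t i = 0; i < n; i++) {
        double fast = cpi(r[i].cycles_fast, r[i].instructions);
        double slow = cpi(r[i].cycles_slow, r[i].instructions);
        if (fast <= 0.0) continue;          /* skip regions with no usable data */
        double ratio = slow / fast;
        if (ratio > worst_ratio) {
            worst_ratio = ratio;
            worst = (int)i;
        }
    }
    return worst;
}
```

With invented numbers such as three regions of 200, 600, and 200 instructions that take 100/300/100 cycles on the fast system and 110/780/110 on the slow one, the totals come out at 500 and 1000 cycles and `worst_region` points at the middle region. Those figures are made up purely to illustrate the bookkeeping.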

That was not the case! Both the instruction and clock counts were the same for the two runs under VTune, yet the wall-clock difference was still 2x... Now I will have to bisect the benchmark and use other tools (internal ones and IACA) to understand what is actually happening.
