Jan. 4th, 2013

izard
Here in Germany the holidays are long over: I went back to the office on the 2nd, and today is already the third working day of the new year.

And now I have an interesting puzzle to solve from a customer. They gave me a micro benchmark that runs in 500 cycles on one system and takes 1000 cycles on another (yes, the numbers are that nice and round!). The systems are very similar: same frequency, and no power management, Turbo, or SpeedStep. I run the benchmark 1000 times to warm up, then run it 10,000,000 times while measuring cycles. The wall-clock time difference is also 2x, matching the TSC difference for individual runs.
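The measurement loop described above can be sketched roughly as follows. This is only a minimal illustration, not the customer's harness: `benchmark_body`, `read_ticks`, and `average_ticks_per_run` are hypothetical names, the workload is a placeholder, and the non-x86 branch exists only so the sketch compiles elsewhere. Note that `__rdtsc()` is not a serializing instruction, so a careful harness would fence around it.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>              /* __rdtsc() on GCC/Clang */
static inline uint64_t read_ticks(void) { return __rdtsc(); }
#else
#include <time.h>                   /* fallback: nanoseconds, not TSC ticks */
static inline uint64_t read_ticks(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}
#endif

/* Hypothetical stand-in for the micro benchmark under test. */
static volatile uint64_t sink;

static void benchmark_body(void)
{
    uint64_t acc = 0;
    for (int i = 0; i < 64; i++)
        acc += (uint64_t)i * 3u;
    sink = acc;                     /* volatile store keeps the result observable */
}

/* Run fn warmup_runs times untimed, then time each of timed_runs
   individually and return the average ticks per timed run. */
static double average_ticks_per_run(void (*fn)(void),
                                    uint64_t warmup_runs,
                                    uint64_t timed_runs)
{
    assert(timed_runs > 0);

    for (uint64_t i = 0; i < warmup_runs; i++)
        fn();

    uint64_t total = 0;
    for (uint64_t i = 0; i < timed_runs; i++) {
        uint64_t start = read_ticks();
        fn();
        total += read_ticks() - start;
    }
    return (double)total / (double)timed_runs;
}
```

With the counts from this post, the call would be `average_ticks_per_run(benchmark_body, 1000, 10000000)`. The per-run figure includes the overhead of the two counter reads, which is the same on both systems and so does not explain a 2x gap.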

As usual, the first thing I did was run it under VTune (well, actually I first ported it from the RTOS to Linux; the port did not change the performance numbers). I expected VTune to show the same number of instructions retired for both runs of the benchmark, and 2x the cycles on the slower system. Then I planned to look at the places where CPI got worse and find the root cause.
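The expectation here is just arithmetic: cycles = instructions x CPI, so with identical instruction counts a 2x cycle gap has to show up as a 2x CPI gap somewhere in the code. A small sketch of that calculation, where `cpi` and `cpi_slowdown` are hypothetical helper names and the instruction count in the usage note is made up for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Cycles per instruction from two counter readings. */
static double cpi(uint64_t cycles, uint64_t instructions)
{
    assert(instructions > 0);
    return (double)cycles / (double)instructions;
}

/* How many times worse the CPI is on the slow system, given that
   both systems retire the same number of instructions. */
static double cpi_slowdown(uint64_t fast_cycles,
                           uint64_t slow_cycles,
                           uint64_t instructions)
{
    return cpi(slow_cycles, instructions) / cpi(fast_cycles, instructions);
}
```

Taking the 500 and 1000 cycle figures and assuming, purely for illustration, 400 retired instructions per run, `cpi(500, 400)` is 1.25, `cpi(1000, 400)` is 2.5, and `cpi_slowdown(500, 1000, 400)` is 2.0. Applying the same ratio per function or per basic block is what would point at the place where the slowdown lives.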

That was not the case! Both the instruction and the clock counts were the same for the two runs under VTune, but the wall-time difference was still 2x... Now I will have to bisect the benchmark and use other tools (internal ones and IACA) to understand what is actually happening.
