Fix in Vtune, finally
Oct. 11th, 2011 12:44 pm
As usual, I caught a cold on a flight from the US, so I'm "working from home" now. (I'll actually have to code a few things, make a short ppt presentation and answer e-mails in the afternoon.) So now I have some time for a short post on Vtune, the tool I use more often than anything else, including the compiler.
There was a well-known bug/feature in Vtune for years. As a bit of background, a CPU's PMU has two kinds of performance events: precise and non-precise. When a precise event's counter overflows its sample-after value, the PMU triggers an interrupt and the recorded IP should be, well, precise. If the event is not precise, the IP can drift by a few instructions (the so-called skid).
I realize most Vtune users don't use events beyond the most basic ones (CLOCK_TICKS/INSTRUCTIONS_RETIRED), sometimes adding cache-miss events (which is not very productive, btw, as not all cache misses are bad :))
Those who use more than those two basic events know that a sampling result, even for a precise event, very often looks like this:
code : LLC_MISS samples
array[i] = list->pnext->value; : 0
i++; : 123456789000
Of course, the LLC misses are not the fault of the counter increment; they most likely come from the linked-list access on the line above. Everyone knew about this bug/feature, and it was easy to account for when interpreting results: just move the samples back by one instruction.
Now, however, it has been fixed, and when the same code is run under Vtune the samples are displayed exactly where they happen. That is all good, but the bug/feature is so familiar that it is easy to keep doing the "one instruction adjustment" out of habit, which now shifts the samples onto the wrong line and leads to inconsistent results.
So the question is: should this kind of bug get fixed even if it creates some confusion?