Aug. 29th, 2016

izard: (Default)
I am investigating a performance issue reported by a customer, as usual. Somewhere midway, when I had a solid reproducer (or so I thought), I noticed a strange thing:

The workload was processing traffic at 44Gb/sec when it hit saturation (there were free CPU cycles and plenty of memory and PCIe throughput, so I had to find the limiting factor). For a start, I wanted to compare VTune profiles at 43Gb/sec (slightly below saturation) and 45Gb/sec (slightly above). What I noticed was that both profiles were the same, and there was no saturation at 45Gb/sec. So I turned up the dial on my traffic generator and found that with the VTune collector running, my application started losing packets at 48Gb/sec, not 44Gb/sec. Wtf? Quantum effect: the observer affects the measurement, and helps a lot.

Before recommending that the customer run the VTune collector forever (kidding), I double-checked my setup and found that one of the processing threads was running on the wrong NUMA node. When I moved it to the right one, saturation happened at 48Gb/sec whether or not I was profiling the SUT. Pfff.
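
For anyone who wants to run the same sanity check, here is a rough sketch of how thread placement could be verified on Linux. It is not what I actually ran; the helper names (`parse_cpulist`, `node_cpus`, `is_on_node`, `pin_to_node`) are made up for illustration, and it assumes the usual sysfs layout under `/sys/devices/system/node/` and that the thread ID is passed where Python's `os.sched_getaffinity` expects a pid.

```python
import os


def parse_cpulist(text):
    """Parse a Linux cpulist string such as '0-3,8-11' into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # empty input or stray comma
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def node_cpus(node):
    """Return the CPUs of a NUMA node, read from sysfs (Linux only)."""
    path = f"/sys/devices/system/node/node{node}/cpulist"
    with open(path) as f:
        return parse_cpulist(f.read())


def is_on_node(tid, node):
    """True if every CPU the thread may run on belongs to the given node."""
    allowed = os.sched_getaffinity(tid)
    return allowed <= node_cpus(node)


def pin_to_node(tid, node):
    """Restrict the thread to the CPUs of the given node."""
    os.sched_setaffinity(tid, node_cpus(node))
```

Something like `is_on_node(tid, 0)` for each processing thread would flag a thread whose affinity mask spills onto another node. Note that this only covers CPU placement; where the memory lives is a separate question.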
