A big part of my job over the last 6 years has been quickly understanding the performance characteristics of big, unfamiliar software. So in this post I skip debuggers, compilers, and tools focused on optimizing compute kernels (such as IACA).
90% of the customers I work with use Linux, and I have to investigate system performance and scalability as well as the application's performance.
Intel VTune was the tool for a while. Then there was also PTU for some time. The statistical call graph and uncore performance counters were useful. Uncore counter support is still most convenient in PTU.
I could have used oprofile, but it lacked many features I needed, and the only extra feature it had (support for AMD's counters) was never relevant to me.
Then VTune was redesigned nearly from scratch and renamed.
Intel VTune Amplifier XE (formerly VTune) is far more convenient and easier to use than the old one. It also has a statistical call graph and threading visualization, which together help to get a quick picture of how a complex multithreaded app behaves.
When the Linux kernel is involved, VTune (old and new) still does the trick. There were a bunch of other tools that did not seem very useful to me (systemtap, oprofile, etc.).
That was until they introduced perf in kernel/tools. That one is perfect: at last they have an in-kernel tool that is easy to use, gets meaningful performance data in a comprehensible format, is flexible and extensible, and is well documented (the latter is sooo rare for kernel subsystems!).