Norvig's latency list, revised
May. 29th, 2014 06:21 pm
There is a well-known list of latency numbers published by Peter Norvig that keeps being reposted on LJ and elsewhere.
Of course the numbers were correct at the time of posting, but they are a bit dated now; I believe the list was first published around 2003. By "a PC" people usually mean a PC with an Intel CPU, and since the Nehalem CPUs were released in 2008, cache timings and frequencies have not changed much. Let's assume a CPU frequency of 3.33 GHz, so one cycle is 0.3 ns.
The first entry in Peter's data is "execute typical instruction - 1ns". Average CPI on a modern core can be around 0.9, so the time per instruction works out to 0.9 cycles × 0.3 ns ≈ 0.27 ns. (And I can assure you that the latency of an individual instruction is an utterly useless metric; curiously enough, there are areas in embedded where this metric is still published for new platforms :).) CPU frequency no longer grows, but CPI improves by a few percent every year.
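To see why a single "typical instruction" time is not meaningful, here is a minimal sketch of my own (not from Norvig's list): it times one dependent chain of multiply-adds against four independent chains. Compile with something like gcc -O2 on Linux; the printed numbers are rough and depend on the exact core.

    /* Per-"instruction" time depends on the dependency structure, not on one number. */
    #define _POSIX_C_SOURCE 199309L
    #include <stdio.h>
    #include <stdint.h>
    #include <time.h>

    static double now_sec(void) {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec * 1e-9;
    }

    int main(void) {
        const long iters = 200000000L;
        volatile uint64_t seed = 1;      /* volatile: keep initial values from being folded */
        uint64_t a = seed, b = seed + 1, c = seed + 2, d = seed + 3;
        const uint64_t k = 0x9E3779B97F4A7C15ULL;   /* forces a real integer multiply */

        /* One long dependent chain: every operation waits for the previous result. */
        double t0 = now_sec();
        for (long i = 0; i < iters; i++)
            a = a * k + 1;
        double dep_ns = (now_sec() - t0) / iters * 1e9;

        /* Four independent chains: the out-of-order core overlaps them, CPI drops below 1. */
        t0 = now_sec();
        for (long i = 0; i < iters; i++) {
            a = a * k + 1; b = b * k + 1; c = c * k + 1; d = d * k + 1;
        }
        double ind_ns = (now_sec() - t0) / iters / 4 * 1e9;

        printf("dependent chain:    %.2f ns per operation\n", dep_ns);
        printf("independent chains: %.2f ns per operation\n", ind_ns);
        return (int)(a + b + c + d);     /* use the results so the loops are not dead code */
    }

The dependent chain runs at the latency of the multiply-add, the independent version at its throughput; the "time per instruction" differs several-fold between the two on the same machine.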
An L1 hit is 0.5 ns according to Norvig. More than ten years ago L1 hits really were that fast, but since 2008 they take at least 4 cycles, i.e. about 1.2 ns.
An L2 hit is listed as 7 ns; however, the cache hierarchy has been very stable in recent years, and an L2 hit is about 11 cycles, i.e. roughly 3.3 ns.
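A rough way to check the cache numbers yourself is dependent pointer chasing over a randomly permuted cycle, so the hardware prefetcher cannot help. The sketch below is my own illustration, not part of the original list; the 16 KB and 128 KB working-set sizes are assumptions meant to land in the L1 and L2 of a Nehalem-class part.

    /* Dependent pointer chasing: estimates load-to-use latency for a given working set. */
    #define _POSIX_C_SOURCE 199309L
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    static double now_sec(void) {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec * 1e-9;
    }

    /* Build a random cyclic permutation so consecutive hops are unpredictable. */
    static size_t *make_chain(size_t n) {
        size_t *next = malloc(n * sizeof *next);
        size_t *order = malloc(n * sizeof *order);
        for (size_t i = 0; i < n; i++) order[i] = i;
        for (size_t i = n - 1; i > 0; i--) {          /* Fisher-Yates shuffle */
            size_t j = (size_t)rand() % (i + 1);
            size_t t = order[i]; order[i] = order[j]; order[j] = t;
        }
        for (size_t i = 0; i < n; i++)
            next[order[i]] = order[(i + 1) % n];
        free(order);
        return next;
    }

    static double chase_ns(size_t bytes, long hops) {
        size_t n = bytes / sizeof(size_t);
        size_t *next = make_chain(n);
        size_t p = 0;
        double t0 = now_sec();
        for (long i = 0; i < hops; i++)
            p = next[p];                              /* each load depends on the previous one */
        double ns = (now_sec() - t0) / hops * 1e9;
        volatile size_t sink = p; (void)sink;         /* keep the chain from being optimized out */
        free(next);
        return ns;
    }

    int main(void) {
        printf("~L1 (16 KB):  %.1f ns per load\n", chase_ns(16 << 10, 100000000L));
        printf("~L2 (128 KB): %.1f ns per load\n", chase_ns(128 << 10, 100000000L));
        return 0;
    }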
A branch mispredict is a pipeline flush, and it still costs about 5 ns.
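The mispredict cost is easy to see by summing the same array with a data-dependent branch, first on random data (unpredictable) and then sorted (predictable). Again a sketch of my own, not from the list; the volatile accumulator is there to discourage the compiler from replacing the branch with a conditional move.

    /* Same work, same data: only branch predictability changes between the two runs. */
    #define _POSIX_C_SOURCE 199309L
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    static double now_sec(void) {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec * 1e-9;
    }

    static int cmp_int(const void *a, const void *b) {
        int x = *(const int *)a, y = *(const int *)b;
        return (x > y) - (x < y);
    }

    static double sum_big_ns(const int *v, size_t n, int passes) {
        volatile long sum = 0;                    /* volatile keeps the branch a real branch */
        double t0 = now_sec();
        for (int p = 0; p < passes; p++)
            for (size_t i = 0; i < n; i++)
                if (v[i] >= 128)                  /* taken ~50% of the time on random data */
                    sum += v[i];
        return (now_sec() - t0) / ((double)n * passes) * 1e9;
    }

    int main(void) {
        size_t n = 1 << 16;                       /* 64K ints: small enough to stay in cache */
        int *v = malloc(n * sizeof *v);
        for (size_t i = 0; i < n; i++) v[i] = rand() & 255;

        double random_ns = sum_big_ns(v, n, 2000);
        qsort(v, n, sizeof *v, cmp_int);          /* sorted: the branch becomes predictable */
        double sorted_ns = sum_big_ns(v, n, 2000);

        printf("random (mispredicted): %.2f ns/element\n", random_ns);
        printf("sorted (predicted):    %.2f ns/element\n", sorted_ns);
        free(v);
        return 0;
    }

The gap between the two figures, divided by the misprediction rate (~0.5 on random data), gives the per-mispredict cost.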
Main memory latency has improved about 2x and is now closer to 50 ns than to the 100 ns given in the article.
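The same pointer-chase sketch above can show the DRAM figure: ask for a working set far larger than the last-level cache (the size and hop count here are arbitrary, and the run needs roughly half a gigabyte of RAM):

    printf("~DRAM (256 MB): %.1f ns per load\n", chase_ns((size_t)256 << 20, 20000000L));

With a random chain over 256 MB the measurement also picks up TLB misses, so without huge pages the result will come out somewhat above the raw DRAM latency.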
I suspect many readers, practicing programmers, will think "who cares", and some will even argue in the comments that programming is no longer about counting nanoseconds and that the only performance that matters is programming speed.