May. 15th, 2013

Gather

May. 15th, 2013 11:01 am
izard
The same code (below), compiled with the Intel compiler for each target and run on Ivy Bridge (IVB) and Haswell (HSW), runs much faster on HSW than on IVB. I can't disclose the exact speedup because HSW has not been released yet, but the new instructions involved (the AVX2 gathers) have not been secret for a while now, and neither have the BKMs (best known methods).
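#include <stdio.h>
#include <x86intrin.h> // __rdtsc(), replaces the original ticks/rdtscll() helpers

#define A (2*1024)
#define B 64
int a[A]; int b[B];

int main(void)
{
  int i, j, sum = 0;
  unsigned long long t1, t2, min = ~0ULL;

  for (i = 0; i < A; i++) a[i] = i; // Init arrays
  for (i = 0; i < B; i++) b[i] = (i*113 + 113) % A; // scattered indices into a[]

  for (j = 0; j < 10000; j++)
  {
    sum = 0;
    t1 = __rdtsc();
    for (i = 0; i < B; i++) sum += a[b[i]]; // indexed loads: the gather candidate
    t2 = __rdtsc();
    if (min > (t2 - t1)) min = t2 - t1; // keep the best case, so no separate warmup is needed
  }
  printf("sum = %i, min = %llu\n", sum, min); // print sum so the compiler can't drop the loop
  return 0;
}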

The new gather instructions are useful, and the compiler uses them automatically. Of course, there are no miracles: as I increase A and B, performance goes down and eventually ends up on par with the old implementation, because memory latency becomes the limiting factor.

Powered by Dreamwidth Studios