May. 15th, 2013

Gather

May. 15th, 2013 11:01 am
izard
The same code (below), compiled with the Intel compiler for each target and run on Ivy Bridge (IVB) and Haswell (HSW), runs much faster on HSW than on IVB. (I can't disclose the exact speedup because HSW is not yet released, but the new instructions involved, the AVX2 gather family, have not been secret for a while now, and neither have the best known methods (BKMs) for using them.)
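#include <stdio.h>
#include <x86intrin.h>   /* __rdtsc() on x86 with GCC/Clang */

#define A (2*1024)       /* size of the data array */
#define B 64             /* number of indexed loads per measurement */
int a[A]; int b[B];

/* The original used a ticks type and an rdtscll() helper that are not shown;
   a 64-bit counter read via __rdtsc() is assumed here. */
typedef unsigned long long ticks;
#define rdtscll() __rdtsc()

int main(void)
{
  int i, j, sum = 0;
  ticks t1, t2, min = ~0ULL;

  for (i = 0; i < A; i++) a[i] = i;                  // init data array
  for (i = 0; i < B; i++) b[i] = (i*113 + 113) % A;  // scattered indices into a[]

  for (j = 0; j < 10000; j++)
  {
    sum = 0;
    t1 = rdtscll();
    for (i = 0; i < B; i++) sum += a[b[i]];  // indexed loads: the gather candidate
    t2 = rdtscll();
    if (min > (t2 - t1)) min = t2 - t1;      // keep the best case; no separate warm-up
  }
  printf("sum = %i, min = %llu\n", sum, min); // print sum so the loop is not optimized away
  return 0;
}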

The new gather instructions are useful, and the compiler vectorizes the sum += a[b[i]] loop with them automatically. Of course there are no miracles: as I increase A and B, performance goes down and eventually drops to the level of the old implementation, because memory latency becomes the limiting factor.
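For readers who want to see what such a gather looks like at the source level, here is a minimal hand-written sketch of the same inner loop using the AVX2 intrinsic _mm256_i32gather_epi32 (VPGATHERDD), which loads eight 32-bit ints from base + index*scale in one instruction. It is not the Intel compiler's actual output. The function names gather_sum, gather_sum_avx2 and gather_sum_scalar are hypothetical, and the sketch assumes GCC or Clang on x86-64 (for __attribute__((target("avx2"))) and __builtin_cpu_supports), falling back to the plain scalar loop on CPUs without AVX2 such as Ivy Bridge.

```c
#include <assert.h>
#include <stddef.h>
#include <immintrin.h>

/* Scalar reference: the same loop as in the benchmark above. */
static int gather_sum_scalar(const int *a, const int *idx, size_t n)
{
    int sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[idx[i]];
    return sum;
}

/* AVX2 path (hypothetical helper): 8 indices per VPGATHERDD.
   Compiled for AVX2 via the target attribute, so no -mavx2 flag is needed. */
__attribute__((target("avx2")))
static int gather_sum_avx2(const int *a, const int *idx, size_t n)
{
    __m256i acc = _mm256_setzero_si256();
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256i vidx = _mm256_loadu_si256((const __m256i *)(idx + i));
        /* Loads a[idx[i]] .. a[idx[i+7]]; scale 4 == sizeof(int). */
        __m256i v = _mm256_i32gather_epi32(a, vidx, 4);
        acc = _mm256_add_epi32(acc, v);
    }
    /* Horizontal sum of the 8 accumulator lanes. */
    int lanes[8];
    _mm256_storeu_si256((__m256i *)lanes, acc);
    int sum = 0;
    for (int k = 0; k < 8; k++)
        sum += lanes[k];
    /* Scalar tail for the last n % 8 elements. */
    for (; i < n; i++)
        sum += a[idx[i]];
    return sum;
}

/* Runtime dispatch: use the gather path only if the CPU reports AVX2. */
int gather_sum(const int *a, const int *idx, size_t n)
{
    assert(n == 0 || (a != NULL && idx != NULL));
    if (__builtin_cpu_supports("avx2"))
        return gather_sum_avx2(a, idx, n);
    return gather_sum_scalar(a, idx, n);
}
```

Calling gather_sum(a, b, B) with the benchmark's arrays returns the same sum as the scalar loop. A gather still performs one load per element, so once a[] stops fitting in cache it waits on memory just like the scalar code, which fits the observation that the advantage fades as A and B grow.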

Powered by Dreamwidth Studios