May. 15th, 2013

Gather

May. 15th, 2013 11:01 am
izard
The same code (below), compiled with the Intel compiler and run on both Ivy Bridge (IVB) and Haswell (HSW), runs much faster on HSW than on IVB. (I can't disclose the exact speedup because HSW is not yet released, but the new instructions involved have not been secret for a while now, and neither have the best known methods, or BKMs.)
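#include <stdio.h>
#include <x86intrin.h> // __rdtsc(); header name as in GCC/Clang, may differ elsewhere

#define A (2*1024)
#define B 64
int a[A]; int b[B];

typedef unsigned long long ticks;
#define rdtscll() __rdtsc() // the original helper was not shown; assumed to read the TSC

int main(void)
{
  int i, j, sum = 0;
  ticks t1, t2, min = 100000000;

  for (i = 0; i < A; i++) a[i] = i; // init arrays
  for (i = 0; i < B; i++) b[i] = (i*113 + 113) % A; // scattered indices into a[]

  for (j = 0; j < 10000; j++)
  {
    sum = 0;
    t1 = rdtscll();
    for (i = 0; i < B; i++) sum += a[b[i]]; // indexed loads: the gather candidate
    t2 = rdtscll();
    if (min > (t2 - t1)) min = t2 - t1; // keep the best case, no separate warm-up
  }
  printf("sum = %i, min = %llu\n", sum, min); // print sum so the compiler keeps the loop
  return 0;
}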

The new gather instructions are useful, and the compiler uses them automatically. With the sizes above, a[] takes 8 KB and b[] takes 256 bytes (4-byte ints), so everything fits in the L1 data cache. Of course, there are no miracles: as I increase A and B, performance goes down and eventually ends up on par with the old implementation, because memory latency becomes the limiting factor.
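For readers who want to see roughly what the vectorized inner loop amounts to, here is a hand-written sketch using the AVX2 gather intrinsic. It is not the compiler's actual output, only an illustration. The function names (sum_indexed, sum_indexed_avx2, sum_indexed_scalar) are made up for this example, and the target attribute and __builtin_cpu_supports are GCC/Clang extensions, used here so the code still builds and runs on a machine without AVX2.

```c
#include <immintrin.h>
#include <stddef.h>

/* Scalar reference: the same indexed sum as the benchmark's inner loop. */
static int sum_indexed_scalar(const int *a, const int *b, size_t n)
{
    int sum = 0;
    for (size_t i = 0; i < n; i++) sum += a[b[i]];
    return sum;
}

/* AVX2 sketch: each gather loads 8 ints from a[] at the 8 indices in idx. */
__attribute__((target("avx2")))
static int sum_indexed_avx2(const int *a, const int *b, size_t n)
{
    __m256i acc = _mm256_setzero_si256();
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256i idx = _mm256_loadu_si256((const __m256i *)(b + i));
        __m256i val = _mm256_i32gather_epi32(a, idx, 4); /* scale = sizeof(int) */
        acc = _mm256_add_epi32(acc, val);
    }
    /* Horizontal sum of the 8 lanes, then a scalar tail for leftovers. */
    int lanes[8];
    _mm256_storeu_si256((__m256i *)lanes, acc);
    int sum = 0;
    for (int k = 0; k < 8; k++) sum += lanes[k];
    for (; i < n; i++) sum += a[b[i]];
    return sum;
}

/* Hypothetical dispatcher: use the gather path only if the CPU has AVX2. */
int sum_indexed(const int *a, const int *b, size_t n)
{
    if (__builtin_cpu_supports("avx2")) return sum_indexed_avx2(a, b, n);
    return sum_indexed_scalar(a, b, n);
}
```

Calling sum_indexed(a, b, B) in place of the inner loop should give the same sum as the scalar code. Whether it is actually faster depends on the CPU and on where the data sits in the cache hierarchy, as noted above.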
