izard

AVX on the Sandy Bridge architecture is suitable only for floating point and for a very limited set of integer operations. Future CPU generations may extend it to cover a broader range of integer operations.

I think it will still support vector operations only on 32-bit (or wider) values. So when, e.g., some crypto code needs a vectorized ROTL on 16-bit ints, I have to write something like:

// C++ vector pseudo-code: each 32-bit lane holds two 16-bit values
__m256i REG; // input
__m256i CARRY;

// repeat for each bit shifted.
CARRY = REG & 0x80008000; // top bit of each 16-bit value, in every 32-bit lane
REG = REG << 1; // main shift
CARRY = CARRY >> 15; // carry shift (logical): bits 15 and 31 move to bits 0 and 16
REG = REG & 0xFFFEFFFE; // clear the bit that leaked from the low half into bit 16
REG = REG | CARRY; // set carry bits in REG

I don't see how this sequence can be made shorter, and I don't see how it can be easily pipelined.
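
To sanity-check the sequence above, here is a minimal scalar sketch in plain C++ that applies the same steps to a single 32-bit lane holding two 16-bit values. It is not real AVX code. It assumes the mask is meant to be 0x80008000, that the carry is moved with a logical right shift by 15, and that the bit leaking from the low half into bit 16 has to be cleared before the carry is merged. The names rotl16, rotl16x2_step, rotl16x2 and check_all_lanes are made up for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Reference: rotate a single 16-bit value left by n bits (n taken modulo 16).
std::uint16_t rotl16(std::uint16_t v, unsigned n) {
    n &= 15u;
    return static_cast<std::uint16_t>((v << n) | (v >> ((16u - n) & 15u)));
}

// One pass of the sequence: rotate both 16-bit halves of a 32-bit lane left by 1.
std::uint32_t rotl16x2_step(std::uint32_t reg) {
    std::uint32_t carry = reg & 0x80008000u; // top bit of each 16-bit half
    reg <<= 1;                               // main shift
    carry >>= 15;                            // carry shift: bits 15/31 -> bits 0/16
    reg &= 0xFFFEFFFEu;                      // drop the bit leaked from the low half
    return reg | carry;                      // set carry bits
}

// Rotate by n bits by repeating the one-bit pass, as in the pseudo-code.
std::uint32_t rotl16x2(std::uint32_t reg, unsigned n) {
    for (unsigned i = 0; i < (n & 15u); ++i) {
        reg = rotl16x2_step(reg);
    }
    return reg;
}

// Compare the packed version against the 16-bit reference for every low-half
// value (the high half is its bitwise complement) and every rotation count.
void check_all_lanes() {
    for (std::uint32_t lo = 0; lo <= 0xFFFFu; ++lo) {
        const std::uint16_t l = static_cast<std::uint16_t>(lo);
        const std::uint16_t h = static_cast<std::uint16_t>(~lo);
        const std::uint32_t lane = (static_cast<std::uint32_t>(h) << 16) | l;
        for (unsigned n = 0; n < 16; ++n) {
            const std::uint32_t expected =
                (static_cast<std::uint32_t>(rotl16(h, n)) << 16) | rotl16(l, n);
            assert(rotl16x2(lane, n) == expected);
        }
    }
}
```

Calling check_all_lanes() from main() runs the comparison. A vector version would apply the same five operations to every 32-bit lane of the register at once.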

May. 3rd, 2011

int AVX
