Nov. 16th, 2017

AVX-512

Nov. 16th, 2017 09:06 am
izard
Yesterday I wrote a simple function using AVX-512 intrinsics for a customer. The customer's own AVX-512 version of that function was only 8x faster than the non-vectorized code; my version is now ~30x faster.

While working on this function, I noticed that mask register reads and writes only transfer 16 bits, not the whole 64 bits. Then I found an interesting four-year-old thread where Agner Fog and Intel engineers discussed that limitation.
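To make the 16-bit figure concrete: in baseline AVX-512F, a compare over one 512-bit register of sixteen 32-bit floats produces one mask bit per lane, so the result fits exactly in a 16-bit mask (`__mmask16`), and the AVX-512F mask move `KMOVW` moves 16 bits. The wider moves `KMOVD` and `KMOVQ`, needed for byte and word lanes, arrived with the AVX-512BW extension. The sketch below is not the customer function from this post; `lt_mask16` and `mask_lane_set` are hypothetical helpers, and the scalar branch is a portable model of the same result, used when the compiler is not targeting AVX-512F (for example, when building without `-mavx512f`).

```c
#include <assert.h>
#include <stdint.h>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

/* Hypothetical helper: returns a 16-bit mask with bit i set when a[i] < b[i],
 * for i in 0..15. Sixteen 32-bit floats fill one 512-bit register, so one bit
 * per lane fits exactly in a 16-bit mask (__mmask16 in the intrinsics headers). */
static uint16_t lt_mask16(const float *a, const float *b)
{
#if defined(__AVX512F__)
    /* AVX-512F path: the compare writes its result directly into a k register.
     * _CMP_LT_OQ is false for NaN operands, matching C's '<' below. */
    __m512 va = _mm512_loadu_ps(a);
    __m512 vb = _mm512_loadu_ps(b);
    __mmask16 k = _mm512_cmp_ps_mask(va, vb, _CMP_LT_OQ);
    return (uint16_t)k;
#else
    /* Portable scalar model of the same mask, one bit per lane. */
    uint16_t k = 0;
    for (int i = 0; i < 16; i++) {
        if (a[i] < b[i])
            k |= (uint16_t)(1u << i);
    }
    return k;
#endif
}

/* Hypothetical helper: reports whether lane i is selected in a 16-bit mask. */
static int mask_lane_set(uint16_t k, int i)
{
    assert(i >= 0 && i < 16);
    return (k >> i) & 1u;
}
```

With `a[i] = i` and every `b[i] = 8.0f`, lanes 0 through 7 satisfy the compare, so the mask comes out as `0x00FF`.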
