Nov. 16th, 2017

AVX-512

Nov. 16th, 2017 09:06 am
izard: (Default)
Yesterday I wrote a simple function using AVX512 intrinsics for my customer. A customer created AVX512 function that was only 8x faster than non vectorized. My version is now ~30x faster.

While working on this function, I noticed that mask register read and write only get 16 bits, not whole 64 bits. Then I found an interesting thread (4 years old) where Agner Fog and Intel engineers were discussing that limitation.

Profile

izard: (Default)
izard

November 2025

S M T W T F S
       1
2345678
910 1112131415
1617 1819202122
23242526272829
30      

Most Popular Tags

Page Summary

Style Credit

Expand Cut Tags

No cut tags
Page generated Nov. 30th, 2025 10:02 am
Powered by Dreamwidth Studios