AVX-512

Nov. 16th, 2017 09:06 am
izard: (Default)
[personal profile] izard
Yesterday I wrote a simple function using AVX512 intrinsics for my customer. A customer created AVX512 function that was only 8x faster than non vectorized. My version is now ~30x faster.

While working on this function, I noticed that mask register read and write only get 16 bits, not whole 64 bits. Then I found an interesting thread (4 years old) where Agner Fog and Intel engineers were discussing that limitation.

Profile

izard: (Default)
izard

November 2025

S M T W T F S
       1
2345678
910 1112131415
1617 1819202122
23242526272829
30      

Most Popular Tags

Style Credit

Expand Cut Tags

No cut tags
Page generated Jan. 16th, 2026 07:19 am
Powered by Dreamwidth Studios