AVX-512

Nov. 16th, 2017 09:06 am
[personal profile] izard
Yesterday I wrote a simple function using AVX-512 intrinsics for a customer. The customer had already written an AVX-512 version, but it was only 8x faster than the non-vectorized code. My version is now ~30x faster.

While working on this function, I noticed that mask register reads and writes only handle 16 bits, not the whole 64 bits. Then I found an interesting thread (4 years old) where Agner Fog and Intel engineers discussed that limitation.
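
To make the 16-bit observation concrete, here is a minimal sketch, not the customer's function. It assumes the usual layout where a mask carries one bit per vector lane, so a 512-bit register of 32-bit elements needs 16 mask bits, while 64 bits are only needed for byte elements (which, as far as I know, requires the AVX-512BW extension). The helper names `lanes_per_zmm` and `masked_add16` are hypothetical. The intrinsic path is only compiled when the compiler defines `__AVX512F__` (e.g. with `-mavx512f`); otherwise a scalar loop emulates the same merge-masking behaviour.

```c
#include <stdint.h>
#ifdef __AVX512F__
#include <immintrin.h>
#endif

/* One mask bit per lane: a 512-bit register holds 512 / element_bits lanes. */
static unsigned lanes_per_zmm(unsigned element_bits) {
    return 512u / element_bits;
}

/* Hypothetical helper: for each of 16 int32 lanes,
   dst[i] = a[i] + b[i] if bit i of mask is set, otherwise dst[i] = a[i]. */
static void masked_add16(int32_t *dst, const int32_t *a, const int32_t *b,
                         uint16_t mask) {
#ifdef __AVX512F__
    __m512i va = _mm512_loadu_si512(a);
    __m512i vb = _mm512_loadu_si512(b);
    /* Merge masking: lanes with a zero mask bit keep the value from va. */
    __m512i vr = _mm512_mask_add_epi32(va, (__mmask16)mask, va, vb);
    _mm512_storeu_si512(dst, vr);
#else
    /* Scalar fallback with the same semantics, for builds without AVX-512F. */
    for (int i = 0; i < 16; ++i)
        dst[i] = ((mask >> i) & 1u) ? a[i] + b[i] : a[i];
#endif
}
```

Calling `masked_add16(dst, a, b, 0x0005)` updates only lanes 0 and 2 and copies the other 14 lanes from `a`. With 32-bit elements there is nothing for mask bits 16..63 to select, which may be part of why only 16 bits are read and written in that case.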