Nov. 16th, 2017

AVX-512

Nov. 16th, 2017 09:06 am
izard: (Default)
Yesterday I wrote a simple function using AVX-512 intrinsics for a customer. The customer's own AVX-512 version was only 8x faster than the non-vectorized code. My version is now ~30x faster.

While working on this function, I noticed that mask register reads and writes transfer only 16 bits, not the whole 64 bits. Then I found an interesting four-year-old thread in which Agner Fog and Intel engineers discussed that limitation.
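
To show where the number 16 comes from, here is a scalar C model of a masked operation. It is only a sketch: it uses no intrinsics and is not the customer function. It assumes 32-bit elements, so a 512-bit register holds 512 / 32 = 16 lanes and a compare yields one mask bit per lane. The names `cmplt_mask16`, `lane_is_set` and `masked_add16` are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of a 16-lane "less than" compare on 32-bit integers.
 * Hypothetical helper, not an intrinsic: bit i of the result is set
 * when a[i] < b[i], one mask bit per lane. */
static uint16_t cmplt_mask16(const int32_t a[16], const int32_t b[16])
{
    uint16_t mask = 0;
    for (size_t i = 0; i < 16; ++i) {
        if (a[i] < b[i]) {
            mask |= (uint16_t)(1u << i);
        }
    }
    return mask;
}

/* Returns 1 if the given lane is selected by the mask, 0 otherwise. */
static int lane_is_set(uint16_t mask, size_t lane)
{
    assert(lane < 16); /* only 16 lanes exist in this model */
    return (mask >> lane) & 1u;
}

/* Scalar model of a merge-masked add: selected lanes receive a[i] + b[i],
 * unselected lanes keep the value from src. */
static void masked_add16(int32_t dst[16], const int32_t src[16],
                         uint16_t mask,
                         const int32_t a[16], const int32_t b[16])
{
    for (size_t i = 0; i < 16; ++i) {
        dst[i] = lane_is_set(mask, i) ? a[i] + b[i] : src[i];
    }
}
```

In this model the whole mask fits in a `uint16_t`, so with 32-bit lanes the upper 48 bits of a 64-bit mask register would carry no information.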
