izard | AVX-512 (Reply)


Yesterday I wrote a simple function using AVX-512 intrinsics for a customer. The customer's own AVX-512 version was only 8x faster than the non-vectorized code. My version is now ~30x faster.

While working on this function, I noticed that mask register reads and writes move only 16 bits, not the whole 64 bits. I then found an interesting thread (4 years old) in which Agner Fog and Intel engineers discussed that limitation.
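To illustrate where a 16-bit mask comes from, here is a minimal sketch. It is not the customer function. The helper names (`count_above`, `count_above_avx512`, `count_above_scalar`) are hypothetical, and it assumes GCC or Clang on x86-64. With AVX-512F alone, a 512-bit register holds 16 lanes of 32-bit floats, so a compare yields a `__mmask16`: one bit per lane. As far as I know, the wider mask moves arrived only with AVX-512BW, which is probably the limitation in question.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <immintrin.h>

/* Hypothetical helper: counts elements of a[0..n) strictly greater than t.
   The target attribute enables AVX-512F for this function only, so the
   file builds without -mavx512f. */
__attribute__((target("avx512f")))
static size_t count_above_avx512(const float *a, size_t n, float t)
{
    const __m512 vt = _mm512_set1_ps(t);
    size_t count = 0, i = 0;

    for (; i + 16 <= n; i += 16) {
        __m512 v = _mm512_loadu_ps(a + i);
        /* One result bit per 32-bit lane: 16 lanes give a 16-bit mask. */
        __mmask16 m = _mm512_cmp_ps_mask(v, vt, _CMP_GT_OQ);
        count += (size_t)__builtin_popcount((unsigned)m);
    }
    if (i < n) {
        /* Tail: enable only the low (n - i) lanes, where 1 <= n - i <= 15. */
        __mmask16 tail = (__mmask16)((1u << (n - i)) - 1u);
        __m512 v = _mm512_maskz_loadu_ps(tail, a + i);
        __mmask16 m = _mm512_mask_cmp_ps_mask(tail, v, vt, _CMP_GT_OQ);
        count += (size_t)__builtin_popcount((unsigned)m);
    }
    return count;
}
#endif

/* Non-vectorized reference version. */
static size_t count_above_scalar(const float *a, size_t n, float t)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (a[i] > t)
            count++;
    return count;
}

/* Picks the AVX-512 path only when the running CPU reports support. */
size_t count_above(const float *a, size_t n, float t)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    if (__builtin_cpu_supports("avx512f"))
        return count_above_avx512(a, n, t);
#endif
    return count_above_scalar(a, n, t);
}
```

Calling `count_above(data, n, threshold)` returns the same count on either path. The vector path handles 16 floats per compare and covers the remainder with a tail mask rather than a scalar loop.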

Crossposts: https://izard.livejournal.com/238113.html
