Bugz [2/2]
Jun. 25th, 2013 06:10 pm
In a previous post, the code sequence I used to illustrate an issue was buggy on its own.
Here is the right one:
volatile float f1 = 2e-40f;
volatile float f2 = 3000000000.0f;
volatile float f3;
f3 = f1/f2;
f3 = f2/f1;
Whether it is compiled for x87 or SSE, the code above takes about 800 cycles without factor X, and anywhere from 800 to 5k cycles with factor X.
(The same goes for REP MOVS, which is another example of complex microcode.)
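The post does not say why these particular operands were picked. One plausible reading is that they push the divider onto its special-case paths: f1 is a subnormal (denormal) float, f1/f2 is too small to represent and f2/f1 is too large. The sketch below only checks those three properties; it does not measure cycles. The function name hits_special_cases is hypothetical, and the check assumes default IEEE 754 behavior (no flush-to-zero or denormals-are-zero mode, no -ffast-math).

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical helper, not from the original post: returns true when the
   post's operands behave as assumed above (subnormal input, underflowing
   quotient, overflowing quotient). */
bool hits_special_cases(void)
{
    volatile float f1 = 2e-40f;          /* below FLT_MIN, so subnormal */
    volatile float f2 = 3000000000.0f;
    volatile float f3;

    float input = f1;                    /* plain copy for classification */
    bool subnormal_input = (fpclassify(input) == FP_SUBNORMAL);

    f3 = f1 / f2;                        /* about 6.7e-50, too small for float */
    float tiny = f3;
    bool underflows = (tiny == 0.0f);

    f3 = f2 / f1;                        /* about 1.5e49, too large for float */
    float huge = f3;
    bool overflows = (isinf(huge) != 0);

    return subnormal_input && underflows && overflows;
}
```

If hits_special_cases() returns false on some build, the compiler or runtime is probably flushing subnormals to zero, and the timings quoted here may not reproduce.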
P.S. Factor X is very rare, so the issue is not too bad. Besides, 5k cycles is less than 2 microseconds on any CPU clocked above 2.5 GHz, so it is not a big deal for common PC/server uses either.
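The cycles-to-time estimate is simple arithmetic and can be checked with a small helper. This is only a sketch: the name cycles_to_microseconds is hypothetical, and it assumes a fixed clock frequency (no turbo or frequency scaling).

```c
/* Hypothetical helper: converts a cycle count to microseconds for an
   assumed, fixed clock frequency given in GHz. */
double cycles_to_microseconds(double cycles, double clock_ghz)
{
    /* 1 GHz corresponds to 1000 cycles per microsecond. */
    return cycles / (clock_ghz * 1000.0);
}
```

For example, cycles_to_microseconds(5000.0, 3.0) comes out at roughly 1.67, while the same count at 2.5 GHz is exactly 2.0, which is where the "less than 2 microseconds" bound stops holding.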