izard wrote on 2016-06-17 11:33 am

Skylake news

Skylake is the CPU currently on sale, and it is a good one (I have one in my home desktop too).
But there is a small undocumented regression. It should not bother anyone except in very specific settings.

Here is the code to test it:

// code starts, compile with gcc -O1 (run as root, ioperm needs it)
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/io.h>
#include <unistd.h>
#include <cpuid.h>

#define BASEPORT 0x70

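static inline uint64_t rdtsc(void)
{
unsigned int lo, hi, aux;
// rdtscp returns the counter in EDX:EAX and also writes ECX
asm volatile ( "rdtscp" : "=a"(lo), "=d"(hi), "=c"(aux) );
return ((uint64_t)hi << 32) | lo;
}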

int main(void)
{
uint64_t t1, t2, t3;
unsigned int l = 0, a, b, c, d; // l = 0 selects CPUID leaf 0
int r;
if (ioperm(BASEPORT, 3, 1)) {perror("ioperm"); exit(1);}

r = inb(BASEPORT);
__get_cpuid(l, &a, &b , &c, &d);
t1 = rdtsc();
r = inb(BASEPORT); // the timed read
__get_cpuid(l, &a, &b , &c, &d);// only to serialize, adds ~1k cycles
t2 = rdtsc();
t3 = t2 - t1;
printf("status: %llu\n", (unsigned long long)t3);

/* We don't need the ports anymore */
if (ioperm(BASEPORT, 3, 0)) {perror("ioperm"); exit(1);}
}
// code ends

The code just reads the RTC port (without parsing the result). On Broadwell, the previous generation, it takes less than 1k cycles. On Skylake, it takes 200k-500k cycles. Why? If you read MSR 0x34 (the count of SMIs handled since boot) before and after the test, the increment is 0 on Broadwell and 2 on Skylake.
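
One way to check the SMI counter from user space on Linux is through the msr driver, which exposes each CPU's MSRs as a device node where the register number is the file offset. The following is a minimal sketch under that assumption (msr module loaded, run as root); the helper names read_msr and read_smi_count are made up for illustration and are not part of the original test.

```c
#define _XOPEN_SOURCE 700 /* for pread() */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#define MSR_SMI_COUNT 0x34 /* SMIs handled since boot */

/* Read one 64-bit MSR through an msr device node such as "/dev/cpu/0/msr".
 * The register number is used as the file offset.
 * Returns 0 on success, -1 on any failure. */
int read_msr(const char *dev_path, uint32_t reg, uint64_t *value)
{
    int fd = open(dev_path, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t n = pread(fd, value, sizeof *value, (off_t)reg);
    close(fd);
    return n == (ssize_t)sizeof *value ? 0 : -1;
}

/* Hypothetical convenience wrapper: SMI count as seen by the given CPU. */
int read_smi_count(int cpu, uint64_t *count)
{
    char path[64];
    snprintf(path, sizeof path, "/dev/cpu/%d/msr", cpu);
    return read_msr(path, MSR_SMI_COUNT, count);
}
```

Calling read_smi_count() once before the timed inb and once after it, and subtracting the two values, gives the increment discussed above.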

So any IN or OUT operation on a port below 0xff is now handled in an SMI. This is legacy I/O that is rarely used in a modern OS, and even when it is used occasionally, a ~10 microsecond delay would not affect performance. Unless you are trying to control equipment with a 31.25 microsecond response time ;)
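
To compare a cycle count from the test with a deadline like that, the count has to be converted to time. Here is a rough sketch, assuming the TSC ticks at a constant, known rate; the function names and the 3.0 GHz figure in the note after the code are placeholders, not measured values.

```c
#include <stdint.h>

/* Convert a TSC cycle count to microseconds.
 * tsc_hz is an ASSUMED constant TSC rate in Hz. */
double cycles_to_us(uint64_t cycles, double tsc_hz)
{
    return (double)cycles / tsc_hz * 1e6;
}

/* Nonzero if a latency measured in cycles fits within deadline_us. */
int fits_deadline(uint64_t cycles, double tsc_hz, double deadline_us)
{
    return cycles_to_us(cycles, tsc_hz) <= deadline_us;
}
```

With a placeholder rate of 3.0e9 Hz, fits_deadline(1000, 3.0e9, 31.25) is true and fits_deadline(200000, 3.0e9, 31.25) is false. Keep in mind that the measured count also includes the cpuid serialization overhead.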