Cache partitioning.
Jan. 28th, 2010 01:45 pm
A shared cache in a CPU is a great thing for multicore - it allows efficient data sharing between cores and almost always shares capacity efficiently.
What if a developer thinks the cache is not shared fairly between, say, 2 cores? There are no means to control this explicitly. But here is a workaround, a weird one though. If we write a custom allocator that only hands out memory at addresses mapping to cache sets 0-7 for the 1st core and to sets 8-15 for the 2nd core, we effectively make the cache a non-shared one. Unfortunately, the largest contiguous area of memory such an allocator could hand out is then 512 bytes (64-byte cache line × 16 sets ÷ 2 cores). The more data is allocated through this "weird cache-conscious allocator", the fairer it gets.
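To make the arithmetic concrete, here is a minimal sketch of such a set-restricted bump allocator, assuming the toy geometry above (64-byte lines, 16 sets, 2 cores); the names like `partition_alloc` are mine, not from any real library. With only 16 sets all the index bits fall inside the 4K page offset, so virtual-address alignment is enough here; for a realistically sized, physically indexed cache it would not be (see the next paragraph).

```c
/* Sketch of the "weird cache-conscious allocator": each core only ever
 * receives addresses that map to its own half of the cache sets.
 * Assumed toy geometry: 64-byte lines, 16 sets, 2 cores. */
#define _POSIX_C_SOURCE 200112L
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define LINE_SIZE       64
#define NUM_SETS        16
#define NUM_CORES       2
#define SETS_PER_CORE   (NUM_SETS / NUM_CORES)        /* 8 sets             */
#define PARTITION_BYTES (LINE_SIZE * SETS_PER_CORE)   /* 512-byte chunks    */
#define STRIDE          (LINE_SIZE * NUM_SETS)        /* sets repeat per 1 KiB */

static unsigned char *arena;
static size_t arena_size;
static size_t next_chunk[NUM_CORES];   /* next 1 KiB stride per core */

int partition_alloc_init(size_t size)
{
    /* A STRIDE-aligned arena starts at set 0; with a 16-set cache the
     * index bits lie within the page offset, so virtual alignment is
     * enough for this sketch. */
    if (posix_memalign((void **)&arena, STRIDE, size) != 0)
        return -1;
    arena_size = size;
    memset(next_chunk, 0, sizeof(next_chunk));
    return 0;
}

/* Returns a 512-byte block whose lines fall only into sets 0..7 (core 0)
 * or sets 8..15 (core 1).  Larger contiguous requests cannot be served --
 * that is exactly the 512-byte cap discussed above. */
void *partition_alloc(int core)
{
    size_t offset = next_chunk[core] * STRIDE           /* skip whole strides    */
                  + (size_t)core * PARTITION_BYTES;     /* pick this core's half */
    if (offset + PARTITION_BYTES > arena_size)
        return NULL;                                    /* arena exhausted       */
    next_chunk[core]++;
    return arena + offset;
}
```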
The 512-byte cap is very annoying and thus probably not realistic for practical use, but if the last-level shared cache had 128 sets instead of 16, the cap would go up to 4K, which would play naturally with OS VM mechanics :). Better yet, the last-level cache is indexed by physical address, so the OS could do the partitioning itself at page granularity; that would lift the contiguous-memory limit entirely, not just raise it to 4K, and move the complexity from the allocator into the OS.
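For the OS-level variant, the key quantity is the "color" of a physical page: which half of the sets its lines land in. A small sketch, assuming the hypothetical 128-set geometry from the paragraph above (128 × 64 bytes = 8 KiB of sets, i.e. exactly two colors, so the color is just the low bit of the page frame number); a page allocator built on this would simply hand even-colored frames to one core and odd-colored frames to the other:

```c
/* Page-coloring sketch for the hypothetical 128-set, 64-byte-line LLC. */
#include <stdint.h>

#define LINE_SIZE  64
#define NUM_SETS   128
#define PAGE_SIZE  4096
#define COLORS     ((LINE_SIZE * NUM_SETS) / PAGE_SIZE)   /* = 2 colors */

/* Cache set touched by a physical address. */
static inline unsigned set_index(uint64_t paddr)
{
    return (unsigned)((paddr / LINE_SIZE) % NUM_SETS);
}

/* Color of a physical page frame: which half of the sets all of its lines
 * map to.  Virtually contiguous memory stays unlimited, because any frames
 * of the right color will do -- the translation hides the restriction. */
static inline unsigned page_color(uint64_t pfn)
{
    return (unsigned)(pfn % COLORS);   /* with 2 colors: bit 0 of the PFN */
}
```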
Upd: after careful study of prior art, it looks like I've reinvented a wheel, and made it a square one rather than a round one. There is a better way to partition a shared cache than what I've described above.