Hard realtime by brute force, x86
Jul. 18th, 2012 09:57 am

Just a rough and quick note on using x86 for hard realtime apps.
There are some architectures where a developer can estimate an application's timings with nearly cycle accuracy. This applies to both interrupt timings and execution timings. Not so on x86: out-of-order execution (OOO), Hyper-Threading (HT), shared caches and buses, the integrated GPU, power management (PM), SMIs, and VT all contribute to jitter. Some of those (HT, SMIs, VT, PM, GPU) can be switched off.
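A quick way to see this jitter is to spin on the TSC and record the largest gap between consecutive reads: SMIs, C-state exits and the like show up as outliers. A minimal probe, assuming an invariant TSC and GCC or Clang on x86 (pin it to one core and keep the box otherwise idle; this is an illustration, not from the original post):

/* Spin reading the TSC; the worst gap between consecutive reads
 * approximates interrupt/SMI/power-management jitter on this core.
 * Assumes an invariant TSC (constant rate across P/C-states). */
#include <stdint.h>
#include <stdio.h>
#include <x86intrin.h>   /* __rdtsc() */

int main(void)
{
    uint64_t prev = __rdtsc();
    uint64_t max_gap = 0;
    for (long i = 0; i < 100000000L; i++) {   /* ~100M samples */
        uint64_t now = __rdtsc();
        if (now - prev > max_gap)
            max_gap = now - prev;
        prev = now;
    }
    printf("worst gap: %llu cycles\n", (unsigned long long)max_gap);
    return 0;
}

Divide the worst gap by the core frequency to get a rough jitter floor in microseconds.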
When the deadline is met because execution time plus worst-case jitter always stays below the target, we may consider the system hard RT. There are three ways to achieve that: set a less challenging deadline, reduce worst-case jitter, or optimize execution time. The latter is where x86 shines, while minimal jitter is the well-known advantage of CPU architectures more established in RT.
Setting a less challenging deadline helps a lot: e.g. if the system cycle time is ~10ms and the deadline is ~2ms, then even a user-mode process in a general-purpose OS could make it, with minor tweaks. When the cycle time is 500us and the deadline 100us, we can still use a GPOS, but may need to run the RT task in timer interrupt context.
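On Linux, the "minor tweaks" for the user-mode case mostly come down to locking memory and jumping ahead of the normal scheduler. A sketch (error handling trimmed; the priority and period values are illustrative, and it needs root or CAP_SYS_NICE):

/* User-mode periodic RT task: lock pages, switch to SCHED_FIFO,
 * and wake on an absolute deadline each cycle. */
#include <sched.h>
#include <sys/mman.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L
#define PERIOD_NS      10000000L        /* 10 ms cycle */

static void control_step(void) { /* must finish well under 2 ms */ }

int main(void)
{
    struct sched_param sp = { .sched_priority = 80 };

    mlockall(MCL_CURRENT | MCL_FUTURE); /* no page faults later */
    sched_setscheduler(0, SCHED_FIFO, &sp);

    struct timespec next;
    clock_gettime(CLOCK_MONOTONIC, &next);
    for (;;) {
        next.tv_nsec += PERIOD_NS;
        if (next.tv_nsec >= NSEC_PER_SEC) {
            next.tv_nsec -= NSEC_PER_SEC;
            next.tv_sec++;
        }
        clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
        control_step();
    }
}

Using TIMER_ABSTIME keeps the period from drifting: each wakeup is scheduled against the previous deadline, not against when the task actually got to run.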
When the cycle time is 50us and the deadline is 20us, on x86 we have no choice but to run the RT code only in the APIC timer interrupt handler. In this case, a high CPU frequency and a low clocks-per-instruction (CPI) ratio also help meet the deadline despite the jitter.
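For completeness, a bare-metal sketch of that last case, assuming the local APIC is identity-mapped at its default physical base; the register offsets are from the Intel SDM, while the vector number and initial count are placeholders that a real system would fill in by calibrating the timer against a known clock:

/* Program the local APIC timer in periodic mode; the whole RT cycle
 * then lives in the interrupt handler. */
#include <stdint.h>

#define LAPIC_BASE     0xFEE00000UL  /* default physical base */
#define LVT_TIMER      0x320
#define TIMER_INITCNT  0x380
#define TIMER_DIVIDE   0x3E0
#define EOI            0x0B0

#define TIMER_PERIODIC (1u << 17)
#define TIMER_VECTOR   0x40u         /* illustrative vector number */

static inline void lapic_write(uint32_t reg, uint32_t val)
{
    *(volatile uint32_t *)(LAPIC_BASE + reg) = val;
}

void apic_timer_start(uint32_t initial_count)
{
    lapic_write(TIMER_DIVIDE, 0x3);            /* divide bus clock by 16 */
    lapic_write(LVT_TIMER, TIMER_PERIODIC | TIMER_VECTOR);
    lapic_write(TIMER_INITCNT, initial_count); /* writing this arms it */
}

void apic_timer_isr(void)                      /* called from the IDT stub */
{
    /* read inputs, compute, write outputs: all within the 20 us budget */
    lapic_write(EOI, 0);                       /* acknowledge the interrupt */
}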
Back when CPU frequency was still growing, there was a clear gap between x86 and the others, which helped reduce execution time to the extent that it compensated for the difference in jitter. Now high-end microcontrollers run at almost the same frequency as the lowest-power embedded Atom and Core CPUs. Of course, IPC is much better on x86, and it improves by 5-10% with each generation, but that is not a panacea, especially when we are I/O bound. Other x86 performance advantages, like multicore and SIMD, are also rarely helpful for typical RT control tasks. So to shine in RT, x86 has to contain jitter within the 1us range, which is a difficult task, especially on multicore.