Frequently Asked Questions
|Draft / Work in progress|
FAQ for realtime support in Linux Kernel
Hardware causes of ISR latency
A good real time behavior of a system depends a lot on low latency interrupt handling. Taking a look at the X86 platform, it shows that this platform is not optimised for RT usage. Several mechanisms cause ISR latencies that can run into the 10's of microseconds. Knowing them will enable you to make the best design choices on this platform to enable you to work around the negative impact.
- DMA bus mastering: Bus mastering events can cause long-latency CPU stalls of many microseconds. It can be generated by every device that uses DMA, such as SATA/PATA/SCSI devices and even network adapters. Also video cards that insert wait cycles on the bus in response to a CPU access can cause this kind of latency. Sometimes the behavior of such peripherals can be controlled from the driver, trading off throughput for lower latency. The negative impact of bus mastering is independent from the chosen OS, so this is not a unique problem for Linux-RT, even other RTOS-es experience these type of latency!
- On-demand CPU scaling: creates long-latency events when the CPU is put in a low-power-consumption state after a period of inactivity. Such problems are usually quite easy to detect. (e.g. On Fedora the 'cpuspeed' tool should be disabled, as this tool loads the ondemand scaling_governor driver)
- VGA Console: When the system is fulfilling its RT requirements the VGA Text Console must be left untouched. Nothing is allowed to be written to that console, even printk's are not allowed. This VGA text console causes very large latencies, up to more than hundreds of microseconds. It is better to use a serial console and have no login shell on the VGA text console. Also SSH or Telnet sessions can be used. The 'quiet' option on the kernel command line could also be useful to prevent preventing any printk to reach the console. Notice that using a graphical UI of X has no RT-impact, it is just the VGA text console that causes latencies.
Latencies caused by Page-faults
Whenever the RT process runs into a page-fault the kernel freezes the entire process (with all its threads in it), until the kernel has handled the page fault. There are 2 types of pagefaults, major and minor pagefaults. Minor pagefaults are handled without IO accesses. Major pagefaults are pagefaults that are handled by means of IO activity. Page faults are therefor dangerous for RT applications and need to be prevented.
If there is no Swap space used, and no other applications stress the memory boundaries, then there is enough free RAM ready for the RT application to be used. In this case the RT-application will only run into minor pagefaults, which cause relatively small latencies. But, if the RT application is just one of the many applications on the system, and there is Swap space used, then special actions has to be taken to protect the memory of the RT-application. If memory has to be retrieved from disk or pushed towards the disk to handle a page fault, the RT-application will experience very large latencies, sometimes up to more than a second! Notice that pagefaults of one application cannot interfere the RT-behavior of another application.
During startup a RT-application will always experience a lot of pagefaults. These cannot be prevented. In fact, this startup period must be used to claim and lock enough memory for the RT-process in RAM. This must be done in such a way that when the application needs to expose its RT capabilities, pagefaults do not occur anymore.
This can be done by taking care of the following during the initial startup phase:
- Call directly from the main() entry the mlockall() call.
- Create all threads at startup time of the application, and touch each page of the entire stack of each thread. Never start threads dynamically during RT show time, this will ruin RT behavior.
- Never use system calls that are known to generate pagefaults, such as fopen(). (Opening of files does the mmap() system call, which generates a page-fault).
- Do not use 'compile time static arrays' without initializing them directly after startup, before RT show time.
File handling is known to generate disastrous pagefaults. So, if there is a need for file access from the context of the RT-application, then this can be done best by splitting the application in an RT part and a file-handling part. Both parts are allowed to communicate through sockets. I have never seen a page fault caused by socket traffic. Note: While accessing files the low-level fopen() call will do a mmap() to allocate new memory to the process, resulting in a new pagefault.