HOWTO: Build an RT-application

Revision as of 11:07, 16 December 2013

This document describes the steps for writing hard real-time Linux programs using the real-time Preemption Patch. It also describes the pitfalls that destroy real-time responsiveness. It focuses on x86 and ARM, although the concepts are also valid on other architectures, as long as Glibc is used. (uClibc lacks some fundamental parts, for example PI-mutex support and control over malloc/new behaviour, so uClibc is not recommended.)

Introduction

An RT-application is only able to operate correctly if the underlying OS and hardware are able to provide the needed determinism. That means a higher priority task can preempt a lower priority task. If for example a BIOS decides to use all CPU cycles for a very long time, no operating system or application can provide any latency guarantees. The whole system needs to be tuned and configured correctly.

The goal is to reduce (random) latency. This document is divided into four sections, each of which explains how you can reduce latencies (where possible):

  • Hardware
  • Kernel configuration
  • Application
  • Driver

Hardware

ISR latency

Good real-time behaviour of a system depends heavily on low-latency interrupt handling. A closer look at the x86 platform shows that it is not optimised for RT usage. Several mechanisms cause ISR latencies that can run into the tens or hundreds of microseconds. Knowing them enables you to make the best design choices on this platform and to work around their negative impact.

System Management Interrupt (SMI) on Intel x86 ICH chipsets
System Management Interrupts are generated by the power management hardware on the board. SMIs are evil if real-time is required. First, they can last for hundreds of microseconds, which causes unacceptable jitter for many RT applications. Second, they are the highest-priority interrupt in the system (even higher than the NMI). Third, you cannot intercept an SMI because it does not have a vector in the CPU. Instead, when the CPU gets an SMI it goes into a special mode and jumps to a hard-wired location in a special SMM address space (which is probably in BIOS ROM). Essentially, SMIs are "invisible" to the operating system. Although an SMI is handled by one processor at a time, it still affects real-time responsiveness on dual-core/SMP systems: if the processor handling the SMI holds a mutex or spinlock that some other core needs, that other core has to wait until the SMI handler has completed and the mutex/spinlock has been released. This problem also exists on RTAI and other OSes; for more info see [1].
DMA bus mastering
Bus mastering events can cause long-latency CPU stalls of many microseconds. They can be generated by every device that uses DMA, such as SATA/PATA/SCSI devices and even network adapters. Video cards that insert wait cycles on the bus in response to a CPU access can also cause this kind of latency. Sometimes the behaviour of such peripherals can be controlled from the driver, trading off throughput for lower latency. The negative impact of bus mastering is independent of the chosen OS, so this problem is not unique to Linux-RT; other RTOSes experience this type of latency as well.

Hints for getting rid of SMI interrupts on x86

   1) Use a PS/2 mouse and keyboard.
   2) Disable USB mouse and keyboard in the BIOS.
   3) Compile an ACPI-enabled kernel.
   4) Disable TCO timer generation of SMIs (TCO_EN bit in the SMI_EN register).

The latency should drop to ~10us permanently, at the expense of not being able to use the i8xx_tco watchdog.
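
To illustrate hint 4 without touching the global SMI enable, the following is a minimal sketch of the bit manipulation only. The bit positions (TCO_EN as bit 13, the global enable as bit 0) are assumptions taken from Intel ICH datasheets and must be checked against the datasheet of your own chipset; the helper name smi_en_without_tco() is hypothetical. Reading and writing the actual register (port I/O relative to the power management base address) is deliberately left out.

```c
#include <stdint.h>

/* Bit positions in the ICH SMI_EN register. These are assumptions based
 * on Intel ICH datasheets; verify them for your own chipset. */
#define GBL_SMI_EN (1u << 0)   /* global SMI enable: must stay untouched */
#define TCO_EN     (1u << 13)  /* TCO logic may generate SMIs */

/* Return the SMI_EN value with only the TCO SMI source disabled.
 * All other bits, including GBL_SMI_EN, keep their current value. */
static uint32_t smi_en_without_tco(uint32_t smi_en)
{
    return smi_en & ~TCO_EN;
}
```

Clearing a single source bit rather than the global bit follows the warning below: the remaining SMI sources, which may protect the hardware, stay active.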

One RTAI user reported: in all cases, do not boot the computer with a USB flash stick plugged in. The latency will rise to 500us if you do so. Connecting and using the USB stick later does no harm, however.

ATTENTION!
Never disable SMI interrupts globally. Disabling SMI may cause serious harm to your computer. On P4 systems you can burn your CPU to death when SMI is disabled. SMIs are also used to fix up chip bugs, so certain components may not work as expected when SMI is disabled. So be very sure you know what you are doing before disabling any SMI interrupt.

Kernel configuration

On-demand CPU scaling
This creates long-latency events when the CPU is put into a low-power state after a period of inactivity. Such problems are usually quite easy to detect. (For example, on Fedora the 'cpuspeed' tool should be disabled, as it loads the on-demand scaling_governor driver.)
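
One way to detect this from an application is to read the active governor from sysfs at startup. The sketch below assumes the usual cpufreq sysfs layout (for example /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor); the function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if the governor name (as read from sysfs, possibly with a
 * trailing newline) is "ondemand", 0 otherwise. */
static int is_ondemand(const char *governor)
{
    size_t len = strcspn(governor, "\r\n");
    return len == strlen("ondemand") &&
           strncmp(governor, "ondemand", len) == 0;
}

/* Read the governor of one CPU from the given sysfs file.
 * Returns 1 if it is "ondemand", 0 if it is something else, and -1 if
 * the file cannot be read (for example when cpufreq is not available). */
static int cpu_uses_ondemand(const char *path)
{
    char buf[64];
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    if (!fgets(buf, sizeof buf, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return is_ondemand(buf);
}
```

An RT-application could call cpu_uses_ondemand() for each CPU it runs on and refuse to start, or at least print a warning, when the result is 1.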

Application

VGA Console
While the system is fulfilling its RT requirements, the VGA text console must be left untouched. Nothing may be written to that console, not even printk output. The VGA text console causes very large latencies of hundreds of microseconds or more. It is better to use a serial console and to have no login shell on the VGA text console. SSH or Telnet sessions can also be used. The 'quiet' option on the kernel command line can also be useful to prevent any printk from reaching the console. Note that a graphical UI under X has no RT impact; only the VGA text console causes these latencies.
Latencies caused by Page-faults

There are two types of page faults: major and minor. Minor page faults are handled without IO accesses. Major page faults are handled by means of IO activity. The Linux page swapping mechanism can swap code pages of an application to disk, and it takes a long time to swap those pages back into RAM. If such a page belongs to the real-time process, latencies increase hugely. Page faults are therefore dangerous for RT applications and need to be prevented.

If no swap space is being used and no other applications stress the memory boundaries, then there is probably enough free RAM for the RT application. In this case the RT-application will likely only run into minor page faults, which cause relatively small latencies. Note that page faults of one application cannot interfere with the RT behaviour of another application.

During startup an RT-application will always experience a lot of page faults. These cannot be prevented. In fact, this startup period must be used to claim enough memory for the RT-process and lock it in RAM. This must be done in such a way that page faults no longer occur once the application has to deliver its RT behaviour.

This can be done by taking care of the following during the initial startup phase:

  • Call mlockall() directly on entry to main().
  • Create all threads at startup time of the application. Never start threads dynamically during RT show time, as this will ruin RT behaviour.
  • Reserve a pool of memory to do new/delete or malloc/free in, if you require dynamic memory allocation.
  • Never use system calls that are known to generate pagefaults, like system calls that allocate memory inside the kernel.
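
The first step of this startup phase can be sketched as follows. This is a minimal illustration, not a complete application: the helper names and the MAX_SAFE_STACK bound of 8 KiB are assumptions chosen for the example, and mlockall() normally needs root privileges or a sufficiently high RLIMIT_MEMLOCK.

```c
#include <stddef.h>
#include <sys/mman.h>

/* Assumed upper bound on the stack the RT code will use. */
#define MAX_SAFE_STACK (8 * 1024)

/* Lock all current and future pages of the process into RAM.
 * Returns 0 on success, -1 on failure (errno is set by mlockall()). */
static int lock_all_memory(void)
{
    return mlockall(MCL_CURRENT | MCL_FUTURE);
}

/* Write to MAX_SAFE_STACK bytes of stack so that these pages are
 * already mapped before the time-critical phase starts. */
static void prefault_stack(void)
{
    volatile unsigned char dummy[MAX_SAFE_STACK];

    for (size_t i = 0; i < sizeof dummy; i++)
        dummy[i] = 0;
}
```

Called as the first statements of main(), lock_all_memory() followed by prefault_stack() moves the unavoidable page faults into the startup phase. Each thread created later at startup would need the same stack treatment from within its own thread function.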

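There are several examples that show the different aspects of preventing page faults. Which one suits you best depends on your requirements.

  • Simple memory locking example: Single-threaded application that does a malloc() and makes the memory safe to use.
  • Dynamic memory allocation example: Same as Simple memory locking example, except that it creates a pool of memory to be used for dynamic memory allocation.
  • Threaded RT-application with memory locking and stack handling example: Same as Dynamic memory allocation example, but now supports threads.
  • mlockall() should be called within the application to prevent the pages of the real-time application from being paged out of memory.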
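
A memory pool for malloc/free or new/delete could be reserved roughly as follows. This sketch assumes Glibc, whose mallopt() offers the M_TRIM_THRESHOLD and M_MMAP_MAX parameters; the function name reserve_heap_pool() is hypothetical. It is meant to run during startup, after mlockall(), so that the touched pages also stay locked in RAM.

```c
#include <malloc.h>
#include <stdlib.h>
#include <unistd.h>

/* Keep freed memory inside the process and pre-fault a heap pool of
 * pool_size bytes. Returns 0 on success, -1 on failure. */
static int reserve_heap_pool(size_t pool_size)
{
    /* Never give freed heap memory back to the kernel. */
    if (mallopt(M_TRIM_THRESHOLD, -1) != 1)
        return -1;
    /* Never serve allocations with mmap(), so they come from the heap. */
    if (mallopt(M_MMAP_MAX, 0) != 1)
        return -1;

    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    volatile char *pool = malloc(pool_size);
    if (!pool)
        return -1;

    /* Write to every page so that it is mapped to physical RAM. */
    for (size_t i = 0; i < pool_size; i += (size_t)page)
        pool[i] = 0;

    /* With trimming disabled, the freed pages stay with the process and
     * can be handed out again by malloc() without new page faults. */
    free((void *)pool);
    return 0;
}
```

The pool size has to cover the peak dynamic memory use of the application; allocations beyond it grow the heap again and cause page faults.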

Global variables and arrays

Global variables and arrays are not part of the binary, but are allocated by the OS at process startup. The virtual memory pages associated with this data are not immediately mapped to physical pages of RAM, meaning that page faults occur on access. It turns out that the mlockall() call forces all global variables and arrays into RAM, meaning that subsequent access to this memory does not result in page faults. As such, using global variables and arrays does not introduce any additional problems for real-time applications. You can verify this behaviour using the following program (run as 'root' to allow the mlockall() operation):
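
Independently of the linked program, page faults can be counted with getrusage(). The following is a minimal sketch; the array size, the 4 KiB step and the function names are assumptions made for the illustration.

```c
#include <stddef.h>
#include <sys/resource.h>

#define ARRAY_SIZE (4 * 1024 * 1024)
static char global_array[ARRAY_SIZE];   /* hypothetical test array */

/* Total number of page faults (minor + major) of this process so far,
 * or -1 if getrusage() fails. */
static long page_fault_count(void)
{
    struct rusage usage;

    if (getrusage(RUSAGE_SELF, &usage) != 0)
        return -1;
    return usage.ru_minflt + usage.ru_majflt;
}

/* Write one byte every 4 KiB (an assumed page size) and return the
 * number of page faults this caused, or -1 on error. */
static long faults_while_touching(volatile char *buf, size_t len)
{
    long before = page_fault_count();

    for (size_t i = 0; i < len; i += 4096)
        buf[i] = 1;

    long after = page_fault_count();
    if (before < 0 || after < 0)
        return -1;
    return after - before;
}
```

Without mlockall(), a first call to faults_while_touching(global_array, sizeof global_array) typically reports about one fault per page. If mlockall(MCL_CURRENT | MCL_FUTURE) has been called first, the same call should report none.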

Verifying the absence of page faults in global arrays proof

Priority Inheritance Mutex support
A real-time system cannot be real-time if there is no solution for priority inversion, because priority inversion causes undesired latencies and even deadlocks. (see [2])

Fortunately, Linux has offered a solution for it in user-land since kernel version 2.6.18 together with Glibc 2.5 (PTHREAD_PRIO_INHERIT).

So, if user-land real-time is important, I highly encourage you to use a recent kernel and Glibc library. Other C libraries like uClibc do not support PI-futexes at this moment, and are therefore less suitable for real-time!

Building Device Drivers

Interrupt Handling
The RT-kernel handles all interrupt handlers in thread context. However, the real hardware interrupt context is still available. This context can be recognised by the IRQF_NODELAY flag that is assigned to a certain interrupt handler during request_irq() or setup_irq(). Within this context only a much more limited kernel API may be used.
Things you should not do in IRQF_NODELAY context
  • Calling any kernel API that uses normal spinlocks. Spinlocks are converted to mutexes on RT, and mutexes can sleep by their nature. (Note: the atomic_spinlock_t types behave the same as on a non-RT kernel.) Some kernel APIs that can block on a spinlock/RT-mutex:
    • wake_up() shall not be used; use wake_up_process() instead.
    • up() shall not be used in this context. This is valid for all semaphore types, thus both struct compat_semaphore and struct semaphore. (Of course the same is valid for down().)
    • complete() also uses a normal spinlock, which is defined in 'struct __wait_queue_head' in wait.h, and is thus not safe.