Kernel configuration options

Real-time Ubuntu kernel configuration options are crucial for achieving low latency and high responsiveness in systems where deterministic performance is essential. This document outlines the kernel config options that Real-time Ubuntu uses in its real-time kernel. These options enable and optimize preemption, manage IRQs, and control block I/O latency, among other features that make Ubuntu real-time capable.

All the configurations below are kernel config options that begin with CONFIG_, but this prefix is omitted in the following sections for brevity.
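
As an illustration of the naming convention, the following minimal sketch parses the text of a kernel config file and indexes the options by their name without the CONFIG_ prefix. The function name parse_kernel_config is hypothetical and chosen only for this example; it assumes the usual "CONFIG_NAME=value" and "# CONFIG_NAME is not set" line forms.

```python
import re

# "CONFIG_FOO=y" style lines: the option is set to some value.
_SET = re.compile(r"^CONFIG_([A-Za-z0-9_]+)=(.*)$")
# "# CONFIG_FOO is not set" style lines: the option is disabled.
_UNSET = re.compile(r"^# CONFIG_([A-Za-z0-9_]+) is not set$")


def parse_kernel_config(text):
    """Map option names (without the CONFIG_ prefix) to their values.

    Hypothetical helper for illustration; disabled options map to "n".
    """
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        match = _SET.match(line)
        if match:
            options[match.group(1)] = match.group(2)
            continue
        match = _UNSET.match(line)
        if match:
            options[match.group(1)] = "n"
    return options
```

On Ubuntu the config of an installed kernel is typically found in /boot/config-<release>, so the text of that file could be passed to the function and an option looked up as, for example, options.get("PREEMPT_RT").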


Basic preemption configurations

Preemption refers to the ability of an operating system to interrupt the execution of a running task or process in order to give priority to another task. This feature is essential to reduce latency and improve the responsiveness of the system. Real-time applications, where deterministic response times are critical, benefit the most from it.

ARCH_SUPPORTS_RT

This is a boolean value that indicates if the architecture supports real-time features.

PREEMPTION

The primary config that enables the preemptible kernel. It must be enabled in order to enable the other preemption levels and configurations.

PREEMPT_COUNT

Once the previous option is enabled, this configuration enables preemption counting. It is needed for kernel APIs like preempt_count().

PREEMPT_RT

This configuration enables a fully preemptible kernel. There are other levels of preemption available in the kernel, but the PREEMPT_RT configuration is the most aggressive. It works by replacing some kernel locking primitives with preemptible, priority-inheritance aware variants. This enables mechanisms that can break up long non-preemptible sections of code. However, this configuration doesn’t apply to very low-level and critical sections of the kernel, such as entry code, the scheduler, and low-level interrupt handlers.
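
A quick way to see whether a running kernel was built with this option is to look at its build version string. The sketch below is a heuristic only: it assumes that the string reported by uname -v (platform.version() in Python) carries a PREEMPT_RT token, which is commonly the case but is not guaranteed. The function names are hypothetical.

```python
import platform


def reports_preempt_rt(version_string):
    """Return True if a uname-style version string has a PREEMPT_RT token.

    Heuristic: tokens are compared exactly, so PREEMPT_DYNAMIC does not match.
    """
    return "PREEMPT_RT" in version_string.split()


def running_kernel_is_rt():
    """Apply the heuristic to the kernel this process is running on."""
    return reports_preempt_rt(platform.version())
```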


Kernel internals configurations

The next configurations are related to kernel internals.

PREEMPT_RCU

Enables preemption in the context of the RCU (read-copy-update) synchronization mechanism. This is done by selecting the RCU implementation designed for very large symmetric multiprocessing (SMP) systems that have hundreds of CPUs and require real-time response times. However, this implementation also scales down nicely to smaller systems.

RCU_BOOST

Working together with the previous configuration, this one boosts the priority of preempted RCU readers that block the current preemptible RCU grace period for too long. This also prevents heavy loads from blocking RCU callback invocation.

RCU_NOCB_CPU

Used to reduce OS jitter for aggressive HPC or real-time workloads. It can also be used to offload RCU callback invocation to energy-efficient CPUs in asymmetric multiprocessors. The reduced jitter comes at the cost of increased call_rcu() overhead and potentially higher context-switch rates. This option creates kthreads (“rcuox/N”) to invoke callbacks on specified CPUs, where “N” is the CPU being offloaded and “x” is “p” for RCU-preempt (PREEMPTION kernels) or “s” for RCU-sched (non-PREEMPTION kernels). Affinity or cgroups can be used to control the CPU set for the kthreads.
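
The kthread naming scheme described above can be decoded mechanically. The following sketch, with the hypothetical helper parse_rcuo_name, splits a name of the form “rcuox/N” into the RCU flavor and the offloaded CPU number. It only covers the two flavors named in the text.

```python
import re

# "rcuo", one flavor letter (p or s), a slash, then the offloaded CPU number.
_RCUO = re.compile(r"^rcuo([ps])/(\d+)$")

FLAVORS = {"p": "RCU-preempt", "s": "RCU-sched"}


def parse_rcuo_name(name):
    """Return (flavor, cpu) for an rcuox/N kthread name, or None otherwise."""
    match = _RCUO.match(name)
    if match is None:
        return None
    return FLAVORS[match.group(1)], int(match.group(2))
```

Thread names can be read from /proc/<pid>/comm, so the helper could be used to list which CPUs currently have an offload kthread before pinning those kthreads with affinity or cgroups.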

RCU_NOCB_CPU_CB_BOOST

Used to offload RCU callbacks from real-kernel threads to energy-efficient CPUs. It invokes offloaded callbacks as SCHED_FIFO to avoid possible starvation caused by heavy background load running as SCHED_OTHER. When this option is enabled, it’s necessary to ensure that latency-sensitive tasks either run with a higher priority or run on some other isolated CPU.
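
One way to satisfy the "higher priority" requirement is to run the latency-sensitive task as SCHED_FIFO at a priority above the one used by the callback kthreads. The sketch below is a minimal illustration under stated assumptions: kthread_prio is a hypothetical parameter standing for whatever priority the offload kthreads use on the target system, and the 1 to 99 range is the SCHED_FIFO priority range on Linux.

```python
import os


def priority_above(kthread_prio, lowest=1, highest=99):
    """Return a SCHED_FIFO priority one step above kthread_prio.

    The result is clamped to [lowest, highest]; kthread_prio is an
    assumption supplied by the caller, not something this code detects.
    """
    return max(lowest, min(highest, kthread_prio + 1))


def make_current_task_fifo(priority):
    """Switch the calling process to SCHED_FIFO at the given priority.

    Linux only; needs CAP_SYS_NICE (typically root), otherwise the call
    raises PermissionError.
    """
    os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))
```

For example, make_current_task_fifo(priority_above(1)) would request priority 2 for the calling process. The alternative named in the text, running on another isolated CPU, avoids the priority question altogether.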

TASKS_RCU

Enables a task-based RCU implementation that relies only on voluntary context switches (not on preemption).

CONTEXT_TRACKING

This configuration enables context tracking by activating the kernel APIs declared in the context tracking header.


CPU isolation configurations

The next configurations are related to CPU isolation and how to track CPU time spent in the kernel.

VIRT_CPU_ACCOUNTING

Boolean value that enables accounting of statistics and of CPU time spent in the kernel.

VIRT_CPU_ACCOUNTING_GEN

Related to CPU isolation, this configuration enables task and CPU time accounting on full dynticks systems, also known as NO_HZ systems. This accounting is implemented by watching every kernel-user boundary using the context tracking subsystem. The accounting is thus performed at the expense of some significant overhead.
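
To see which CPUs run in full dynticks mode, the kernel exposes a CPU list in /sys/devices/system/cpu/nohz_full, written in the usual list format such as "2-5,8". The sketch below parses that format. The helper name parse_cpu_list is hypothetical, and treating "(null)" as an empty list is a defensive assumption about what some kernels print when no CPUs are configured.

```python
def parse_cpu_list(text):
    """Expand a kernel CPU list such as "2-5,8" into a list of CPU numbers."""
    text = text.strip()
    if text in ("", "(null)"):
        # Nothing configured (the "(null)" case is an assumption, see lead-in).
        return []
    cpus = []
    for part in text.split(","):
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus
```

For instance, parse_cpu_list("2-5,8") yields [2, 3, 4, 5, 8], which are the CPUs where the accounting described above applies.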

HAVE_POSIX_CPU_TIMERS_TASK_WORK

Used to handle POSIX CPU timers from task_work and not from the timer interrupt context.

POSIX_CPU_TIMERS_TASK_WORK

This configuration depends on the previous one and on POSIX timers being enabled in the system.


Debugging latency issues

The next configurations are related to debugging and tracing latency issues in the kernel.

TIMERLAT_TRACER

The timerlat tracer helps to trace sources of wake-up latency in the kernel. It works by creating a per-CPU kernel thread that sets a periodic timer to wake itself up. The thread then goes to sleep and waits for the timer to fire. On wake-up, the thread computes a wake-up latency value as the difference between the current time and the absolute time at which the timer was set to expire.
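
The measurement described above can be imitated in user space to make the arithmetic concrete. The sketch below is not the tracer itself: it runs as an ordinary process, uses time.sleep() instead of a kernel timer, and therefore also measures interpreter and scheduler overhead. The function names are hypothetical.

```python
import time


def wakeup_latency_ns(expected_ns, now_ns):
    """Latency = current time minus the absolute time the timer should fire."""
    return now_ns - expected_ns


def sample_latencies(period_ns, count):
    """Sleep until each absolute expiry time and record the wake-up latency."""
    latencies = []
    next_expiry = time.monotonic_ns() + period_ns
    for _ in range(count):
        remaining = next_expiry - time.monotonic_ns()
        if remaining > 0:
            time.sleep(remaining / 1e9)
        latencies.append(wakeup_latency_ns(next_expiry, time.monotonic_ns()))
        # Advance by a fixed period so that errors do not accumulate.
        next_expiry += period_ns
    return latencies
```

Calling sample_latencies(1_000_000, 100) takes one hundred samples with a 1 ms period. Keeping the expiry times absolute, rather than sleeping for a fixed duration each round, mirrors the "absolute time that the timer was set to expire" in the description.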

OSNOISE_TRACER

Enables the osnoise tracer, which measures operating system noise. In high-performance computing (HPC), system noise is the interference that an application can suffer from the system due to the action of internal system mechanisms. In the context of Linux, this can be caused by NMIs, IRQs, SoftIRQs and other system threads. Hardware-related jobs like SMIs can also be a source of noise.
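
To give a feel for what noise means in practice, the following user-space sketch spins on the monotonic clock and reports gaps between consecutive readings that exceed a threshold. Any such gap is time during which the loop was not running. This is only an illustration under loose assumptions: the Python interpreter adds noise of its own, the function names are hypothetical, and this is not how the kernel tracer is implemented.

```python
import time


def gaps_over(timestamps, threshold_ns):
    """Return the gaps between consecutive timestamps larger than threshold_ns."""
    return [
        later - earlier
        for earlier, later in zip(timestamps, timestamps[1:])
        if later - earlier > threshold_ns
    ]


def spin_and_collect(duration_ns):
    """Busy-loop for duration_ns, recording the clock on every iteration."""
    stamps = [time.monotonic_ns()]
    deadline = stamps[0] + duration_ns
    while stamps[-1] < deadline:
        stamps.append(time.monotonic_ns())
    return stamps
```

For example, gaps_over(spin_and_collect(10_000_000), 50_000) lists every interruption longer than 50 microseconds seen during a 10 ms spin. Keep the duration short, since one timestamp is stored per loop iteration.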



Block I/O configurations

The next configurations are related to block I/O and how to account and control the latency of I/O operations.

BLK_CGROUP_IOLATENCY

Enabling this option enables the .latency interface for IO throttling. This makes it possible to have guarantees on IO latencies.
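
As a rough illustration of how such a latency target is expressed, the sketch below builds the line that would be written to a cgroup's io.latency file. It assumes the cgroup v2 format "MAJOR:MINOR target=<microseconds>"; the helper name and the device numbers in the example are hypothetical.

```python
def io_latency_rule(major, minor, target_us):
    """Format a latency target for a block device, identified by major:minor.

    Assumes the cgroup v2 io.latency syntax; target_us is in microseconds.
    """
    if target_us <= 0:
        raise ValueError("target_us must be a positive number of microseconds")
    return f"{major}:{minor} target={target_us}"
```

For a device with numbers 8:0 and a 10 ms target, io_latency_rule(8, 0, 10000) gives "8:0 target=10000". Writing that string to the io.latency file of the relevant cgroup directory requires the appropriate privileges and an enabled io controller.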

ARCH_WANT_HUGE_PMD_SHARE

Memory management configuration that enables the sharing of huge Page Middle Directory (PMD) between processes. Currently this feature is needed on modern architectures like amd64, arm64 and riscv64.