Linux kernel locks¶
Note
This documentation covers some kernel internals specific to Linux kernel v6.8. The details provided here may not fully apply to newer kernel versions.
The interface of kernel locks hasn’t changed significantly over the years, nevertheless they are considered as unstable and subject to change. This is discussed in the The Linux Kernel Driver Interface documentation.
In the kernel space, certain critical code regions across all subsystems must be protected from context switching or, in the case of real-time kernels, from preemption. On SMP systems (Symmetric Multiprocessing), the Linux kernel provides various synchronization mechanisms to safeguard shared data structures from concurrent access, preventing race conditions in the kernel space.
Locks are primarily used to prevent race conditions, which are a concern in both SMP and real-time systems. Even in a single-processor real-time system, the preemption mechanism can lead to race conditions similar to those found in an SMP system.
There are essentially three classes of locks in the kernel:
Sleeping locks (also known as blocking locks)
Spinlocks¶
Spinlocks (found in include/linux/spinlock.h) are locks that keep polling (or “spinning”), attempting to acquire the lock. They are small, fast, and simple to use, making them generally applicable throughout the kernel. Spinlocks are ideal for situations where the kernel process cannot sleep. The ideal type of spinlock to use depends on the specific contexts between which the data needs to be shared.
Within the class of spinlocks, there are various types:
rwlock_t¶
rwlock_t
(defined on include/linux/rwlock_types.h) is a variant of the read/write lock for spinning locks.
Consumers of this lock are divided into two classes: readers and writers.
This lock structure allows multiple readers but only one writer at a time.
By implementing this approach, the number of locks used can be reduced, which in turn decreases the overhead associated with locking.
This lock behaves different on PREEMPT_RT
kernels.
On non-PREEMPT_RT kernels it relies on the architecture dependent assembly implementation of a
arch_rwlock_t
.On PREEMPT_RT kernels it is implemented as a
rwbase_rt
, realizing a sleeping lock.
spinlock_t¶
spinlock_t
is the most generic implementation of spinlock structures, as defined in include/linux/spinlock_types.h.
It is implemented in two different ways, depending on whether the real-time kernel configuration (i.e., with PREEMPT_RT
enabled) is used.
In non-PREEMPT_RT scenarios,
spinlock_t
maps to raw_spinlock_t.In PREEMPT_RT kernels, this is implemented as a rt_mutex, making this a sleeping lock.
raw_spinlock_t¶
raw_spinlock_t
(defined on include/linux/spinlock_types_raw.h) is a strict spinning lock implementation that operates consistently across all kernels, including real-time kernels.
It is reserved for critical core code, low-level interrupt handling, and scenarios where disabling preemption or interrupts is necessary.
For example, this lock is used to safely access hardware state.
In some cases, raw_spinlock_t
can also be employed when the critical section is very small, thereby avoiding the overhead associated with rt_mutex.
mutex¶
A Mutex, short for “mutual exclusion” is used to protect critical sections of code, ensuring that only one thread or process can access a shared resource at any given time.
Specifically, the mutex
type (defined on include/linux/mutex_types.h) is a simple mutex implementation that varies depending on whether the current kernel is a PREEMPT_RT kernel or not.
For non-PREEMPT_RT kernels, the implementation relies on an atomic operator which holds the owner and a raw_spinlock_t
On PREEMPT_RT kernels, the implementation is based on rt_mutex.
The primary use case for this type of lock occurs when data needs to be shared between different user contexts.
rt_mutex¶
RT-mutexes (defined on include/linux/rtmutex.h) are a special type of mutex that support priority inheritance. These mutexes are particularly useful in real-time kernels because priority inheritance is a protocol designed to mitigate the effects of priority inversion. It’s important to note that priority inheritance has limitations in non-real-time kernels due to preemption and interrupt-disabled sections. Additionally, priority inheritance cannot take effect in regions of code where preemption or interrupts are disabled. The benefits of priority inheritance are confined to regions where preemption is enabled (i.e., in preemptive task contexts). Essentially, RT-mutexes can be used to implement spinlock_t and read-write locks, providing the added advantage of priority inheritance where applicable.
semaphore¶
The semaphore
(defined in include/linux/semaphore.h) structure discussed here follows the counting semaphore structure (as the binary semaphore equivalent is essentially a mutex).
It’s a foundational implementation intended for use within other kernel lock structures.
Internally, it relies on a raw_spinlock_t and a counter.
Their primary use cases involve serialization and waiting.
It’s important to note that newer mechanisms, such as mutexes for serialization and completions for waiting, are generally preferable to traditional semaphores.
rw_semaphore¶
rw_semaphore
(Read-Write Semaphore, defined on include/linux/rwsem.h) is a variant of the read/write lock for sleeping locks.
The goal of this locking mechanism is to reduce the overhead caused by acquiring and releasing locks by minimizing the amount of locking required.
A read-write semaphore can be held by many readers but only one writer.
This significantly reduces the number of locks needed for various types of data.
ww_mutex¶
ww_mutex
(Wound/Wait Mutex found on include/linux/ww_mutex.h) are designed to handle complex synchronization scenarios, especially in GPU drivers where multiple buffers are shared across contexts, processes, and devices.
Traditional mutexes can lead to deadlocks due to unpredictable buffer access orders.
WW-Mutexes prevent deadlocks using a reservation ticket system, where the oldest task wins in case of contention.
Two algorithms, Wound-Wait and Wait-Die, manage conflicts, with Wound-Wait generally reducing rollbacks.
WW-Mutexes support flexible locking strategies, making them suitable for dynamic and unordered lock acquisition scenarios.
percpu_rw_semaphore¶
The percpu_rw_semaphore
(defined in include/linux/percpu-rwsem.h) is a modern read-write semaphore designed with optimization for read-heavy workloads.
Traditional read-write semaphores suffer from performance degradation when multiple cores acquire the lock for reading, as the cache line containing the semaphore bounces between the L1 caches of those cores.
With percpu_rw_semaphore, reading is extremely fast due to the use of RCU (Read-Copy-Update), which eliminates the need for atomic instructions in the lock and unlock paths.
However, locking for writing is significantly more expensive, as it involves calling synchronize_rcu(), which can take hundreds of milliseconds.
CPU Local Locks¶
On non-PREEMPT_RT kernels, local_lock
functions are wrappers around preemption and interrupt-disabling primitives.
Unlike other locking mechanisms, disabling preemption or interrupts is strictly a CPU-local concurrency control technique and is not suitable for managing inter-CPU concurrency.
local_lock_t¶
local_lock_t
(defined in include/linux/local_lock_internal.h) is a cpu lock implementation that is implemented in two different ways depending on whether the real-time kernel configuration (i.e., with PREEMPT_RT
enabled) is used.
In non-preemptive scenarios it is just a simple lock definition without any special fields.
With
PREEMPT_RT
enabled, it maps to a per CPU spinlock_t, which protects the critical section while staying preemptible.