Programming: Per-CPU Reader-Writer Semaphore
Published:
The motivation and implementation of the per-CPU reader-writer semaphore.
Contents:
Motivation
From the name per-CPU we know that similar to other per-CPU data structures ors mechanisms, the per-CPU reader-writer semaphore is used to resolve the cache line bouncing between L1 caches of the cores.
Algorithm
RCU is used in read to avoid the relatively expensive atomic instructions. On the other hand, write is very expensive because synchronize_rcu can take hundreds of milliseconds.
Implementation in Linux
The percpu_rwsem
struct is defined as follows (removed DEBUG):
struct percpu_rw_semaphore {
struct rcu_sync rss;
unsigned int __percpu *read_count;
struct rcuwait writer;
wait_queue_head_t waiters;
atomic_t block;
};
Each CPU core has a read_count
field.
For the readers, when on the fast path, it only needs to update its own copy; when on the slow path, it is responsible to wake up potential waiting writers.
For the writers, percpu_down_write
first enforces slow path by rcu_sync_enter
. The fast path is enabled again during percpu_up_write
which calls rcu_sync_exit
.
The fast or slow path is determined by rss->gp_state
and can be checked by rcu_sync_is_idle
which returns true if in fast path.
The source code: include/linux/percpu-rwsem.h and kernel/locking/percpu-rwsem.c
Usage in cgroup
On Linux 5.20, one patch makes the synchronize_sched()
optional during __cgroup_procs_write()
because otherwise the frequent fork and exit would have high latency.
In cgroup, cgroup_threadgroup_rwsem
is a per-CPU reader-writer semaphore. When migrating a process with all its threads to another cgroup, it needs to WRITE lock this semaphore and block forks and exits, which require the READ lock. The purpose is to make the threadgroup of a process stable during the migration; otherwise, there might be new threads in the old cgroup.
Developers observed that when process migration is frequent (in Android case), the WRITE lock is frequent which forces the the READ lock into the slow path and slows down the fork and exit.
Two new options favordynmods
(at runtime) and CGROUP_FAVOR_DYNMODS
(at compile time) are available to enable the per-CPU behavior. After Linux 5.20, by default the per-CPU behavior is disabled.
Link: