Programming: Per-CPU Reader-Writer Semaphore


The motivation and implementation of the per-CPU reader-writer semaphore.



As the name suggests, the per-CPU reader-writer semaphore, like other per-CPU data structures and mechanisms, is designed to avoid cache-line bouncing between the L1 caches of the cores.
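To make the cache-line argument concrete, here is a minimal userspace sketch that contrasts a single shared reader counter with one counter slot per CPU. The 64-byte cache line, the four slots, and all `*_model` names are assumptions for illustration. The kernel does not pad an array like this; it places each CPU's copy in that CPU's own per-CPU area, which achieves the same separation.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stddef.h>

#define CACHE_LINE_SIZE 64	/* assumption: typical line size on x86-64 */
#define NR_SLOTS 4		/* hypothetical stand-in for the number of CPUs */

/* Shared design: every reader on every core writes the same cache line. */
struct shared_rwsem_model {
	atomic_uint readers;
};

/* Per-CPU design: each slot is padded out to a cache line of its own. */
struct percpu_slot {
	alignas(CACHE_LINE_SIZE) unsigned int read_count;
};

struct percpu_rwsem_model {
	struct percpu_slot slot[NR_SLOTS];
};

static_assert(sizeof(struct percpu_slot) == CACHE_LINE_SIZE,
	      "each slot must occupy exactly one cache line");

/* Index of the cache line (relative to the struct start) that holds slot i. */
static size_t slot_cache_line(int i)
{
	size_t offset = offsetof(struct percpu_rwsem_model, slot) +
			(size_t)i * sizeof(struct percpu_slot);
	return offset / CACHE_LINE_SIZE;
}

/* Shared design: an atomic read-modify-write on a line all cores contend for. */
static void shared_reader_enter(struct shared_rwsem_model *m)
{
	atomic_fetch_add(&m->readers, 1);
}

/* Per-CPU design: a plain increment on a line only this "CPU" writes. */
static void percpu_reader_enter(struct percpu_rwsem_model *m, int cpu)
{
	m->slot[cpu].read_count++;
}

/* Only the (rare) writer pays for touching every slot. */
static unsigned int percpu_readers_total(const struct percpu_rwsem_model *m)
{
	unsigned int sum = 0;
	for (int i = 0; i < NR_SLOTS; i++)
		sum += m->slot[i].read_count;
	return sum;
}
```

The trade-off is visible in the last function: readers become cheap because they never share a line, while the writer has to visit all slots to learn whether any reader is active.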


On the read side, RCU is used to avoid the relatively expensive atomic instructions. The write side, on the other hand, is very expensive because synchronize_rcu can take hundreds of milliseconds.

Implementation in Linux

The percpu_rw_semaphore struct is defined as follows (debug-only fields omitted):

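struct percpu_rw_semaphore {
	struct rcu_sync		rss;		/* selects the reader fast or slow path */
	unsigned int __percpu	*read_count;	/* one reader counter per CPU */
	struct rcuwait		writer;		/* writer waiting for readers to drain */
	wait_queue_head_t	waiters;	/* tasks blocked on the semaphore */
	atomic_t		block;		/* nonzero while a writer owns the lock */
};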

Each CPU core has its own copy of read_count.

A reader on the fast path only needs to update its own CPU's copy. A reader on the slow path is also responsible for waking up a writer that may be waiting.

For the writers, percpu_down_write first forces readers onto the slow path by calling rcu_sync_enter. The fast path is re-enabled by percpu_up_write, which calls rcu_sync_exit.

Whether readers take the fast or the slow path is determined by rss->gp_state. It can be checked with rcu_sync_is_idle, which returns true when the fast path is in effect.
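The reader and writer paths described above can be sketched as a small single-threaded userspace model. This is not the kernel code: it only mirrors the decision logic, all `model_*` names are hypothetical, a plain `slow_path` flag stands in for `!rcu_sync_is_idle()`, and the RCU grace-period waits, memory barriers, and sleeping are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define NR_SLOTS 4	/* hypothetical stand-in for the number of CPUs */

struct model_rwsem {
	unsigned int read_count[NR_SLOTS];	/* models the __percpu read_count */
	bool slow_path;				/* models !rcu_sync_is_idle(&sem->rss) */
	int block;				/* models the atomic_t block */
};

enum read_result { READ_FAST, READ_SLOW, READ_BLOCKED };

/* Sum over all slots. Unsigned wrap-around lets an increment on one slot
 * and the matching decrement on another slot cancel out. */
static unsigned int model_readers_active(const struct model_rwsem *sem)
{
	unsigned int sum = 0;
	for (int i = 0; i < NR_SLOTS; i++)
		sum += sem->read_count[i];
	return sum;
}

static enum read_result model_down_read(struct model_rwsem *sem, int slot)
{
	if (!sem->slow_path) {
		sem->read_count[slot]++;	/* fast path: touch only our slot */
		return READ_FAST;
	}
	sem->read_count[slot]++;		/* slow path: announce ourselves... */
	if (!sem->block)
		return READ_SLOW;		/* ...and proceed if no writer owns the lock */
	sem->read_count[slot]--;		/* back out; the kernel would wake the writer and sleep */
	return READ_BLOCKED;
}

/* The slot may differ from the one used in model_down_read (task migration). */
static void model_up_read(struct model_rwsem *sem, int slot)
{
	sem->read_count[slot]--;
	/* On the slow path the kernel also wakes a waiting writer here. */
}

static void model_write_begin(struct model_rwsem *sem)
{
	sem->slow_path = true;	/* rcu_sync_enter(); the kernel may wait for a grace period */
	sem->block = 1;		/* new readers are turned away from now on */
}

/* The kernel sleeps until this condition holds; the model just reports it. */
static bool model_writer_may_proceed(const struct model_rwsem *sem)
{
	return model_readers_active(sem) == 0;
}

static void model_up_write(struct model_rwsem *sem)
{
	sem->block = 0;
	sem->slow_path = false;	/* rcu_sync_exit(); in the kernel this takes effect later */
}

/* Walk through one reader/writer interleaving. */
static void model_demo(void)
{
	struct model_rwsem sem = {0};

	assert(model_down_read(&sem, 0) == READ_FAST);
	model_write_begin(&sem);
	assert(!model_writer_may_proceed(&sem));	/* one reader still inside */
	assert(model_down_read(&sem, 1) == READ_BLOCKED);
	model_up_read(&sem, 2);				/* released on another slot */
	assert(model_writer_may_proceed(&sem));
	model_up_write(&sem);
	assert(model_down_read(&sem, 3) == READ_FAST);
	printf("model scenario completed\n");
}
```

Calling `model_demo()` from a `main` function runs the scenario. Note how the reader that entered on slot 0 and left on slot 2 still brings the total back to zero, which is why the writer checks the sum rather than any individual counter.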

The source code: include/linux/percpu-rwsem.h and kernel/locking/percpu-rwsem.c

Usage in cgroup

In Linux 6.0 (the release that had been expected to be numbered 5.20), one patch makes the workaround that avoids synchronize_sched() in __cgroup_procs_write() optional, because that workaround makes the frequent fork and exit paths slower.

In cgroup, cgroup_threadgroup_rwsem is a per-CPU reader-writer semaphore. To migrate a process with all its threads to another cgroup, the kernel takes the WRITE lock on this semaphore, which blocks forks and exits because they require the READ lock. The purpose is to keep the threadgroup of the process stable during the migration; otherwise, threads created in the meantime could be left behind in the old cgroup.

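Developers observed that when process migration is frequent (as on Android), the WRITE lock is taken frequently, and each acquisition is slow because it has to wait for an RCU grace period. An earlier change therefore kept the semaphore permanently on the slow path, which makes the WRITE lock cheap but slows down every fork and exit.

Two new options, favordynmods (a mount option, at runtime) and CGROUP_FAVOR_DYNMODS (at compile time), are available for systems that prefer low-latency migration: enabling either one keeps the semaphore on the slow path and thus disables the per-CPU behavior. Since Linux 6.0, the per-CPU behavior is enabled again by default.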
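To find out whether a running system uses the favordynmods mount option, one can look at the options listed for the cgroup2 mount. The following is a minimal sketch under two assumptions: the option appears in the options column of /proc/mounts, and only cgroup2 mounts are of interest. The function names are hypothetical; setmntent, getmntent, and endmntent are the glibc mount-table helpers from mntent.h.

```c
#include <assert.h>
#include <mntent.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Return true if `name` is a whole entry in the comma-separated list `opts`. */
static bool has_mount_option(const char *opts, const char *name)
{
	assert(opts != NULL && name != NULL);
	size_t want = strlen(name);

	for (const char *p = opts; *p != '\0'; ) {
		size_t len = strcspn(p, ",");	/* length of the current entry */
		if (len == want && strncmp(p, name, len) == 0)
			return true;
		p += len;
		if (*p == ',')
			p++;			/* skip the separator */
	}
	return false;
}

/* Returns 1 if a cgroup2 mount lists favordynmods, 0 if none does,
 * and -1 if the mount table cannot be opened. */
static int cgroup2_favors_dynmods(void)
{
	FILE *mounts = setmntent("/proc/mounts", "r");
	if (mounts == NULL)
		return -1;

	int found = 0;
	struct mntent *ent;
	while ((ent = getmntent(mounts)) != NULL) {
		if (strcmp(ent->mnt_type, "cgroup2") == 0 &&
		    has_mount_option(ent->mnt_opts, "favordynmods")) {
			found = 1;
			break;
		}
	}
	endmntent(mounts);
	return found;
}
```

The option matching is kept separate from the file access so that it can be checked against fixed strings without depending on the machine's actual mounts.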