Ruby's Global VM Lock (GVL) protects the Ruby VM's data but reduces parallel execution because only one thread can hold the lock at a time.
The GVL can affect application performance. For example, in a Puma server running several threads, while one thread holds the lock, the other threads have to wait before they can run Ruby code.
Ruby 3.2 introduced a GVL instrumentation API, and there are several tools for visualizing it. However, this kind of observability depends on support built into the Ruby VM. VM-level observability is slow to develop, can hardly cover every scenario, and adds maintenance overhead for Ruby.
This article explores a more dynamic solution that provides similar observability without modifying Ruby code. It uses LD_PRELOAD [1] and dlsym [2] to wrap pthread lock functions, achieving behavior similar to Ruby's alias method.
The code is available at https://github.com/yfractal/sdb
Ruby's GVL is implemented using a mutex and a condition variable, whose functions are loaded through the dynamic linker. On Linux, the dynamic linker allows us to override those functions using LD_PRELOAD. In the overridden functions, we can log relevant events and locate the original function through dlsym. This approach is similar to Ruby's alias method, but for dynamically linked functions.
#[no_mangle]
pub unsafe extern "C" fn pthread_mutex_lock(mutex: *mut pthread_mutex_t) -> i32 {
    // log acquire event ...
    if let Some(real_pthread_mutex_lock) = REAL_PTHREAD_MUTEX_LOCK {
        let ret = real_pthread_mutex_lock(mutex);
        // log acquired event ...
        ret
    } else {
        eprintln!("Failed to resolve pthread_mutex_lock");
        -1
    }
}
// then we could do similar things for pthread_mutex_unlock, pthread_cond_wait and pthread_cond_signal
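The wrapper above relies on a REAL_PTHREAD_MUTEX_LOCK value that the snippet does not show. The following is a minimal sketch of one way such a pointer might be resolved with dlsym, assuming Linux with glibc, where RTLD_NEXT is defined as (void *) -1. The names MutexLockFn, REAL_LOCK and real_pthread_mutex_lock are hypothetical, and the mutex argument is typed as an opaque pointer so the sketch needs no external crate; the sdb repository may do this differently.

```rust
use std::ffi::{c_char, c_int, c_void, CStr};
use std::sync::OnceLock;

extern "C" {
    // Declared by hand so the sketch does not depend on the libc crate.
    fn dlsym(handle: *mut c_void, symbol: *const c_char) -> *mut c_void;
}

// Assumption: glibc's value of RTLD_NEXT, i.e. "search the libraries
// loaded after the one that contains the caller".
const RTLD_NEXT: *mut c_void = -1isize as *mut c_void;

// Hypothetical alias; the mutex is treated as an opaque pointer here.
type MutexLockFn = unsafe extern "C" fn(*mut c_void) -> c_int;

// Caches the lookup so dlsym runs only once.
static REAL_LOCK: OnceLock<Option<MutexLockFn>> = OnceLock::new();

/// Returns the next `pthread_mutex_lock` in library search order,
/// or `None` if the symbol cannot be resolved.
fn real_pthread_mutex_lock() -> Option<MutexLockFn> {
    *REAL_LOCK.get_or_init(|| unsafe {
        let name = CStr::from_bytes_with_nul(b"pthread_mutex_lock\0").unwrap();
        let sym = dlsym(RTLD_NEXT, name.as_ptr());
        if sym.is_null() {
            None
        } else {
            // A data pointer and a function pointer have the same size here.
            Some(std::mem::transmute::<*mut c_void, MutexLockFn>(sym))
        }
    })
}
```

Inside a preloaded library, the wrapper would call the returned function after logging. Because the wrapper replaces pthread_mutex_lock itself, the lookup path has to avoid anything that takes a pthread mutex, otherwise the wrapper could re-enter itself.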
The wrappers see every mutex in the process, so we need the address of the GVL's mutex to tell it apart. To find that address, we need to access Ruby's rb_thread_t object.
Here’s a simplified version of the code:
pub unsafe extern "C" fn log_gvl_addr(_module: VALUE, thread_val: VALUE) -> VALUE {
    // find rb_thread_t from thread value
    let thread_ptr: *mut RTypedData = thread_val as *mut RTypedData;
    let rb_thread_ptr = (*thread_ptr).data as *mut rb_thread_t;
    // access gvl_addr through offset directly
    let gvl_addr = (*rb_thread_ptr).ractor as u64 + 344;
    let gvl_ref = gvl_addr as *mut rb_global_vm_lock_t;
    let lock_addr = &((*gvl_ref).lock) as *const _ as u64;
    // log gvl address ...
    rb_ll2inum(lock_addr as i64) as VALUE
}
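The hard-coded offset of 344 in the code above is presumably tied to one particular Ruby build, since a struct's field offsets can change between versions and platforms. The sketch below illustrates the idea with a hypothetical MockRactor struct (not Ruby's real layout): it computes a field's address once from a raw byte offset and once from the struct definition, so the two can be compared.

```rust
use std::ptr::addr_of;

// Hypothetical stand-in for a VM struct; this is NOT Ruby's real layout.
#[repr(C)]
struct MockRactor {
    id: u64,           // bytes 0..8
    padding: [u8; 24], // bytes 8..32
    gvl_lock: u64,     // bytes 32..40
}

/// Computes a field address from a base pointer plus a hard-coded byte
/// offset. This silently breaks if the struct layout changes.
fn field_addr_by_offset(base: *const MockRactor, offset: usize) -> usize {
    base as usize + offset
}

/// Computes the same address from the struct definition, so it follows
/// the layout the code was compiled against.
fn field_addr_by_name(base: *const MockRactor) -> usize {
    // addr_of! takes the field's address without creating a reference.
    unsafe { addr_of!((*base).gvl_lock) as usize }
}
```

With this mock layout the two functions agree for an offset of 32. Against a real Ruby build, the named form is only available when bindings for that exact version exist, which is the trade-off behind using a raw offset.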
I used the following script for testing:
# example.rb
require 'sdb'

Sdb.log_gvl_addr

threads = []
10.times {
  thread = Thread.new do
    Sdb.log_gvl_addr
    i = 0
    10000.times do
      i += 1
    end
  end
  threads << thread
}
threads.each { |thread| thread.join }
We can run it using this command: LD_PRELOAD=./target/release/libsdb_shim.so bundle exec ruby example.rb (libsdb_shim.so is the shared library compiled from the Rust code).
We then see logs similar to these:
2024-09-10 21:09:11.540956679 [INFO] [lock] thread_id=281472580841568, rb_thread_addr=187651089870448, gvl_mutex_addr=187651083330256
2024-09-10 21:09:11.53981372 [INFO] [lock][mutex][acquire]: thread=281472580841568, lock_addr=187651083330256
2024-09-10 21:09:11.539815804 [INFO] [lock][mutex][acquired]: thread=281472580841568, lock_addr=187651083330256
2024-09-10 21:09:11.539816595 [INFO] [lock][cond][acquire]: thread=281472580841568, lock_addr=187651083330256, cond_var_addr=187651089870568
2024-09-10 21:09:11.540927137 [INFO] [lock][cond][acquired]: thread=281472580841568, lock_addr=187651083330256, cond_var_addr=187651089870568
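To turn such logs into wait times, we can subtract the timestamp of an acquire event from that of the matching acquired event. The following is a minimal sketch with hypothetical helper names (time_of_day_nanos, wait_nanos). It assumes the second whitespace-separated field has the form HH:MM:SS.fraction, that a fraction shorter than nine digits has lost trailing zeros, and that both events fall on the same day.

```rust
/// Parses the time-of-day field of a log line into nanoseconds since midnight.
/// Assumption: a fraction with fewer than nine digits is padded on the right.
fn time_of_day_nanos(line: &str) -> Option<u64> {
    let time = line.split_whitespace().nth(1)?;
    let mut parts = time.split(':');
    let hours: u64 = parts.next()?.parse().ok()?;
    let minutes: u64 = parts.next()?.parse().ok()?;
    let seconds_field = parts.next()?;
    let (secs, frac) = match seconds_field.split_once('.') {
        Some((s, f)) => (s, f),
        None => (seconds_field, ""),
    };
    let secs: u64 = secs.parse().ok()?;
    // Normalize the fraction to exactly nine digits (nanoseconds).
    let mut digits: String = frac.chars().take(9).collect();
    while digits.len() < 9 {
        digits.push('0');
    }
    let nanos: u64 = digits.parse().ok()?;
    Some(((hours * 60 + minutes) * 60 + secs) * 1_000_000_000 + nanos)
}

/// Returns how long the thread waited between the two events, in nanoseconds.
/// Returns `None` if a line cannot be parsed or the events are out of order.
fn wait_nanos(acquire_line: &str, acquired_line: &str) -> Option<u64> {
    time_of_day_nanos(acquired_line)?.checked_sub(time_of_day_nanos(acquire_line)?)
}
```

Applied to the two [cond] lines in the sample above, this gives 1,110,542 ns, so the thread spent about 1.1 ms in the condition variable wait.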
Ruby uses the GVL to protect its VM and releases the lock during I/O operations, so the GVL is usually not a serious problem for I/O-bound applications.
However, background threads or code instrumentation (like NewRelic) not only consume CPU resources but can also delay every Ruby application thread.
We could use eBPF to probe these functions without modifying the application, but eBPF programs usually require root privileges and have more dependencies.
LD_PRELOAD alters the application’s library loading but is a much lighter solution compared to eBPF.
The code demonstrates how to use LD_PRELOAD and dlsym to instrument the Ruby VM without modifying Ruby code.
Since Ruby's GVL is complex (it uses condition variables and only acquires the lock when the GVL has an owner and the current thread is not the timer thread), instrumenting the mutex and condition variable doesn't fully capture gvl_acquire and gvl_release. However, we can still infer GVL delays from the locking patterns.
The code logs events to a file, allowing for async analysis. We could use fast_log [4], which buffers logs in memory and writes them to a file in batches.
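The idea of buffering in memory and writing in batches can be sketched as follows. This is an illustration only: BatchLogger is a hypothetical type written for this article, not the fast_log API, and it ignores threading entirely.

```rust
use std::io::{self, Write};

/// Hypothetical in-memory batching logger; not the fast_log API.
struct BatchLogger<W: Write> {
    sink: W,
    buffer: Vec<String>,
    batch_size: usize,
}

impl<W: Write> BatchLogger<W> {
    fn new(sink: W, batch_size: usize) -> Self {
        BatchLogger { sink, buffer: Vec::with_capacity(batch_size), batch_size }
    }

    /// Queues one line and writes the batch once `batch_size` lines are pending.
    fn log(&mut self, line: &str) -> io::Result<()> {
        self.buffer.push(line.to_string());
        if self.buffer.len() >= self.batch_size {
            self.flush()?;
        }
        Ok(())
    }

    /// Writes all pending lines to the sink with a single write call.
    fn flush(&mut self) -> io::Result<()> {
        let mut batch = String::new();
        for line in self.buffer.drain(..) {
            batch.push_str(&line);
            batch.push('\n');
        }
        self.sink.write_all(batch.as_bytes())?;
        self.sink.flush()
    }
}
```

The point of the design is that the hot path (log) only appends to a vector, while the comparatively expensive write happens once per batch. A real logger would also flush on shutdown so the last partial batch is not lost.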
However, since the Ruby VM accesses the GVL very frequently, example.rb can generate over 80,000 lines of logs. As in ldb [3], performance could be further improved by logging lock events only when the delay exceeds a threshold.
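Threshold-based logging could look roughly like the sketch below. The helper time_if_slow is a hypothetical name introduced here, and this is not how ldb or sdb implement it: the helper times an operation and reports the duration only when it exceeds the threshold, so fast acquisitions produce no log line at all.

```rust
use std::time::{Duration, Instant};

/// Runs `op` and reports its duration only when it took longer than `threshold`.
fn time_if_slow<R, F: FnOnce() -> R>(threshold: Duration, op: F) -> (R, Option<Duration>) {
    let start = Instant::now();
    let result = op();
    let elapsed = start.elapsed();
    // Fast operations return None, so the caller can skip logging entirely.
    let slow = if elapsed > threshold { Some(elapsed) } else { None };
    (result, slow)
}
```

In a lock wrapper, op would be the call to the real pthread function, and a log line would be written only when the second element of the returned tuple is Some.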
This article uses LD_PRELOAD and dlsym to instrument the GVL without modifying Ruby code. You can find the code at https://github.com/yfractal/sdb