Detect Ruby GVL contention through dynamically linked library functions

yfractal · September 10, 2024

Introduction

Ruby's Global VM Lock (GVL) protects the Ruby VM's data but reduces parallel execution because only one thread can hold the lock at a time.

The GVL can affect application performance. For example, in a Puma server running several threads, while one thread holds the lock, the other threads must wait for it, which delays their requests.

Ruby 3.2 introduced a GVL instrumentation API, and several tools build on it to visualize GVL activity. However, this kind of observability depends on support built into the Ruby VM, which is slow to develop, hard to extend to every scenario, and adds maintenance overhead to Ruby itself.

This article explores a more dynamic solution that provides similar observability without modifying Ruby code. It uses LD_PRELOAD[1] and dlsym[2] to wrap pthread lock functions, achieving behavior similar to Ruby's alias method.

The code is available at https://github.com/yfractal/sdb.

How it Works

Ruby's GVL is implemented with a mutex and condition variables, and the pthread functions that operate on them are resolved by the dynamic linker. On Linux, LD_PRELOAD lets us load our own shared library first, so its definitions of those functions take precedence. In the overriding functions, we can log the relevant events and then call the original implementation, which we locate through dlsym. This approach is similar to Ruby's alias method technique, but for dynamically linked functions.

#[no_mangle]
pub unsafe extern "C" fn pthread_mutex_lock(mutex: *mut pthread_mutex_t) -> i32 {
    // log acquire event ...
    // REAL_PTHREAD_MUTEX_LOCK holds the original pthread_mutex_lock, located through dlsym
    if let Some(real_pthread_mutex_lock) = REAL_PTHREAD_MUTEX_LOCK {
        let ret = real_pthread_mutex_lock(mutex);
        // log acquired event ...
        ret
    } else {
        eprintln!("Failed to resolve pthread_mutex_lock");
        -1
    }
}

// The same wrapping applies to pthread_mutex_unlock, pthread_cond_wait, and pthread_cond_signal.
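The wrapper above assumes REAL_PTHREAD_MUTEX_LOCK already points at the original function. Below is a minimal sketch, assuming Linux with glibc, of how that lookup can be done with dlsym and the RTLD_NEXT handle using only the Rust standard library. The names resolve_next, resolve_real_mutex_lock, MutexLockFn, and PthreadMutex are hypothetical and not taken from sdb; RTLD_NEXT is declared by hand with glibc's value.

```rust
use std::ffi::{c_void, CStr};
use std::os::raw::{c_char, c_int};

// glibc defines RTLD_NEXT as ((void *) -1): dlsym skips the calling object and
// returns the next definition in library search order (libc's, for a preloaded shim).
const RTLD_NEXT: *mut c_void = -1isize as *mut c_void;

#[link(name = "dl")]
extern "C" {
    fn dlsym(handle: *mut c_void, symbol: *const c_char) -> *mut c_void;
}

/// Hypothetical opaque stand-in for pthread_mutex_t: the shim only forwards the pointer.
#[repr(C)]
pub struct PthreadMutex {
    _opaque: [u8; 0],
}

/// Signature of the original pthread_mutex_lock.
pub type MutexLockFn = unsafe extern "C" fn(*mut PthreadMutex) -> c_int;

/// Look up the next definition of `name` after the calling object.
pub fn resolve_next(name: &CStr) -> Option<*mut c_void> {
    // SAFETY: `name` is a valid NUL-terminated string; dlsym only reads it.
    let sym = unsafe { dlsym(RTLD_NEXT, name.as_ptr()) };
    if sym.is_null() {
        None
    } else {
        Some(sym)
    }
}

/// Resolve the original pthread_mutex_lock so a wrapper can forward to it.
pub fn resolve_real_mutex_lock() -> Option<MutexLockFn> {
    let name = CStr::from_bytes_with_nul(b"pthread_mutex_lock\0").ok()?;
    // SAFETY: the symbol is pthread_mutex_lock, whose C signature is
    // int pthread_mutex_lock(pthread_mutex_t *), matching MutexLockFn.
    resolve_next(name).map(|sym| unsafe { std::mem::transmute::<*mut c_void, MutexLockFn>(sym) })
}
```

In an actual shim built as a cdylib, the lookup would run once before the wrappers are first used, with the result stored in something like REAL_PTHREAD_MUTEX_LOCK. Because the preloaded shim provides the first pthread_mutex_lock in search order, RTLD_NEXT is what lets the wrapper reach libc's definition instead of calling itself recursively.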

To tell the GVL's mutex apart from all the other mutexes in the process, we need its address, which we can read through Ruby's rb_thread_t structure.

Here’s a simplified version of the code:

pub unsafe extern "C" fn log_gvl_addr(_module: VALUE, thread_val: VALUE) -> VALUE {
    // find rb_thread_t from the Thread object's typed data
    let thread_ptr: *mut RTypedData = thread_val as *mut RTypedData;
    let rb_thread_ptr = (*thread_ptr).data as *mut rb_thread_t;

    // access the GVL through a hardcoded offset into the thread's ractor;
    // 344 depends on the struct layout of the Ruby version being instrumented
    let gvl_addr = (*rb_thread_ptr).ractor as u64 + 344;
    let gvl_ref = gvl_addr as *mut rb_global_vm_lock_t;
    let lock_addr = &((*gvl_ref).lock) as *const _ as u64;

    // log gvl address ...
    // return the mutex address to Ruby as an Integer
    rb_ll2inum(lock_addr as i64) as VALUE
}
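The hardcoded 344 ties the shim to one specific layout of Ruby's internal structs. Here is a sketch of the same pointer arithmetic with the offset derived from a struct definition through std::mem::offset_of! (stable since Rust 1.77) instead of written as a literal. FakeRactor, FakeGvl, and FakeMutex are hypothetical, heavily simplified stand-ins, not CRuby's definitions; a real shim would need bindings generated from the headers of the exact Ruby version it targets.

```rust
use std::ptr::addr_of;

/// Hypothetical stand-in for the pthread mutex embedded in the GVL (opaque bytes).
#[repr(C)]
pub struct FakeMutex {
    _opaque: [u64; 5],
}

/// Hypothetical stand-in for rb_global_vm_lock_t: the mutex plus other state.
#[repr(C)]
pub struct FakeGvl {
    pub lock: FakeMutex,
    pub owner: usize,
}

/// Hypothetical stand-in for the ractor struct that embeds the GVL.
#[repr(C)]
pub struct FakeRactor {
    _before_gvl: [u64; 43], // placeholder for the fields that precede the GVL
    pub gvl: FakeGvl,
}

impl FakeRactor {
    pub fn zeroed() -> Self {
        FakeRactor {
            _before_gvl: [0; 43],
            gvl: FakeGvl { lock: FakeMutex { _opaque: [0; 5] }, owner: 0 },
        }
    }
}

/// Byte offset of the GVL inside the ractor, computed by the compiler from the layout.
pub fn gvl_offset() -> usize {
    std::mem::offset_of!(FakeRactor, gvl)
}

/// Mirrors `ractor as u64 + 344` followed by `&(*gvl).lock`, with a derived offset.
///
/// # Safety
/// `ractor` must point to a live FakeRactor.
pub unsafe fn gvl_lock_addr(ractor: *const FakeRactor) -> usize {
    unsafe {
        let gvl = (ractor as *const u8).add(gvl_offset()) as *const FakeGvl;
        addr_of!((*gvl).lock) as usize
    }
}
```

With the layout coming from the type definition, a change in Ruby's struct layout shows up as a change in the generated bindings rather than as a silently wrong magic number.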

Testing

I used the following script for testing:

# example.rb
require 'sdb'

Sdb.log_gvl_addr

threads = []
10.times {
  thread = Thread.new do
    Sdb.log_gvl_addr
    i = 0
    10000.times do
      i += 1
    end
  end
  threads << thread
}

threads.each {|thread| thread.join }

We can run it with the following command, where libsdb_shim.so is the compiled Rust shared library:

LD_PRELOAD=./target/release/libsdb_shim.so bundle exec ruby example.rb

The output contains log lines similar to the following:

2024-09-10 21:09:11.540956679 [INFO] [lock] thread_id=281472580841568, rb_thread_addr=187651089870448, gvl_mutex_addr=187651083330256

2024-09-10 21:09:11.53981372 [INFO] [lock][mutex][acquire]: thread=281472580841568, lock_addr=187651083330256
2024-09-10 21:09:11.539815804 [INFO] [lock][mutex][acquired]: thread=281472580841568, lock_addr=187651083330256
2024-09-10 21:09:11.539816595 [INFO] [lock][cond][acquire]: thread=281472580841568, lock_addr=187651083330256, cond_var_addr=187651089870568
2024-09-10 21:09:11.540927137 [INFO] [lock][cond][acquired]: thread=281472580841568, lock_addr=187651083330256, cond_var_addr=187651089870568
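Because every wrapped call emits an acquire/acquired pair, the wait for each call is the gap between the two timestamps. Below is a minimal sketch of that analysis, assuming exactly the line format shown above; it skips lines it does not recognize, such as the [lock] address line. LockEvent, parse_line, and wait_times are hypothetical helpers, not part of sdb. On the sample lines, acquiring the mutex takes about 2 microseconds, while the condition-variable wait, which is where a thread waiting for the GVL sleeps, takes about 1.1 ms.

```rust
use std::collections::HashMap;

/// One lock event parsed from a shim log line.
#[derive(Debug)]
pub struct LockEvent {
    pub time_ns: u64,  // nanoseconds since midnight
    pub kind: String,  // "mutex" or "cond"
    pub phase: String, // "acquire" or "acquired"
    pub thread: u64,
    pub lock_addr: u64,
}

/// Parse "HH:MM:SS.fraction" into nanoseconds since midnight.
fn parse_time_ns(t: &str) -> Option<u64> {
    let (hms, frac) = t.split_once('.')?;
    let mut parts = hms.split(':').map(|p| p.parse::<u64>().ok());
    let (h, m, s) = (parts.next()??, parts.next()??, parts.next()??);
    if frac.is_empty() || frac.len() > 9 {
        return None;
    }
    // Right-pad to 9 digits so a trimmed value like "53981372" means 539_813_720 ns.
    let ns: u64 = format!("{:0<9}", frac).parse().ok()?;
    Some(((h * 60 + m) * 60 + s) * 1_000_000_000 + ns)
}

/// Parse lines like "DATE TIME [INFO] [lock][mutex][acquire]: thread=..., lock_addr=...".
pub fn parse_line(line: &str) -> Option<LockEvent> {
    let mut tokens = line.split_whitespace();
    let _date = tokens.next()?;
    let time_ns = parse_time_ns(tokens.next()?)?;
    let _level = tokens.next()?;
    let tag = tokens.next()?.trim_end_matches(':');
    let mut tags = tag.trim_start_matches('[').trim_end_matches(']').split("][");
    if tags.next()? != "lock" {
        return None;
    }
    let kind = tags.next()?.to_string();
    let phase = tags.next()?.to_string();
    let mut thread: Option<u64> = None;
    let mut lock_addr: Option<u64> = None;
    for kv in tokens {
        let (k, v) = kv.trim_end_matches(',').split_once('=')?;
        match k {
            "thread" => thread = v.parse().ok(),
            "lock_addr" => lock_addr = v.parse().ok(),
            _ => {} // e.g. cond_var_addr is not needed for pairing
        }
    }
    Some(LockEvent { time_ns, kind, phase, thread: thread?, lock_addr: lock_addr? })
}

/// Pair each "acquire" with the next "acquired" for the same thread, kind and lock,
/// returning (thread, kind, wait_ns) for every completed pair.
pub fn wait_times(lines: &[&str]) -> Vec<(u64, String, u64)> {
    let mut pending: HashMap<(u64, String, u64), u64> = HashMap::new();
    let mut waits = Vec::new();
    for ev in lines.iter().filter_map(|l| parse_line(l)) {
        let key = (ev.thread, ev.kind.clone(), ev.lock_addr);
        match ev.phase.as_str() {
            "acquire" => {
                pending.insert(key, ev.time_ns);
            }
            "acquired" => {
                if let Some(start) = pending.remove(&key) {
                    waits.push((ev.thread, ev.kind, ev.time_ns.saturating_sub(start)));
                }
            }
            _ => {}
        }
    }
    waits
}
```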

Others

Does the GVL Matter?

Ruby uses the GVL to protect its VM and releases the lock during I/O operations, so the GVL is usually not a major problem for I/O-bound applications.

However, background threads or code instrumentation (such as New Relic's agent) not only consume CPU resources but also hold the GVL while they run Ruby code, delaying all of the application's other Ruby threads.

eBPF Solution

We could also use eBPF (for example, uprobes) to probe these functions without modifying the application, but eBPF programs usually require root privileges and have more dependencies.

LD_PRELOAD does change how the application's libraries are loaded, but it is a much lighter-weight solution than eBPF.

Improvements

The code demonstrates how to use LD_PRELOAD and dlsym to instrument the Ruby VM without modifying Ruby code.

Ruby's GVL logic is more complex than a plain lock: it relies on condition variables, and the acquisition path depends on whether the GVL already has an owner and whether the current thread is the timer thread. As a result, instrumenting the mutex and condition variable functions doesn't fully capture gvl_acquire and gvl_release. However, we can still infer GVL delays from the locking patterns.
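A toy model helps show why mutex and condition-variable events reflect GVL waits only indirectly. In this sketch, std::sync::Mutex and Condvar stand in for the pthread primitives: the mutex is held only briefly to inspect an ownership flag, and a contended thread does its actual waiting inside Condvar::wait. ToyGvl and its methods are hypothetical and deliberately simplified (no timer thread or any of CRuby's other details). Also note that on Linux std's Mutex and Condvar are not built on pthread functions, so the shim itself would not observe this toy.

```rust
use std::sync::{Condvar, Mutex};

struct State {
    owned: bool,    // does some thread currently hold the toy GVL?
    waiting: usize, // threads sleeping on the condition variable
}

/// Hypothetical toy GVL: a mutex guarding the ownership flag plus a condition
/// variable that contended threads sleep on.
pub struct ToyGvl {
    state: Mutex<State>,
    released: Condvar,
}

impl ToyGvl {
    pub fn new() -> Self {
        ToyGvl {
            state: Mutex::new(State { owned: false, waiting: 0 }),
            released: Condvar::new(),
        }
    }

    /// Take the toy GVL; returns true if the caller had to sleep on the condvar.
    pub fn acquire(&self) -> bool {
        // Short critical section, analogous to a [mutex] acquire/acquired pair in the log.
        let mut st = self.state.lock().unwrap();
        let contended = st.owned;
        if contended {
            st.waiting += 1;
            // The real wait happens here with the mutex released, analogous to a
            // [cond] acquire/acquired pair; the loop guards against spurious wakeups.
            while st.owned {
                st = self.released.wait(st).unwrap();
            }
            st.waiting -= 1;
        }
        st.owned = true;
        contended
    }

    /// Give up the toy GVL and wake one waiting thread, if any.
    pub fn release(&self) {
        let mut st = self.state.lock().unwrap();
        st.owned = false;
        drop(st);
        self.released.notify_one();
    }

    /// Number of threads currently waiting for the toy GVL.
    pub fn waiting(&self) -> usize {
        self.state.lock().unwrap().waiting
    }
}
```

In this model, the time a thread spends blocked on the mutex stays tiny even under heavy contention; the contention shows up as long condition-variable waits, which matches the pattern in the sample log.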

The code logs events to a file so they can be analyzed asynchronously. To keep logging cheap, we could use fast_log[4], which buffers logs in memory and writes them to a file in batches.
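The idea behind batching can be sketched in a few lines: log lines accumulate in memory and reach the sink in a single write once a batch fills up, so the hot path usually costs only a push onto a Vec. BatchLogger is a hypothetical type written for illustration; it is not fast_log's API or its internals.

```rust
use std::io::{self, Write};

/// Hypothetical batching logger: lines are kept in memory and written to the
/// sink only when `batch_size` lines have accumulated (or on an explicit flush).
pub struct BatchLogger<W: Write> {
    sink: W,
    buf: Vec<String>,
    batch_size: usize,
}

impl<W: Write> BatchLogger<W> {
    pub fn new(sink: W, batch_size: usize) -> Self {
        let batch_size = batch_size.max(1);
        Self { sink, buf: Vec::with_capacity(batch_size), batch_size }
    }

    /// Buffer one line; write the whole batch once it is full.
    pub fn log(&mut self, line: String) -> io::Result<()> {
        self.buf.push(line);
        if self.buf.len() >= self.batch_size {
            self.flush()
        } else {
            Ok(())
        }
    }

    /// Write all buffered lines in one call and clear the buffer.
    pub fn flush(&mut self) -> io::Result<()> {
        let mut chunk = String::new();
        for line in self.buf.drain(..) {
            chunk.push_str(&line);
            chunk.push('\n');
        }
        self.sink.write_all(chunk.as_bytes())?;
        self.sink.flush()
    }

    pub fn sink(&self) -> &W {
        &self.sink
    }
}
```

A production version could additionally flush at exit and hand the writes to a background thread, so that the thread holding the GVL never waits on file I/O.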

However, because the Ruby VM acquires and releases the GVL very frequently, running example.rb alone generates over 80,000 lines of logs. As in LDB[3], performance could be further improved by logging lock events only when the delay exceeds a threshold.
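The threshold idea can be sketched as a small timing helper around the real lock call: measure how long the call blocked and report it only when it exceeds the threshold, so fast, uncontended acquisitions never reach the log. timed_acquire is a hypothetical helper; in a shim, the closure would wrap the call to the real pthread_mutex_lock or pthread_cond_wait.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper: run `acquire` and return its result together with the
/// time it blocked, but only when that time exceeds `threshold`.
pub fn timed_acquire<T>(threshold: Duration, acquire: impl FnOnce() -> T) -> (T, Option<Duration>) {
    let start = Instant::now();
    let result = acquire();
    let waited = start.elapsed();
    // Short waits yield None, so the caller can skip logging them entirely.
    let slow = if waited > threshold { Some(waited) } else { None };
    (result, slow)
}
```

Only the `Some` case would be formatted and logged, which keeps the common uncontended path down to two clock reads.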

Summary

This article uses LD_PRELOAD and dlsym to instrument the GVL without modifying Ruby code. The code is available at https://github.com/yfractal/sdb.

References

  1. ld.so(8), Linux manual page: https://man7.org/linux/man-pages/man8/ld.so.8.html
  2. dlsym(3), Linux manual page: https://linux.die.net/man/3/dlsym
  3. LDB: An Efficient Latency Profiling Tool for Multithreaded Applications
  4. fast_log: https://github.com/rbatis/fast_log