Rust Memory Safety between Rust and Ruby: Making Ruby-Allocated Memory Work in Rust

yfractal · June 8, 2024 · Last reply by KAITHY on June 29, 2024 · 468 views

Introduction

Breaking the barrier between different programming languages is both interesting and beneficial. For example, ByteDance developed rust2go[1] to facilitate migrating Golang projects to Rust smoothly[2].

Importing Rust into Ruby can not only improve performance but also reduce tedious, repetitive work. For instance, Rust has implemented OpenTelemetry metrics, whereas Ruby hasn't. Wrapping the Rust implementation for use in Ruby can save a lot of effort. Ccache[3] is an experimental project exploring this direction.

Ccache is an etag-based local cache. It stores cached values in Arc on the Rust side and lets Ruby query them, so it has to handle memory shared between the two runtimes efficiently.

This article introduces an idea about how to make Rust Arc work between Rust and Ruby without copying.

Background

Programs allocate memory and, after usage, need to consider how to return that memory.

The most straightforward way is managing memory manually, where programmers need to know when a variable can be freed and release it to the system. This is how C works; however, it's error-prone, and many memory bugs can be found in C programs.

Rust improves on this with ownership. Every value has a single owner, and when the owner goes out of scope, Rust releases the value's memory. This makes memory management explicit in the code.
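A minimal sketch of scope-based release, using only the standard library. The Buffer type and its log field are hypothetical names invented for this illustration; the Drop implementation only records when Rust releases each value.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical resource that records its own release in a shared log.
struct Buffer {
    name: &'static str,
    log: Rc<RefCell<Vec<String>>>,
}

impl Drop for Buffer {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("dropped {}", self.name));
    }
}

/// Runs a small scenario and returns the events in the order they happened.
fn drop_order() -> Vec<String> {
    let log = Rc::new(RefCell::new(Vec::new()));
    {
        let _outer = Buffer { name: "outer", log: Rc::clone(&log) };
        {
            let _inner = Buffer { name: "inner", log: Rc::clone(&log) };
            log.borrow_mut().push("end of inner scope".to_string());
        } // `_inner` goes out of scope and is dropped here
        log.borrow_mut().push("end of outer scope".to_string());
    } // `_outer` goes out of scope and is dropped here
    let events = log.borrow().clone();
    events
}

fn main() {
    for event in drop_order() {
        println!("{}", event);
    }
}
```

No explicit free call appears anywhere; the closing braces decide when each value is released.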

Arc (atomically reference counted) lets several owners share one value. Cloning an Arc increments a reference count, and dropping a clone decrements it. Rust releases the memory when the reference count reaches zero.
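The counting can be observed directly with Arc::strong_count. The sketch below is standard-library Rust only; the function name observe_counts is made up for this example.

```rust
use std::sync::Arc;
use std::thread;

/// Returns the strong count observed at each step of a small scenario.
fn observe_counts() -> Vec<usize> {
    let mut seen = Vec::new();

    let value = Arc::new(String::from("cached value"));
    seen.push(Arc::strong_count(&value)); // only `value` exists

    let for_worker = Arc::clone(&value); // new handle, same allocation
    seen.push(Arc::strong_count(&value)); // `value` and `for_worker`

    // The worker thread takes ownership of its handle and drops it on exit.
    let worker = thread::spawn(move || for_worker.len());
    let len = worker.join().expect("worker thread panicked");
    assert_eq!(len, value.len());

    seen.push(Arc::strong_count(&value)); // the worker's handle is gone
    seen
}

fn main() {
    println!("{:?}", observe_counts());
}
```

The count only tracks handles that Rust knows about, which matters once a second runtime is involved.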

Ruby's Garbage Collection (GC) allows programs to use memory without considering when to release it. The Ruby VM triggers GC when necessary, and during GC, it finds unreachable variables and returns their memory to the system or memory pool.

The problem arises when we use Rust's Arc and need to pass the value to the Ruby side.

Reproducing the Problem

An Example Using Rust Arc

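#[macro_use]
extern crate rutie;
#[macro_use]
extern crate lazy_static; // used by rutie's wrappable_struct! macro

use std::collections::HashMap;
use std::sync::Arc;

use rutie::{AnyObject, Class, NilClass, Object, RString};

pub struct Store {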
    hash_map: HashMap<String, Arc<AnyObject>>,
}

impl Store {
    fn new() -> Self {
        Store {
            hash_map: HashMap::new(),
        }
    }
}

wrappable_struct!(Store, StoreWrapper, STORE_WRAPPER);
class!(RubyStore);

methods!(
    RubyStore,
    rtself,
    fn ruby_new() -> AnyObject {
        let store = Store::new();
        Class::from_existing("RubyStore").wrap_data(store, &*STORE_WRAPPER)
    },
    fn ruby_insert(key: RString, obj: AnyObject) -> AnyObject {
        let rbself = rtself.get_data_mut(&*STORE_WRAPPER);

        rbself
            .hash_map
            .insert(key.unwrap().to_string(), Arc::new(obj.unwrap()));
        NilClass::new().into()
    },
    fn ruby_get(rb_key: RString) -> AnyObject {
        let rbself = rtself.get_data_mut(&*STORE_WRAPPER);

        let key = rb_key.unwrap().to_string();
        let val = rbself.hash_map.get(&key).unwrap();
        AnyObject::from(val.value())
    },
);

#[allow(non_snake_case)]
#[no_mangle]
pub extern "C" fn Init_ruby_example() {
    Class::new("RubyStore", None).define(|klass| {
        klass.def_self("new", ruby_new);
        klass.def("insert", ruby_insert);
        klass.def("get", ruby_get);
    });
}

The Store struct wraps a HashMap whose values are Ruby AnyObjects held in an Arc. The Arc is there so that values can be shared for concurrent use.

And it seems to work well:

it 'works' do
  store = RubyStore.new
  foo = Foo.new(1, 2)
  store.insert("key", foo)

  sleep 0.1
  GC.start

  expect(store.get("key").class).to eq Foo
  expect(store.get("key").a).to eq 1
  expect(store.get("key").b).to eq 2
end

The Segmentation Fault

The above example works only because the Foo object is still referenced by the local variable foo, so Ruby's GC has not freed its memory.

To trigger the segmentation fault or other memory issues, we create and pass the Foo object directly to RubyStore.

it 'has memory issues :(' do
  store = RubyStore.new
  store.insert("key", Foo.new(1, 2))

  sleep 0.1
  GC.start

  expect(store.get("key").class).to eq Foo
end

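Then it raises a segmentation fault. The Arc counts only the references held on the Rust side, and Ruby's GC cannot see them. Once no Ruby reference remains, the GC reclaims the Foo object, and get hands back a value that points at freed memory.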
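The failure mode can be imitated in plain Rust with a toy model. Everything here is an assumption made for illustration: ToyHeap stands in for the Ruby heap, Handle for a Ruby VALUE, and collect for a GC run that frees whatever is not listed as a root. In real Ruby the stale access is a crash; in this safe model it shows up as a missing object.

```rust
use std::collections::HashSet;
use std::sync::Arc;

/// Toy stand-in for the Ruby heap: each slot holds an object or is free.
struct ToyHeap {
    slots: Vec<Option<String>>,
}

/// Toy stand-in for a Ruby VALUE: a slot index that owns nothing.
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct Handle(usize);

impl ToyHeap {
    fn new() -> Self {
        ToyHeap { slots: Vec::new() }
    }

    fn alloc(&mut self, obj: &str) -> Handle {
        self.slots.push(Some(obj.to_string()));
        Handle(self.slots.len() - 1)
    }

    /// Frees every slot that is not listed in `roots`.
    fn collect(&mut self, roots: &HashSet<Handle>) {
        for (i, slot) in self.slots.iter_mut().enumerate() {
            if !roots.contains(&Handle(i)) {
                *slot = None;
            }
        }
    }

    fn get(&self, handle: Handle) -> Option<&str> {
        self.slots[handle.0].as_deref()
    }
}

/// Returns (is the object still alive after GC?, Arc strong count after GC).
fn lookup_after_gc(keep_ruby_reference: bool) -> (bool, usize) {
    let mut heap = ToyHeap::new();
    let handle = heap.alloc("Foo(1, 2)");
    let stored = Arc::new(handle); // what the Rust-side store keeps

    // The collector only knows about references held on the "Ruby" side.
    let mut roots = HashSet::new();
    if keep_ruby_reference {
        roots.insert(handle);
    }
    heap.collect(&roots);

    (heap.get(*stored).is_some(), Arc::strong_count(&stored))
}

fn main() {
    println!("with Ruby reference:    {:?}", lookup_after_gc(true));
    println!("without Ruby reference: {:?}", lookup_after_gc(false));
}
```

In both runs the Arc count stays at 1, yet the object survives only when the collector can see a reference of its own.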

A Simple Solution

We can avoid this by deep cloning the Arc’s value; however, it is not zero cost. Serializing a large object may take more than 1ms in Ruby[3]. To improve performance, we need to consider other methods.

To work around this situation, one option is to let Rust allocate the memory and free it through drop. However, this means Rust needs to figure out whether the allocated memory is being used by Ruby, which is not feasible. Thus, memory must be allocated by Ruby.

Ownership is a useful way to think about this, because the owner of a resource is the one responsible for releasing it. We need to divide the responsibilities between Ruby and Rust. The memory is allocated by Ruby, so Ruby has the duty to release it. Rust uses the memory but does not own it. Thus, we can still use Arc; the only difference is that we do not free the memory when the Arc is dropped.

pub struct RubyObject {
    value: rutie::types::Value,
}

impl Drop for RubyObject {
    // Intentionally empty: the memory belongs to Ruby, and Ruby's GC reclaims it
    fn drop(&mut self) {
    }
}
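To see what "use but do not own" means for an Arc, here is a standard-library sketch. ForeignHandle is a hypothetical stand-in for RubyObject, a Box plays the role of Ruby-owned memory, and the drops counter exists only to make the Drop call visible.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for `RubyObject`: it stores only an address-sized
/// handle to memory that some other party owns.
struct ForeignHandle {
    addr: usize,
    drops: Arc<AtomicUsize>, // instrumentation for this demo only
}

impl Drop for ForeignHandle {
    fn drop(&mut self) {
        // Nothing is freed here; we only count the call to show when it runs.
        self.drops.fetch_add(1, Ordering::SeqCst);
    }
}

/// Returns (drop calls after the first handle is released,
///          drop calls after the last handle is released,
///          the owner's value afterwards).
fn share_and_release() -> (usize, usize, u64) {
    let owner = Box::new(42u64); // plays the role of Ruby-owned memory
    let addr = &*owner as *const u64 as usize;
    let drops = Arc::new(AtomicUsize::new(0));

    let first = Arc::new(ForeignHandle { addr, drops: Arc::clone(&drops) });
    let second = Arc::clone(&first);
    assert_eq!(second.addr, addr); // both handles name the same memory

    drop(first);
    let after_first = drops.load(Ordering::SeqCst);
    drop(second);
    let after_last = drops.load(Ordering::SeqCst);

    (after_first, after_last, *owner) // the owner's memory was never touched
}

fn main() {
    println!("{:?}", share_and_release());
}
```

Drop runs once, when the last Arc handle goes away, and the memory it refers to stays valid because its real owner is still alive. The reverse guarantee is missing: nothing here stops the owner from going away first, which is the part the other side has to take care of.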

Rust has done its part, so now we need to consider Ruby's. Ruby allocated the memory, so Ruby needs to free it. Because it also passes the object to Rust, it needs to record that the object is still in use.

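// Rust: register the native insert under a different name
klass.def("insert_inner", ruby_insert);

# Ruby: wrap insert_inner and keep a reference to the value
class RubyStore
  def insert(key, val)
    @_val = val
    insert_inner(key, val)
  end
end

The value is assigned to the instance variable @_val, which makes it reachable through the RubyStore instance and prevents Ruby's GC from reclaiming its memory. When the key is deleted, we can set @_val = nil to "free" its memory. Note that a single instance variable only protects the most recently inserted value.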

Now, everything works well.

Discussion

Clearly, the current solution is far from ideal. To improve it, we can keep every value that is referenced by an Arc in a doubly linked list held on the Ruby side. When Drop is called, instead of doing nothing, we can remove the reference from the doubly linked list.
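A rough sketch of that idea in standard-library Rust. Note the substitutions: a HashMap keyed by id replaces the doubly linked list, and the Registry type merely models the Ruby-side structure that keeps objects reachable. Pinned, Registry, and the ids are hypothetical names for this example, not part of rutie or Ccache.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Toy stand-in for the Ruby-side structure that keeps objects reachable.
type Registry = Arc<Mutex<HashMap<u64, String>>>;

/// Hypothetical wrapper: holds an id instead of a real Ruby VALUE.
struct Pinned {
    id: u64,
    registry: Registry,
}

impl Pinned {
    /// Registers the object so the (modelled) GC keeps it alive.
    fn new(id: u64, obj: &str, registry: &Registry) -> Self {
        registry.lock().unwrap().insert(id, obj.to_string());
        Pinned { id, registry: Arc::clone(registry) }
    }
}

impl Drop for Pinned {
    fn drop(&mut self) {
        // Unpin instead of freeing: from here on the GC may reclaim the object.
        self.registry.lock().unwrap().remove(&self.id);
    }
}

/// Returns the number of pinned objects after each step.
fn pinned_counts() -> Vec<usize> {
    let registry: Registry = Arc::new(Mutex::new(HashMap::new()));
    let mut store: HashMap<String, Arc<Pinned>> = HashMap::new();
    let mut counts = Vec::new();

    store.insert("a".into(), Arc::new(Pinned::new(1, "Foo(1, 2)", &registry)));
    store.insert("b".into(), Arc::new(Pinned::new(2, "Foo(3, 4)", &registry)));
    counts.push(registry.lock().unwrap().len());

    // Another user of "a" (for example a reader thread) still holds a handle.
    let reader = Arc::clone(&store["a"]);
    store.remove("a");
    counts.push(registry.lock().unwrap().len()); // still pinned by `reader`

    drop(reader); // last handle gone, so Drop unpins the object
    counts.push(registry.lock().unwrap().len());
    counts
}

fn main() {
    println!("{:?}", pinned_counts());
}
```

With this shape, an object stays pinned exactly as long as some Arc handle to it exists, so deleting a key no longer needs a manual step on the Ruby side.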

With this solution, Rust depends on Ruby's GC to reclaim memory, which requires Ruby's cooperation. Rust is designed not to trust programmers completely, so by Rust's standards the solution isn't strictly safe. Ruby, however, works the other way and assumes programmers will do the right thing (though they often don't). Thus, this solution is acceptable for Ruby. To make it safer and cleaner, we can move the reference bookkeeping into Rust code.

Another interesting direction is to make ownership work in Ruby, not only for Arc but also for mutability and locking. This could make the interaction smoother and make Ruby safer.

Summary

This article discussed a method to integrate Rust's Arc into Ruby, ensuring memory safety without the need for deep cloning. You can find the whole example in rust_arc_demo[4] and a use case in Ccache.

References

  1. https://github.com/ihciah/rust2go
  2. https://en.ihcblog.com/rust2go/
  3. https://github.com/yfractal/ccache
  4. https://github.com/yfractal/rust_arc_demo