Calling Rust from Ruby: How Rutie Works

yfractal · June 27, 2024 · 78 views

Introduction

Recently, I've been developing an experimental project called ccache (https://github.com/yfractal/ccache), a Redis client-side cache that guarantees consistency. Since it runs on the client side, it needs to support different programming languages.

The common practice is to reimplement the same logic in each language, as the Redis clients and the OpenTelemetry instrumentation libraries do. However, this approach involves tedious and repetitive work.

One potential solution is to write the core functionality in Rust and integrate it with different languages. To achieve this, we need to address the discrepancies between Rust and other languages, such as how to represent data and manage memory safely.

In this article, I will introduce Rutie, which bridges the gap between Ruby and Rust.


How it works

Ruby MRI is written in C, so it works natively with C. The idea is to expose Rust code through the C ABI while keeping memory safe.

Ruby Calls Rust Functions

First, we can let cargo compile Rust files into a dynamic library by specifying crate-type = ["dylib"] in Cargo.toml.

Then we can call the function through fiddle[2].

For example:

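// Rust side
#[allow(non_snake_case)]
#[no_mangle]
pub extern "C" fn rust_method() {
    println!("hello from Rust");
}

# Ruby side
require "fiddle"

handle = Fiddle.dlopen("./target/release/libruby_example.dylib")
# Look up the exported symbol, declare "no arguments, returns void", then call it
Fiddle::Function.new(handle["rust_method"], [], Fiddle::TYPE_VOID).call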
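
The example above takes no arguments and returns nothing. As a minimal sketch of what changes once values cross the boundary, the function below uses C-compatible integer types. The name rust_add is hypothetical, and the no_mangle attribute is left out only so the snippet runs as a standalone program; a real export would keep it, as in the example above.

```rust
use std::os::raw::c_int;

// Hypothetical example function. A real export would also carry the
// no_mangle attribute so that the symbol keeps its name in the library.
pub extern "C" fn rust_add(a: c_int, b: c_int) -> c_int {
    // wrapping_add cannot panic; a panic must not unwind across the C boundary.
    a.wrapping_add(b)
}

fn main() {
    // Call through a C-ABI function pointer, the way a foreign caller would.
    let f: extern "C" fn(c_int, c_int) -> c_int = rust_add;
    println!("{}", f(2, 3));
}
```

On the Ruby side, the Fiddle declaration would then presumably list two Fiddle::TYPE_INT arguments and a Fiddle::TYPE_INT return type instead of an empty argument list.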

Bind C Functions to Ruby Class

Ruby allows us to define Ruby methods through C, which is more convenient than using fiddle. For example, void rb_define_method(VALUE klass, const char *name, VALUE (*func)(ANYARGS), int argc) is used to define an instance method for a class. The first argument is the class, the second is the method's name, the third is the callback function, and the fourth is the number of arguments the method takes. Passing -1 as the fourth argument makes Ruby hand the callback an argument count, an argument array, and self.

Suppose the method is defined with rb_define_method(SomeClass, "a_method", call_back_ptr, -1). When the method is called in the Ruby VM, for example SomeClass.new.a_method, Ruby calls the callback function. The callback function receives the argument count, the argument array, and the object (the self in Ruby). For example, the callback function in C could be:

static VALUE
ruby_insert(int argc, VALUE *argv, VALUE self) {
  // ...
}
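
The same callback shape can be written in Rust. The sketch below is not tied to Ruby: Value is a stand-in for VALUE that simply carries a number, and sum_args is a hypothetical callback. It only illustrates how the argc/argv pair is turned into a slice before use.

```rust
use std::os::raw::c_int;
use std::slice;

// Stand-in for Ruby's VALUE, which is a pointer-sized integer.
// Here it just carries a number; this is an assumption for illustration.
type Value = usize;

// Same shape as a callback registered with argc = -1.
extern "C" fn sum_args(argc: c_int, argv: *const Value, _self: Value) -> Value {
    if argc <= 0 || argv.is_null() {
        return 0;
    }
    // SAFETY: the caller promises that argv points at argc valid elements.
    let args = unsafe { slice::from_raw_parts(argv, argc as usize) };
    // wrapping_add keeps the callback panic-free on overflow.
    args.iter().fold(0, |acc, v| acc.wrapping_add(*v))
}

fn main() {
    let args: [Value; 3] = [1, 2, 3];
    let result = sum_args(args.len() as c_int, args.as_ptr(), 0);
    println!("{}", result);
}
```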

Bind C Struct to Ruby Object

C and Rust both organize data in structs and define functions or methods that operate on them. Binding a struct to a Ruby object lets us reuse existing code. Ruby supports this through the rb_data_typed_object_wrap and rb_check_typeddata functions.

rb_data_typed_object_wrap creates a new instance with the struct. Its signature is VALUE rb_data_typed_object_wrap(VALUE klass, void *datap, const rb_data_type_t *type). datap is the pointer to our struct, and the return value is the created instance.

Then we can use rb_check_typeddata to find the struct. Its signature is void * rb_check_typeddata(VALUE obj, const rb_data_type_t *data_type). The first argument is the Ruby instance, and it returns the struct’s pointer.
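
To make the wrap-then-check idea concrete, here is a small simulation in plain Rust. DataType, WrappedObject, wrap, and check are hypothetical stand-ins, not Ruby's implementation: the point is only that the object remembers a type tag next to the raw pointer, and the pointer is handed back only when the tag matches. The real rb_check_typeddata raises a Ruby TypeError on a mismatch; returning an Option here is a simplification.

```rust
use std::ffi::c_void;
use std::ptr;

// Hypothetical stand-ins for rb_data_type_t and a wrapped Ruby object.
struct DataType {
    name: &'static str,
}

struct WrappedObject {
    data_type: *const DataType,
    data: *mut c_void,
}

static COUNTER_TYPE: DataType = DataType { name: "Counter" };
static OTHER_TYPE: DataType = DataType { name: "Other" };

struct Counter {
    hits: u32,
}

// Mimics the wrap step: remember both the pointer and its type tag.
fn wrap(data: *mut c_void, data_type: &'static DataType) -> WrappedObject {
    WrappedObject { data_type: data_type as *const DataType, data }
}

// Mimics the check step: hand the pointer back only if the tag matches.
fn check(obj: &WrappedObject, expected: &'static DataType) -> Option<*mut c_void> {
    if ptr::eq(obj.data_type, expected) {
        Some(obj.data)
    } else {
        None
    }
}

fn main() {
    let raw = Box::into_raw(Box::new(Counter { hits: 0 })) as *mut c_void;
    let obj = wrap(raw, &COUNTER_TYPE);

    if let Some(p) = check(&obj, &COUNTER_TYPE) {
        // SAFETY: the tag matched, so p really points at a Counter.
        let counter = unsafe { &mut *(p as *mut Counter) };
        counter.hits += 1;
        println!("{} hits: {}", COUNTER_TYPE.name, counter.hits);
    }
    println!("{} matches: {}", OTHER_TYPE.name, check(&obj, &OTHER_TYPE).is_some());

    // Reclaim the allocation so the sketch does not leak.
    unsafe { drop(Box::from_raw(raw as *mut Counter)) };
}
```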

Making Ruby Work with Rust

Binding Rust Methods to Ruby Classes

In the sections above, I explained how Ruby works with C. Now, I will introduce how Rust works with Ruby.

Rust allows us to define C functions using the extern keyword:

pub extern "C" fn ruby_insert(
    argc: ::rutie::types::Argc,
    argv: *const ::rutie::AnyObject,
    // `self` is a reserved word in Rust, so the receiver is named rtself
    mut rtself: SomeRubyClass,
) -> ::rutie::AnyObject {
    // ......
}

Then the method can be bound to a class through:

Class::new("SomeRubyClass", None).define(|klass| {
    klass.def("rs_insert", ruby_insert);
});

Binding Rust Structs to Ruby Objects

To bind a struct to a Ruby object, we need to manage memory properly, since Ruby uses garbage collection (GC) while Rust relies on ownership. Rutie solves this problem by delegating the struct's memory management to Ruby.

When Rutie wraps data (in the Class::wrap_data method), it calls Box::into_raw(Box::new(data)) as *mut c_void. Box::new allocates the struct on the heap, and Box::into_raw turns the box into a raw pointer, so Rust no longer frees the struct when the variable goes out of scope.

When Ruby reclaims the object that wraps the struct, it also frees the struct. The last argument of rb_data_typed_object_wrap carries a free callback for this purpose, which Rutie defines as:

pub extern "C" fn free<T: Sized>(data: *mut c_void) {
    // Rebuilding the box returns ownership to Rust; the memory is freed when the box is dropped
    unsafe {
        let _ = Box::from_raw(data as *mut T);
    };
}
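
The hand-off can be observed without Ruby at all. In the sketch below, Payload, DROPS, and into_foreign are hypothetical names introduced for illustration, and the generic free function has the same shape as the callback above. A counter records when the destructor actually runs: not at the hand-off, only when the free callback is invoked.

```rust
use std::ffi::c_void;
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts how many times Payload's destructor has run.
static DROPS: AtomicUsize = AtomicUsize::new(0);

struct Payload {
    _bytes: Vec<u8>,
}

impl Drop for Payload {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

// Hand ownership to the foreign side: no destructor runs here.
fn into_foreign<T>(data: T) -> *mut c_void {
    Box::into_raw(Box::new(data)) as *mut c_void
}

// Same shape as the free callback above.
extern "C" fn free<T: Sized>(data: *mut c_void) {
    // Rebuilding the Box gives ownership back to Rust; dropping it frees T.
    unsafe { drop(Box::from_raw(data as *mut T)) };
}

fn main() {
    let ptr = into_foreign(Payload { _bytes: vec![0; 16] });
    println!("drops after hand-off: {}", DROPS.load(Ordering::SeqCst));
    free::<Payload>(ptr); // what the GC would trigger on collection
    println!("drops after free: {}", DROPS.load(Ordering::SeqCst));
}
```

Calling free twice on the same pointer, or with the wrong T, would be undefined behavior, which is why the type parameter has to be fixed at the point where the data is wrapped.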

Moreover, Rutie provides several macros and methods to make this process easier:

wrappable_struct!(SomeStruct, SomeStructWrapper, SOME_STRUCT_WRAPPER);
class!(SomeRubyClass);
class::wrap_data(Class::from_existing("SomeRubyClass").value(), some_struct, &*SOME_STRUCT_WRAPPER);

wrappable_struct! defines a wrapper struct:

pub struct SomeStructWrapper<T> {
    data_type: ::rutie::types::DataType,
    _marker: ::std::marker::PhantomData<T>,
}

PhantomData ties the wrapper to T without storing a value of that type, and data_type is passed as the last argument of rb_data_typed_object_wrap. The macro also defines a global variable SOME_STRUCT_WRAPPER for use, similar to a singleton in Ruby[3].
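
Why the wrapper is generic can be shown with a simplified stand-in. The Wrapper type below is hypothetical and much smaller than what the macro generates: it stores only a type-specific free function, whereas the real data_type field holds more. The sketch shows how the type parameter ensures that whatever gets wrapped is later freed as the same type.

```rust
use std::ffi::c_void;
use std::marker::PhantomData;

// Hypothetical, simplified wrapper: only a free function and a marker.
struct Wrapper<T> {
    free: extern "C" fn(*mut c_void),
    // No T is stored; PhantomData just ties the wrapper to T.
    _marker: PhantomData<T>,
}

extern "C" fn free_boxed<T>(data: *mut c_void) {
    unsafe { drop(Box::from_raw(data as *mut T)) };
}

impl<T> Wrapper<T> {
    fn new() -> Self {
        Wrapper { free: free_boxed::<T>, _marker: PhantomData }
    }

    // Only a T can be wrapped, so `free` always sees the right type.
    fn wrap(&self, data: T) -> *mut c_void {
        Box::into_raw(Box::new(data)) as *mut c_void
    }

    // SAFETY: ptr must come from `wrap` on a Wrapper<T> and not be freed yet.
    unsafe fn get<'a>(&self, ptr: *mut c_void) -> &'a T {
        unsafe { &*(ptr as *const T) }
    }
}

fn main() {
    let wrapper: Wrapper<String> = Wrapper::new();
    let ptr = wrapper.wrap(String::from("hello"));
    println!("{}", unsafe { wrapper.get(ptr) });
    // Release the value through the stored, type-specific callback.
    (wrapper.free)(ptr);
}
```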

class!(SomeRubyClass); defines a Ruby class for wrapping.

class::wrap_data(Class::from_existing("SomeRubyClass").value(), some_struct, &*SOME_STRUCT_WRAPPER); is called in initialize to bind the struct. As mentioned, SOME_STRUCT_WRAPPER is an instance of SomeStructWrapper, whose data_type carries the free callback:

pub extern "C" fn ruby_initialize(
    argc: ::rutie::types::Argc,
    argv: *const ::rutie::AnyObject,
    mut rtself: SomeRubyClass,
) -> AnyObject {
    // some_struct is the Rust value to wrap (its construction is omitted here)
    return class::wrap_data(Class::from_existing("SomeRubyClass").value(), some_struct, &*SOME_STRUCT_WRAPPER);
}

To use the struct later:

pub extern "C" fn ruby_insert(
    argc: ::rutie::types::Argc,
    argv: *const ::rutie::AnyObject,
    mut rtself: SomeRubyClass,
) -> AnyObject {
    // Fetch a mutable reference to the wrapped Rust struct
    let rs_struct = rtself.get_data_mut(&*SOME_STRUCT_WRAPPER);
    // ......
}

We need the global variable SOME_STRUCT_WRAPPER because get_data_mut calls rb_check_typeddata, whose second argument is the rb_data_type_t provided by the struct wrapper's data_type field.

Memory Safety

A wrapped Rust struct is safe because the responsibility for freeing memory has been delegated to Ruby, allowing Ruby to free the memory during garbage collection (GC).

Values passed to Rust are structs that only hold pointers to Ruby objects, so dropping them in Rust does not free what the pointers refer to. Since this is safe in C, it is safe in Rust as well.

Return values are allocated through Ruby's memory system, and when the function returns, their ownership passes to the caller (Ruby), which is also safe.

However, it becomes unsafe when Rust keeps a reference to a Ruby object inside its own struct: Ruby doesn't know that Rust holds the reference and might free the object, causing a use-after-free. A simple solution is for Rust, whenever it holds such a reference in an Arc, to have Ruby keep an additional reference to the object, and to have Ruby remove that reference when the Arc is dropped.
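
The pin-and-release idea in the previous paragraph can be sketched as an RAII guard. Everything here is a simulation: Value stands in for VALUE, and Pins is a hypothetical table playing the role of Ruby's knowledge about extra references. In a real extension the pin and unpin steps might map to something like rb_gc_register_address and rb_gc_unregister_address from Ruby's C API, but that mapping is an assumption, not something this article or Rutie prescribes.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Stand-in for Ruby's VALUE (an assumption for illustration).
type Value = usize;

// Hypothetical stand-in for the GC's view of extra references: value -> count.
#[derive(Default)]
struct Pins {
    counts: Mutex<HashMap<Value, usize>>,
}

impl Pins {
    fn pin(&self, v: Value) {
        *self.counts.lock().unwrap().entry(v).or_insert(0) += 1;
    }

    fn unpin(&self, v: Value) {
        let mut counts = self.counts.lock().unwrap();
        let remaining = match counts.get_mut(&v) {
            Some(n) => {
                *n -= 1;
                *n
            }
            None => return,
        };
        if remaining == 0 {
            counts.remove(&v);
        }
    }

    fn is_pinned(&self, v: Value) -> bool {
        self.counts.lock().unwrap().contains_key(&v)
    }
}

// Holds a Ruby value and keeps it pinned for as long as the guard lives.
struct Pinned {
    value: Value,
    pins: Arc<Pins>,
}

impl Pinned {
    fn new(value: Value, pins: Arc<Pins>) -> Arc<Self> {
        pins.pin(value); // tell "Ruby" about the extra reference
        Arc::new(Pinned { value, pins })
    }
}

impl Drop for Pinned {
    fn drop(&mut self) {
        // Runs once, when the last Arc<Pinned> is dropped.
        self.pins.unpin(self.value);
    }
}

fn main() {
    let pins = Arc::new(Pins::default());
    let a = Pinned::new(42, Arc::clone(&pins));
    let b = Arc::clone(&a);
    drop(a);
    println!("pinned while b lives: {}", pins.is_pinned(42));
    drop(b);
    println!("pinned after last drop: {}", pins.is_pinned(42));
}
```

Putting the release in Drop means the extra reference is removed exactly when the last clone of the Arc goes away, so the Rust side cannot forget it.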

Summary

This article introduced how Rutie enables Ruby to use Rust code and discussed memory safety. Many details are not covered here, such as how Rutie lets Rust call Ruby, how to build for different platforms, and how it translates structs and binds C methods. I encourage you to use Rutie and explore its source code for a deeper understanding.

I believe Rutie can greatly benefit the Ruby community. Not only can Rust enhance performance, but it also allows Ruby to leverage Rust implementations, such as gRPC and OpenTelemetry metrics.

A complete example of using Rutie can be found at https://github.com/yfractal/ccache/tree/main/ccache_rb.

References

  1. https://github.com/danielpclark/rutie
  2. https://github.com/ruby/fiddle
  3. https://refactoring.guru/design-patterns/singleton/ruby/example
  4. https://ruby-china.org/topics/43728