Sharing a HashMap Between Processes with `mmap` in Ruby

yfractal · 2022-08-28 · Last reply by kauruus on 2022-11-03 · 175 views

Why

An in-process cache can reduce both latency and network usage, and in some situations it can replace Redis. Erlang's ETS is a really good example.

But as we know, most Rails applications are deployed in cluster mode, and a cluster will have 3 or more processes. Ordinary in-memory data in one process can't be accessed by the others.

If each process keeps its own cache, memory is wasted on duplicate entries and the cache hit rate drops.
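To make the problem concrete, here is a minimal sketch in C (assuming a POSIX system such as Linux or macOS) showing that a write made by a forked child is not seen by the parent. The function and variable names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* An ordinary global: after fork() each process works on its own private copy. */
static int private_value = 0;

/* Hypothetical helper: forks a child that sets private_value to 10, waits for
 * it, and returns the value the parent sees afterwards (-1 if fork() fails). */
int value_seen_by_parent_after_child_write(void) {
    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        private_value = 10;   /* changes only the child's copy */
        _exit(0);
    }
    int status = 0;
    waitpid(pid, &status, 0);
    assert(WIFEXITED(status));
    return private_value;     /* the parent's copy was never touched */
}
```

Calling value_seen_by_parent_after_child_write() from main returns 0, not 10. A Ruby Hash filled inside Process.fork behaves the same way, which is why a per-process cache can't be shared.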

So we need a HashMap that can be accessed by different processes.

We need something that works like:

_pid = Process.fork do
  insert(1, 10) # insert in child process
end

insert(2, 20) # insert in parent process
Process.wait

display # should display the 2 elements

How it works

As we know, a program accesses physical memory through virtual memory, and the mapping from virtual to physical memory is managed by the operating system.

So we can use mmap to map the same physical memory into different processes.

The code is simple:

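#include <sys/mman.h>

// Returns MAP_FAILED on error. The mapping is inherited by child processes across fork().
void* create_shared_memory(size_t size) {
    int protection = PROT_READ | PROT_WRITE;     // pages are readable and writable
    int visibility = MAP_SHARED | MAP_ANONYMOUS; // shared with children, not backed by a file
    return mmap(NULL, size, protection, visibility, -1, 0);
}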
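As a quick check that such a mapping really is shared, the sketch below (assuming Linux or macOS) lets a forked child write into the mapped region and has the parent read it back. The helper names are hypothetical and not part of cache_rb.

```c
#define _DEFAULT_SOURCE  /* exposes MAP_ANONYMOUS on glibc under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Same mmap call as in the post, but returns NULL instead of MAP_FAILED. */
static void *create_shared_memory_or_null(size_t size) {
    void *mem = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    return mem == MAP_FAILED ? NULL : mem;
}

/* Hypothetical helper: the child writes 10 into shared memory, the parent
 * waits for it and returns what it reads (-1 on failure). */
int value_seen_through_shared_mapping(void) {
    int *shared = create_shared_memory_or_null(sizeof(int));
    if (shared == NULL) {
        return -1;
    }
    *shared = 0;

    pid_t pid = fork();
    if (pid < 0) {
        munmap(shared, sizeof(int));
        return -1;
    }
    if (pid == 0) {
        *shared = 10;          /* lands in the physical page both processes map */
        _exit(0);
    }

    int status = 0;
    waitpid(pid, &status, 0);  /* the child has finished writing after this */
    assert(WIFEXITED(status));
    int seen = *shared;
    munmap(shared, sizeof(int));
    return seen;
}
```

Here value_seen_through_shared_mapping() returns 10, because MAP_SHARED makes both processes refer to the same pages. With MAP_PRIVATE the parent would still read 0.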

Then we allocate shared memory for both the array of pointers (the HashMap is really just an array) and the item data:

struct DataItem **hashArray = (struct DataItem**)create_shared_memory(sizeof(void *) * SIZE);
void *dataArea = create_shared_memory(sizeof(struct DataItem) * SIZE);

Then we can insert an item into the array:

struct DataItem *item = (struct DataItem*) ((char *)dataArea + sizeof(struct DataItem) * hashIndex); // use the shared memory; cast to char * because arithmetic on void * is a GNU extension
hashArray[hashIndex] = item;
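Putting the pieces together, here is a rough sketch of how insert and lookup might work on top of the two shared regions. The DataItem layout, the SIZE value, the linear probing and all function names are my assumptions for illustration; the real code in the repository may differ.

```c
#define _DEFAULT_SOURCE  /* exposes MAP_ANONYMOUS on glibc under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define SIZE 20                          /* assumed table capacity */
struct DataItem { int key; int data; };  /* assumed item layout */

static struct DataItem **hashArray;  /* shared array of slot pointers */
static char *dataArea;               /* shared storage for the items */

static void *map_shared(size_t size) {
    return mmap(NULL, size, PROT_READ | PROT_WRITE,
                MAP_SHARED | MAP_ANONYMOUS, -1, 0);
}

/* Maps both regions; must run before fork(). Returns 0 on success, -1 on failure.
 * Anonymous mappings start zero-filled, so every slot begins as NULL. */
int table_init(void) {
    void *slots = map_shared(sizeof(void *) * SIZE);
    void *items = map_shared(sizeof(struct DataItem) * SIZE);
    if (slots == MAP_FAILED || items == MAP_FAILED) {
        return -1;
    }
    hashArray = slots;
    dataArea = items;
    return 0;
}

/* Linear probing from key % SIZE. Returns 0 on success, -1 if the table is
 * full. Not safe when two processes write the same slot at the same time. */
int table_insert(int key, int data) {
    size_t start = (unsigned)key % SIZE;
    for (size_t i = 0; i < SIZE; i++) {
        size_t idx = (start + i) % SIZE;
        if (hashArray[idx] == NULL || hashArray[idx]->key == key) {
            struct DataItem *item =
                (struct DataItem *)(dataArea + sizeof(struct DataItem) * idx);
            item->key = key;
            item->data = data;
            hashArray[idx] = item;  /* publish the slot last */
            return 0;
        }
    }
    return -1;
}

/* Returns a pointer to the stored value, or NULL if the key is absent. */
const int *table_find(int key) {
    size_t start = (unsigned)key % SIZE;
    for (size_t i = 0; i < SIZE; i++) {
        size_t idx = (start + i) % SIZE;
        if (hashArray[idx] == NULL) {
            return NULL;
        }
        if (hashArray[idx]->key == key) {
            return &hashArray[idx]->data;
        }
    }
    return NULL;
}

/* Child inserts 1 => 10, parent inserts 2 => 20. Returns how many of the two
 * keys the parent can find afterwards (-1 on failure). */
int demo_insert_from_two_processes(void) {
    if (table_init() != 0) {
        return -1;
    }
    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        table_insert(1, 10);
        _exit(0);
    }
    table_insert(2, 20);
    int status = 0;
    waitpid(pid, &status, 0);
    assert(WIFEXITED(status));
    return (table_find(1) != NULL) + (table_find(2) != NULL);
}
```

Storing raw pointers in hashArray only works here because both regions are mapped before fork(), so they sit at the same virtual addresses in parent and child. Processes that map the memory independently would need to store offsets into dataArea instead.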

Writing a C extension for Ruby is easy:

void Init_extension(void) {
    VALUE CFromRubyExample = rb_define_module("CacheRb");
    VALUE NativeHelpers = rb_define_class_under(CFromRubyExample, "NativeHelpers", rb_cObject);
    rb_define_singleton_method(NativeHelpers, "insert", rb_insert, 2);
    // ......
}

Now we can test this with:

CacheRb::NativeHelpers.init

_pid = Process.fork do
  CacheRb::NativeHelpers.insert(1, 10)
end

CacheRb::NativeHelpers.insert(2, 20)
Process.wait

CacheRb::NativeHelpers.display

After display is executed, we can see that the hash has two elements, 1 => 10 and 2 => 20: one inserted by the child process and one inserted by the parent process.

All the code is at https://github.com/yfractal/cache_rb. You can compile it with rake compile and run CacheRb.demo in bundle console.

What's next?

This is just a simple, even silly, example to prove the idea works.

To make it useful, I will find or write a good hash map and handle memory allocation and freeing properly in the coming days.

Shared memory between processes is a common practice in the OpenResty/Nginx world.

Not sure if you protect the shared memory from concurrent access. At least in Nginx, they use a lock for this.
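The locking concern can be illustrated with a small sketch: a lock placed in the same shared mapping as the data it protects. This version uses a C11 atomic_flag spinlock purely to keep the example short. It is an assumption for illustration, not how Nginx or cache_rb does it, and all names are hypothetical.

```c
#define _DEFAULT_SOURCE  /* exposes MAP_ANONYMOUS on glibc under strict -std= modes */
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* The lock and the data it guards live in the same shared mapping. */
struct SharedCounter {
    atomic_flag lock;
    long value;
};

static void lock_acquire(atomic_flag *lock) {
    while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire)) {
        /* busy-wait until the other process releases the lock */
    }
}

static void lock_release(atomic_flag *lock) {
    atomic_flag_clear_explicit(lock, memory_order_release);
}

static void add_many(struct SharedCounter *c, int times) {
    for (int i = 0; i < times; i++) {
        lock_acquire(&c->lock);
        c->value = c->value + 1;  /* read-modify-write: racy without the lock */
        lock_release(&c->lock);
    }
}

/* Parent and child each add `times` to a shared counter under the lock.
 * Returns the final count, or -1 on failure. */
long locked_counter_demo(int times) {
    struct SharedCounter *c = mmap(NULL, sizeof *c, PROT_READ | PROT_WRITE,
                                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (c == MAP_FAILED) {
        return -1;
    }
    atomic_flag_clear(&c->lock);  /* start unlocked */
    c->value = 0;

    pid_t pid = fork();
    if (pid < 0) {
        munmap(c, sizeof *c);
        return -1;
    }
    if (pid == 0) {
        add_many(c, times);
        _exit(0);
    }
    add_many(c, times);

    int status = 0;
    waitpid(pid, &status, 0);
    assert(WIFEXITED(status));
    long total = c->value;
    munmap(c, sizeof *c);
    return total;
}
```

With the lock, locked_counter_demo(10000) returns 20000. Without it, increments from the two processes can overwrite each other. A spinlock burns CPU while waiting and hangs every other process if the holder dies, so a real cache would more likely use a process-shared mutex or semaphore.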
