Good morning, RubyConf! I'm excited. How are you doing today? My name is Minqi Pan. I come from Beijing, China. I'm a hacker of Ruby and C++. I'm also a Node.js collaborator, so I do a lot of open source development, and this is my GitHub page.
Today, we'll be talking about how to compile your Ruby application into a single executable. So, where does the story come from? You know, the other day I was installing the gitlab-ci-runner on a Windows machine, because our project is in C#, and I thought, geez, GitLab was written in Ruby, it seems very hard to install Ruby on Windows, probably I need to set up the Ruby environment and do some gem installs and stuff. But it turned out that's not the case. They made a single executable for me to download. It's 46 megabytes. After I downloaded that executable, I just executed it and it installed a service on my CI machine. And voila, it works.
So, why is that? Because they're not writing the CI runner in Ruby, they're writing it in Go. So Go has this nice feature that can build your program into a single product that you're free to distribute. It feels so nice. So, in order to solve this distribution problem, people would just need to drop Ruby and use Go. Thanks all for coming (laughs).
But seriously though, some companies do indeed do that. Like Heroku: I remember a long time ago, when I installed the Heroku CLI tools on my Windows machine, there was an installer based on RubyInstaller that basically installed Ruby along with the Heroku project code onto my machine. But they changed that. The last time I tried it, the Heroku CLI was no longer in Ruby; I guess it's in Go or Node.js or something like that.
So I'm kind of jealous about the Go language. I love Ruby, I've been writing Ruby for years, I think we all do love Ruby right, it's so sweet, we love the language. Because of this jealousy, I'm trying to bring that feature into the Ruby world.
So what problem are we trying to solve here? Before Ruby packer, how did we distribute our programs? Well, somewhat like this. We have to first assume that the user does not have Ruby installed on their machine. So we have to let them install the Ruby environment first, probably install rvm and then install Ruby, and those all take a long time to complete. I recorded this video in China, where the connection is even worse than in the US, and it takes a long time. It's definitely not friendly to the end user. And finally, after we get the Ruby environment set up on the user's machine, you need to let them do some gem install stuff, and there are a lot of dependencies to download. I think overall that's not a good experience. And if you consider Windows, that's a disaster. So the problem is: installation is slow, there are tons of files to download and, if you're in China, the Great Firewall makes things even worse. Sometimes people forget to use sudo, and it's error-prone, especially with native modules. And you have to care about the Ruby runtime version. Say you use the lonely operator (&.) that was introduced in Ruby 2.3, and the user's machine has Ruby 2.1 installed: then your program won't run.
And also, updating. After distributing your program you want to keep it up to date, so how do we do that? We have to do the same gem install thing again, and the user does not even know that you have a new version, right? And on Windows you have to run the installer again. So there are no version checks, and updating is as cumbersome as installing, I guess. So, after I tried this idea of packing your project into a single executable, it turned out to be much better. Look, you can compile your Ruby project into a single file. This is one project that I compiled, it weighs 40 megabytes. And you can just execute it. Dot slash your program, it runs. And also, on Windows, the user no longer has to run those installers. You drop an exe to them -- 33 megabytes -- and they cannot double-click it, because usually it's a command line interface. So you just bring up your cmd and it runs.
And this is the best part -- updating is so much easier once you pack it into a single executable. It feels so nice that I want to show it to you in a demo. So, this is the program that I packed, and when it runs it will check for a new version of itself. If it finds a new version, it will download the new version and replace itself. The download step happens in a temporary directory: it downloads the file to some temporary location, and then moves that file to this single place to replace itself. It inflated the thing and replaced it, and look, it resumed the execution and you've now got the new version. And it happens on Windows as well. So you can sort of just put this self-check for versions at the point where your program begins, and the updating process is so easy because it's just one file; you just replace yourself.
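The "download to a temporary location, then move it over yourself" step can be sketched in a few lines of Ruby. This is only an illustration under assumptions, not code from the tool or from libautoupdate: the helper name `replace_executable` is hypothetical, the downloaded bytes are passed in directly instead of being fetched from a server, and the version check and the restart of the program are left out.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper: swap the file at `target_path` for `new_bytes`.
# The new copy is staged in a temporary directory first, so a failed
# download never leaves a half-written program in place.
def replace_executable(target_path, new_bytes)
  Dir.mktmpdir("update") do |tmp_dir|
    staged = File.join(tmp_dir, File.basename(target_path))
    File.binwrite(staged, new_bytes)
    # Keep the original permission bits (for example the executable flag).
    File.chmod(File.stat(target_path).mode & 0o777, staged)
    # FileUtils.mv renames when it can and falls back to copy + delete
    # when the temporary directory lives on another file system.
    FileUtils.mv(staged, target_path, force: true)
  end
  target_path
end

# Usage with a stand-in file instead of a real running program.
Dir.mktmpdir("demo") do |dir|
  program = File.join(dir, "yourprogram")
  File.binwrite(program, "version 1")
  File.chmod(0o755, program)
  replace_executable(program, "version 2")
  puts File.binread(program)
end
```

Because the whole program is one file, the update reduces to replacing that one file, which is why the single-executable layout makes this step so short.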
That's the idea. And the tool that I made will help you produce executables like this. You can get the source code of the tool from this address, it's on my GitHub page, it's called ruby-packer. And I even made a homepage for it. It's used to prove the idea that this thing works: because this tool itself was written in Ruby, and this tool is used to compile Ruby projects, it must be able to compile itself. So this is an example of the thing that it compiles. It can actually generate those files on three platforms: Windows, Mac and Linux. You can just download them and they work out of the box.
So, how do you use this tool? Well, there are several scenarios. The first scenario is, I have this tool installed at this location. If I don't give it any arguments, it will just try to produce a single Ruby executable that is blank. It's just a single Ruby interpreter executable. How does it look after the compilation process? I have some examples here. This is the final product, and if you execute it, it's just a Ruby executable, so you can print one plus one, and it's two. I wish Ruby was distributed in this way. Actually, it's not. Currently Ruby is distributed in source code form, and there's almost no binary distribution. There is one for Windows, the RubyInstaller, but that distribution contains so many files because the standard library is so huge. What if we could distribute Ruby as just one single executable? That would be so much better. But you might say, Ruby contains so many things, it's not just one single executable in there. It contains IRB. Well, IRB is inside this; I will tell you how it works later.
Let's see another scenario where you can use this tool. It's the ordinary scenario where you want to compile your own project. Say you have a Ruby project like the one that I've got here: you just give the entrance of that project to this tool. /bin/rubyc, and it will begin compiling, and after that, you will get an executable for your Ruby project. And that's the main scenario that it's supposed to be used for.
And we cannot forget about Rails. So, if you have a Rails project, you can use this tool with the entrance set to bin/rails, and that will produce a single executable for your Rails project. I have already compiled one here so we can show it. Like this executable, you can see it's 38 megabytes, and I can run it. And it will say you have to add another command, so let's run it as server. Rails always starts a little bit slow. Then you go to localhost:3000. Yay, you're on Rails.
The fourth scenario is that you can actually pass it a gem and let it compile the gem. And also, there are parameters for you to specify an auto-update URL to check for new versions, so that the auto-updating part is built in. And that's because I made this tool on top of another library called libautoupdate. That library basically abstracts away the difference between Windows and Linux, and uses the socket functions to communicate with the server to check for new versions.
So, what happens under the hood, how did we make this work?
The basic idea is that we put your project into something like a mounted disk in your memory, along with the Ruby interpreter, and we mount it at a special location called /__enclose_io_memfs__. You've probably already seen that in the backtrace. That is like a virtual file system in memory. We can use the other one, the blank Ruby interpreter that we compiled, and we can give it an entrance, and that entrance can even reference a file inside the virtual file system, and that's the reason we can bring up IRB.
And if you look at the load path, it is all in the virtual file system that is actually in memory, part of the executable. So when you require something, it will actually search inside this virtual file system to give you that. You can do all kinds of file system operations on it, and it all just works. Like, if I wonder what is on the root path of this virtual file system, there are bin, include, lib and share, so the Ruby standard library is actually embedded inside this single executable. You know, you used to have this huge library installed on your system, but now it's embedded inside this executable. All the global-level gems are there; this is very similar to what you would install on your machine. And you can even read a file out of it, like File.read with a virtual path, bin/irb. Look, there's the irb that we just invoked. So, that's the idea.
So, it's a combination of an in-memory file system and the real file system. The key is that if a path starts with this string, /__enclose_io_memfs__, it goes into the executable itself; otherwise it goes to the outside. And all kinds of file operations are supported on this virtual file system. Like, I can read this file, and I get its metadata as of the time that I compiled it. You see, all that information is stored when I compile this project.
So that's the idea. Whatever starts with this path goes to your memory; everything else still goes to your disk.
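The prefix rule can be illustrated with a small Ruby sketch. The real interception happens in C inside the interpreter, so this is only a model of the decision: the Hash `MEMFS_FILES` stands in for the embedded image, and the names `memfs_path?` and `read_file` are made up for this example.

```ruby
require "tempfile"

MEMFS_PREFIX = "/__enclose_io_memfs__"

# Stand-in for the embedded image: virtual path => contents (illustrative).
MEMFS_FILES = {
  "/__enclose_io_memfs__/local/hello.txt" => "hello from memory"
}.freeze

# The whole decision is a prefix check on the path.
def memfs_path?(path)
  path.start_with?(MEMFS_PREFIX)
end

def read_file(path)
  if memfs_path?(path)
    # Served from inside the executable.
    MEMFS_FILES.fetch(path) { raise Errno::ENOENT, path }
  else
    # Everything else still goes to the real disk.
    File.read(path)
  end
end

puts read_file("/__enclose_io_memfs__/local/hello.txt")

Tempfile.create("on-disk") do |f|
  f.write("hello from disk")
  f.flush
  puts read_file(f.path)
end
```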
So where is your project? We put your project into a special location under this virtual directory, called local. If you use this tool to compile a project, you will find your project at that location. So, to wrap it up, you then hard-code an entrance. You used to run your project via the Ruby interpreter and put your entrance at the argv location, but now, since we have compiled this into one single executable, we just preset the argv to your location, so that when the user fetches your final executable it just runs, from this location.
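As a rough model of the preset entrance: the packed executable behaves as if the interpreter had been started with the entrance script as its first argument, followed by whatever the user typed. In the sketch below only the /__enclose_io_memfs__/local prefix comes from the talk; the rest of the path and the helper name `interpreter_args` are hypothetical.

```ruby
# Hypothetical entrance; only the /__enclose_io_memfs__/local prefix is real.
ENTRANCE = "/__enclose_io_memfs__/local/bin/yourprogram"

# Build the argument list the embedded interpreter would see:
# the hard-coded entrance first, then the user's own arguments.
def interpreter_args(user_args, entrance = ENTRANCE)
  [entrance, *user_args]
end

p interpreter_args(["server", "--port", "3000"])
```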
It just works.
So how did we do this? There are so many file APIs, you cannot hack them one by one to add this special functionality. Right, it's just not maintainable. You would be subject to change: Ruby 2.5 is coming up. So we did not make the change at the Ruby level. Yes, we hacked some code at the Ruby level, but in a limited way. Like ruby/io.c, which is the source code for a lot of IO operations: we didn't change much, we just included an extra header to intercept some of the system calls, and most of the logic is actually inside this small library called libsquash.
What is libsquash?
That is actually the core part of it. Well, I want to take this opportunity to thank a few people. This library was not made solely by me. Dave Vasilevsky made a library called squashfuse. It is a SquashFS implementation on FUSE, and I took lots of code from him and made it into a library. And also this is my good buddy, Shengyuan Liu, and he helped me write this library as well.
So, this library basically implements SquashFS. SquashFS is a compressed, read-only file system that is used by Live CDs, mainly of Linux distributions. Like, I used to play with the Ubuntu Live CD, and the Live CD was actually a SquashFS that could be mounted by the kernel when you ran it. It's also used by some router companies for their firmware. So, when we were trying to invent this tool to compile projects, we were looking for a data structure to hold them. I looked at different options and I think this SquashFS data structure fits our needs. And it has been there for a while, it's stable, so we used that. And we tested the effectiveness of it. If your project weighs over 100 megabytes, after making it into a SquashFS file it is compressed to just something like 16 megabytes. The tool for it is called mksquashfs. It compresses your project while the final data structure can still be randomly accessed, so it's a good data structure.
The compressing part is important. You don't want a huge file being distributed to the user. You cannot let them download something like multi megabyte; that is unfriendly as well. It takes a long time to download and update. So we chose one with compression in it.
SquashFS has actually been part of the Linux kernel since 2009. But we cannot just use that, because it's GPL licensed and the code is part of the kernel. You cannot use it at the application level because, for example, the mallocs are just kmallocs. They're not at the user level, they're at the kernel level. So we cannot use the original implementation, and that's when I found squashfuse. It's a FUSE implementation, so it's in user space, and it's MIT licensed. So I took his code and made another library called libsquash that takes the FUSE part out of it, because we don't want to make any assumptions about the user's environment. We do not want to assume that they have FUSE; they don't need that. So it does not depend on FUSE, it's just a library implementing the access part of SquashFS, and it's MIT licensed because the former project is also MIT licensed.
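Why a compressed image can still be read at random offsets is easier to see with a toy model. The Ruby sketch below compresses data in fixed-size blocks with Zlib, so that reading a byte range only inflates the blocks it touches. It is not SquashFS's real on-disk format: the block size, the helper names `pack_blocks` and `read_range`, and the sample data are all assumptions made for illustration.

```ruby
require "zlib"

BLOCK_SIZE = 4096 # illustrative; not SquashFS's actual layout

# Compress `data` block by block so each block can be inflated on its own.
def pack_blocks(data, block_size = BLOCK_SIZE)
  blocks = []
  offset = 0
  while offset < data.bytesize
    blocks << Zlib::Deflate.deflate(data.byteslice(offset, block_size))
    offset += block_size
  end
  blocks
end

# Read `length` bytes starting at `offset`, inflating only the blocks touched.
def read_range(blocks, offset, length, block_size = BLOCK_SIZE)
  return "".b if length <= 0
  first = offset / block_size
  last = (offset + length - 1) / block_size
  chunk = (blocks[first..last] || []).map { |b| Zlib::Inflate.inflate(b) }.join
  chunk.byteslice(offset - first * block_size, length) || "".b
end

data = ("ruby " * 20_000).b
blocks = pack_blocks(data)
puts "original: #{data.bytesize} bytes, packed: #{blocks.sum(&:bytesize)} bytes"
puts read_range(blocks, 50_000, 10)
```

The sample data here is extremely repetitive, so the ratio it prints says nothing about the ratio of a real project.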
And also it compiles on three platforms: Windows, Linux, Mac. So that if you use this, your application is super distributable.
And this is the important part: I designed this library to mirror a lot of the system calls. So it can just come as a drop-in replacement for a lot of the file system's system calls. And this is another important design. You know, a lot of system calls do not have a path in them, yet I have to distinguish the paths that begin with this little string, /__enclose_io_memfs__, from the paths that do not.
System calls like open, read and write actually work with file descriptors, which are just integers. How do I know whether a file descriptor was opened on a virtual file or a real file? So I have to come up with something like a virtual file descriptor. When you do the open call in the first place, it has a path in it, and that tells me whether this is a real path or a virtual path. If it is a virtual path, I issue you a virtual file descriptor, and that is actually generated by a system call: I dup the zero file descriptor to make it look like a real file descriptor, and let it coexist with the other FDs issued by the system. But I have it on record: I have a global data structure called global_fdtable, so that I remember which FDs were issued by me and which were not.
So, after you get that FD and make some calls with it, like read, I will check whether this is my FD or the system's FD. If it's my FD, then I will read the data for you from the SquashFS; otherwise, I will send the call to the operating system. So this is how the magic header works. I just intercepted a lot of existing calls and redirected them to that library. And we spent a lot of time hacking Windows as well. We intercepted those calls too, so it works on Windows as well.
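The virtual file descriptor bookkeeping can be modelled in Ruby as follows. This is a sketch of the idea, not libsquash's implementation: the class `FdTable` and its methods are hypothetical, a Hash stands in for the SquashFS image, and the descriptor number is reserved by opening the null device instead of dup-ing descriptor 0 (a substitution chosen so the example also runs when stdin is closed).

```ruby
require "tempfile"

class FdTable
  PREFIX = "/__enclose_io_memfs__"

  def initialize(memfs_files)
    @memfs_files = memfs_files # stand-in for the SquashFS image
    @virtual = {}              # fd number => { io:, data:, pos: }
  end

  # Returns an integer descriptor for either kind of path.
  def open_path(path)
    return IO.sysopen(path) unless path.start_with?(PREFIX)

    data = @memfs_files.fetch(path) { raise Errno::ENOENT, path }
    # Reserve a real descriptor number so it can never collide with one
    # the operating system hands out later.
    placeholder = File.open(File::NULL)
    @virtual[placeholder.fileno] = { io: placeholder, data: data, pos: 0 }
    placeholder.fileno
  end

  def virtual?(fd)
    @virtual.key?(fd)
  end

  def read_fd(fd, length)
    entry = @virtual[fd]
    # Not issued by us: hand the call to the operating system.
    return IO.for_fd(fd, autoclose: false).sysread(length) unless entry

    chunk = entry[:data].byteslice(entry[:pos], length) || ""
    entry[:pos] += chunk.bytesize
    chunk
  end

  def close_fd(fd)
    entry = @virtual.delete(fd)
    entry ? entry[:io].close : IO.for_fd(fd).close
  end
end

table = FdTable.new({ "/__enclose_io_memfs__/local/greeting.txt" => "hello from memfs" })
Tempfile.create("real") do |real|
  real.write("hello from disk")
  real.flush
  vfd = table.open_path("/__enclose_io_memfs__/local/greeting.txt")
  rfd = table.open_path(real.path)
  puts table.read_fd(vfd, 5)
  puts table.read_fd(rfd, 5)
  table.close_fd(vfd)
  table.close_fd(rfd)
end
```

Callers only ever see integers, so code that was written against ordinary descriptors does not need to know which kind it is holding; the table lookup is the single place where the two worlds are told apart.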
So what about native extensions? Well, we made some compromises on this part. We could make it perfect by recording all the native extensions at compile time, but that is really hard. For the moment, when you call dlopen, or LoadLibraryExW on Windows, we actually extract that file to a temporary location and then do the dlopen on the temporary file. It works, so for now it's a temporary solution. And when the process exits, we delete that file. So native extensions work.
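The extract-then-load workaround can be sketched in Ruby like this. It covers only the extraction and the cleanup at exit; the actual dlopen call is omitted because the example has no real shared library to load. The helper name `extract_for_loading` and the sample bytes are hypothetical.

```ruby
require "fileutils"
require "tmpdir"

EXTRACTED_DIRS = []
# Clean up every extracted copy when the process exits.
at_exit { EXTRACTED_DIRS.each { |dir| FileUtils.rm_rf(dir) } }

# `bytes` stands in for the library contents read from the embedded image.
# Returns a real on-disk path that could be handed to dlopen.
def extract_for_loading(virtual_path, bytes)
  dir = Dir.mktmpdir("native-ext")
  EXTRACTED_DIRS << dir
  real_path = File.join(dir, File.basename(virtual_path))
  File.binwrite(real_path, bytes)
  real_path
end

path = extract_for_loading("/__enclose_io_memfs__/lib/example.so", "\x7FELF".b)
puts File.basename(path)
```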
So what about Rails? Why does Rails work? Rails is special. It will create some files in the root of your project, but SquashFS is read-only, you cannot write anything to it. Like, Rails will write a lot of logs into the root of your project, and it will create a lot of temporary files. So how do you solve that? Actually, we also intercept those calls. When it tries to write files to the SquashFS, we redirect that call to a temporary location and remember that file, and also delete that temporary file when we exit.
You will see that the log folder and the tmp folder are just created, because Rails is trying to write them to the root of your project, but we redirected that call to a temporary location, and that temporary location was specified as the current directory, so the log is there. And your config is there, so if you're really going to distribute your Rails app in this way, this is an opportunity for you to separate out the config, to make it configurable, so that it is distributed with the config folder, but not with your code anymore.
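The write redirection amounts to a path mapping: anything under the read-only project root is rewritten to the same relative path under a writable directory. A minimal Ruby sketch under assumptions follows; the helpers `writable_path` and `write_file` are hypothetical, and the demo writes into a temporary directory instead of the current directory so that it leaves nothing behind.

```ruby
require "fileutils"
require "tmpdir"

# Map a path under the read-only project root to a writable location.
# Paths outside the project root are returned unchanged.
def writable_path(path, project_root:, writable_root: Dir.pwd)
  return path unless path.start_with?(project_root + "/")
  File.join(writable_root, path.delete_prefix(project_root + "/"))
end

# Append `contents` at the redirected location, creating folders as needed.
def write_file(path, contents, **roots)
  target = writable_path(path, **roots)
  FileUtils.mkdir_p(File.dirname(target))
  File.write(target, contents, mode: "a")
  target
end

Dir.mktmpdir("rails-demo") do |dir|
  root = "/__enclose_io_memfs__/local"
  written = write_file("#{root}/log/production.log", "Started GET /\n",
                       project_root: root, writable_root: dir)
  puts written
end
```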
That's how it works.
So, in summary, this is your project, and we use mksquashfs to compile it into yours.squashfs. We compile the two libraries: libsquash as the run-time access to that file, and libautoupdate to give you the auto-update feature. And we compile the Ruby runtime. Then, the last step. Sort of like the PPAP process, right? (laughs)
- I have your project - I have a pen
- I have libsquash - I have an apple
- Uh, your project libsquash - Uh, apple pen
- I have libautoupdate - I have a pen
- I have Ruby Runtime - I have pineapple
- Uh, libautoupdate Ruby Runtime - Uh, pineapple pen
- Your project libsquash - Apple pen
- libautoupdate Ruby Runtime - Pineapple pen
- Uh, your project libsquash libautoupdate Ruby Runtime - Pen pineapple apple pen
- Your project libsquash libautoupdate Ruby Runtime - Pen pineapple apple pen
We statically link them together, it becomes yours.exe and you distribute and enjoy your executable.
Thank you very much.