Guilherme Carreiro

Building native Ruby extensions in Rust

by Guilherme Carreiro

In this talk titled "Building native Ruby extensions in Rust" presented by Guilherme Carreiro at BalticRuby 2024, the speaker explores how leveraging the power of Rust can enhance Ruby applications by creating native extensions.

Key Points Discussed:

- Introduction and Background: Guilherme shares a bit about his upbringing in Brazil, his journey into programming, and his experience at Shopify, emphasizing the performance demand in developer tooling.

- Need for Native Extensions: The discussion begins with reasons for using native extensions in Ruby, primarily citing performance benefits. He stresses the importance of benchmarking to ascertain whether the overhead of native extensions is justified.

- Performance Comparison: He compares the performance of pure Ruby, FFI overhead, and Rust native extension implementations. The results illustrate that while Rust can be faster, pure Ruby can also perform commendably in some scenarios.

- Prototyping a YAML Parser: Guilherme proposes creating a fast YAML parser in Rust to improve on existing Ruby parsers. He outlines the steps for initializing a Ruby gem and structuring the native extension in Rust, highlighting key files like Cargo.toml, extconf.rb, and the Rust source code located in lib/fast_yaml.rs.

- Integration Steps: The talk details how to define methods in Rust that can be called from Ruby, emphasizing how to properly create and link shared objects within Ruby, which allows seamless function calling across languages.

- Handling Errors and Memory Management: Strategies for error management and memory handling are discussed, including transforming Rust errors into Ruby exceptions and using tools like Valgrind to ensure memory health.

- Making the Gem Production-Ready: The speaker outlines the steps needed to prepare the gem for public use, such as error handling, memory leak checks, and pre-compiling for user-friendly installation.

- Conclusion and Call to Action: Guilherme concludes by looking at the future of native extensions, emphasizing the balance between performance enhancement and usability improvements. The audience is invited to explore the Fast YAML gem and participate in its development.

Main Takeaways:

- Native extensions can significantly boost performance but require thoughtful implementation, benchmarking, and error handling.

- Proper Rust and Ruby integration can unlock advanced functionalities in Ruby applications.

- The speaker encourages community engagement in building and improving such tools, noting that while performance is crucial, usability must also be a priority in the development process.

00:00:08 Hello everyone, my name is Guilherme Carreiro, and my handle on GitHub is Kaho. During this talk, I'm going to show you a bunch of code, and after the talk, I'll make everything available on my GitHub and website. So, you can find the slides at kro.com/talks.
00:00:25 I was born in Brazil, in a place called Niteroi, and today I live in Madrid, Spain. Probably most of you know Madrid, but perhaps you are not familiar with Niteroi. One thing my hometown is very proud of is that we have the largest shopping mall in Latin America.
00:00:38 A fun fact about this shopping mall is that when I was 10 years old, my mother bought me a programming book. I learned the fundamentals of programming from that book. I was always tinkering with scripts because my uncle was a programmer, but this book really ignited my passion for coding.
00:01:02 I started doing things with Flash, like ActionScript, animations, and mini games. Today, I work at Shopify as a Staff Developer, focusing on developer tooling for the past three years. Before that, I worked at Red Hat, also in developer tooling.
00:01:20 Developer tooling is a challenge that I really enjoy because it requires a lot of performance. No one likes to wait for the language server when programming. We have strict performance constraints; for example, when building a language server, we want to return information to the developer within 100 milliseconds.
00:01:46 Today, I'm going to show you how to build a native extension together. We'll follow all the steps from scratch—thinking about various considerations required for production. Let's talk about the reasons why we would need a native extension. Why not just write pure Ruby? The straightforward answer is performance.
00:02:28 Native extensions can run much faster because they are compiled. However, this claim is debatable. Ruby is getting faster with every version, especially with garbage collector enhancements. Therefore, always benchmark your idea first. If your benchmarks indicate that writing a native extension is a good idea, then you should proceed.
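As a minimal sketch of "benchmark first", the snippet below times two pure Ruby versions of the same computation with the standard `benchmark` library. The sum-of-squares workload and both method names are placeholders chosen for illustration, not code from the talk; the point is only that a baseline measurement should exist before any native code is written.

```ruby
require "benchmark"

# Hypothetical stand-in for an "essential calculation": a sum of squares.
def sum_of_squares_loop(n)
  total = 0
  i = 1
  while i <= n
    total += i * i
    i += 1
  end
  total
end

# Same computation, written with Range#sum.
def sum_of_squares_enum(n)
  (1..n).sum { |i| i * i }
end

# Print user, system, and real time for each variant.
Benchmark.bm(12) do |x|
  x.report("while loop") { 10.times { sum_of_squares_loop(100_000) } }
  x.report("Range#sum")  { 10.times { sum_of_squares_enum(100_000) } }
end
```

If the pure Ruby numbers are already within budget, the extra build and maintenance cost of a native extension may not be worth paying.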
00:03:06 Let me illustrate the performance aspect of native extensions. Suppose we have a piece of code—an essential calculation for your business. The goal is to optimize this code and make it run faster. By extracting this computation into a native extension, written in Rust, we can improve performance significantly.
00:03:47 We can compare three scenarios to see which is the fastest: pure Ruby, a call through FFI, and a Rust native extension. The third one, using the Rust native extension, is indeed the fastest. However, the first scenario, which is pure Ruby, performs surprisingly well compared to the second scenario, which introduces FFI overhead. This demonstrates that moving code out of Ruby does not always result in better performance.
00:04:20 Therefore, it’s essential to prototype before committing fully to a native extension. As Ruby developers, we frequently work with YAML files. My hypothesis is that Rust has an excellent YAML parser, and if I integrate that parser into the Ruby ecosystem, I can leverage the performance benefits.
00:05:03 To validate my hypothesis, I researched the most commonly used YAML parsers in Ruby. Most of them are pure Ruby implementations. This suggests that developing a fast YAML parser using native extensions could be a beneficial addition to the Ruby ecosystem.
00:06:01 You can create a new Ruby gem with the command 'bundle gem fast_yaml', which initializes a gem structure with all necessary boilerplate code. For this native extension, there are specific files we'll work with.
00:06:17 The 'Cargo.toml' file is where we declare our Rust dependencies, while the 'extconf.rb' file describes how to compile the native extension on the user's platform. Finally, we'll define the actual code in the 'lib/fast_yaml.rs' file, where we will parse the YAML content.
00:07:06 In the 'lib/fast_yaml.rs' file, we define a module named 'FastYaml' and create a method called 'parse' which will be our entry point in the Ruby code. When this method is called, it will invoke Rust code to produce a parsed Ruby structure.
00:08:02 Each time we call this method, it will process an input string, parse it, and return a Ruby-compatible object. This must be done carefully to ensure our Ruby objects are correctly instantiated.
00:08:29 Now, when users execute this code, the Ruby VM expects to find the compiled extension. So, we must compile it with 'bundle install', and then the resulting shared object will allow us to call our Rust-defined methods from Ruby.
00:09:06 This shared object file will link our Ruby code to the native extension, and we should pay attention to the installation process so that it runs smoothly across different platforms.
00:09:46 It's essential to note that when you require your native extension from Ruby, you must ensure the Rust code is compiled and linked correctly.
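One way to picture the requirement above is a loader that tries to require the compiled shared object and reports what happened. This is a sketch under assumptions: the require path `fast_yaml/fast_yaml`, the `NATIVE` constant, and the pure Ruby fallback are illustrative choices and are not described in the talk.

```ruby
require "yaml"

module FastYaml
  begin
    # Compiled shared object (.so or .bundle); this path is an assumption.
    require "fast_yaml/fast_yaml"
    NATIVE = true
  rescue LoadError
    NATIVE = false

    # Fallback so callers still get a result when the extension is missing.
    def self.parse(source)
      YAML.safe_load(source)
    end
  end
end

puts "native extension loaded: #{FastYaml::NATIVE}"
p FastYaml.parse("name: fast_yaml")
```

When the extension has not been compiled, `require` raises `LoadError`, which is the failure a user would otherwise see at load time.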
00:10:08 This work can feel a bit magical, as the connections between Ruby and Rust may not always be obvious. The next step in our process is to ensure that our basic Rust extension is functioning correctly, where we need to define the initialization method.
00:10:50 This init function is crucial because it's the first function invoked by the Ruby VM when loading the extension. We define it with the name 'Init_fast_yaml', which follows the 'Init_' plus extension name convention that Ruby expects for native extensions.
00:11:34 Within this function, we create a global Ruby function called 'hello_from_rust'. This function, when called, returns a Ruby string. It is important to remember that this string is created using Ruby's API to ensure compatibility.
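To make the effect of the init function less magical, here is a pure Ruby picture of what registering a global function amounts to, assuming the extension uses the C API's global-function registration (which defines a module function on `Kernel`). The greeting text is a placeholder; the talk does not state the exact string.

```ruby
# Pure Ruby equivalent of what the init function registers.
module Kernel
  module_function

  # The returned text is a placeholder, not taken from the talk.
  def hello_from_rust
    "Hello from Rust!"
  end
end

puts hello_from_rust         # callable from anywhere, like puts or require
puts Kernel.hello_from_rust  # also reachable through the Kernel module
```

In the real extension the method body lives in Rust and the string is allocated through Ruby's API, but from the Ruby side the method looks the same as this one.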
00:12:12 We've successfully defined our basic Rust structure. At this point, the pieces can feel a bit disconnected unless we understand where each part of the code needs to reside.
00:12:48 Next, we ensure that our entry point file correctly requires all necessary components. As we compile everything together, we ensure that our Rust file can be loaded successfully. We need to run 'bundle install' to install dependencies and then compile our Rust code.
00:13:24 Once we have compiled and our function is ready, we can test it by calling our 'hello_from_rust' method. This will confirm whether everything from the Rust code is now properly accessible within Ruby.
00:14:15 The compiled shared object file links our Ruby code with the native Rust code. When Ruby requires this library, it provides the bridge between the two languages. It's a smooth integration when done correctly.
00:14:50 However, while this process seems straightforward, it's crucial to understand the dependencies that may complicate the situation, particularly if we rely on external libraries, which may not follow the same conventions.
00:15:32 Let's continue building our YAML parser gem, focusing on the parse method we defined. It will return a Ruby value built from the data processed by the Rust parser.
00:16:10 In our primary implementation, we develop the parsing logic that will handle various types of YAML input strings. This aspect is essential as we'll manage how we convert data from Rust to Ruby efficiently.
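The conversion targets can be written down as a small table of YAML snippets and the Ruby values a parser should return for them. The sketch below uses Psych, Ruby's bundled YAML library, as the reference; the table and the `conversion_mismatches` helper are hypothetical names introduced here, and any parser exposed as a callable could be checked against it.

```ruby
require "yaml"

# YAML snippets paired with the Ruby value a parser should produce.
EXPECTED_CONVERSIONS = {
  "count: 3"           => { "count" => 3 },
  "ratio: 1.5"         => { "ratio" => 1.5 },
  "enabled: true"      => { "enabled" => true },
  "name: ~"            => { "name" => nil },
  "- a\n- b"           => ["a", "b"],
  "outer:\n  inner: x" => { "outer" => { "inner" => "x" } }
}.freeze

# Returns the snippets for which the given parser disagrees with the table.
def conversion_mismatches(parser)
  EXPECTED_CONVERSIONS.reject { |yaml, expected| parser.call(yaml) == expected }.keys
end

mismatches = conversion_mismatches(->(src) { YAML.safe_load(src) })
puts mismatches.empty? ? "all conversions match" : "mismatches: #{mismatches.inspect}"
```

Each row corresponds to one conversion the Rust side has to perform: integers, floats, booleans, null, sequences, and nested mappings.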
00:16:50 As we analyze our prototype further, we want to ensure that our performance benchmarks validate our speed claims compared to existing Ruby gems.
00:17:40 By running benchmarks, we can confirm our parser performs significantly faster than current options. For instance, we can ascertain that our Fast YAML parser processes input quicker than the established alternatives.
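A comparison like this can be sketched as a small harness that feeds the same document to every parser, checks that they agree, and only then times them. The `sample_yaml` and `benchmark_parsers` helpers are hypothetical, and `FastYaml.parse` is assumed to be the extension's entry point; when the extension is not loaded, only the Psych baseline runs.

```ruby
require "benchmark"
require "yaml"

# Builds a synthetic YAML document with `entries` small mappings.
def sample_yaml(entries)
  (1..entries).map { |i| "item_#{i}:\n  id: #{i}\n  label: entry #{i}\n" }.join
end

# Times each parser on the same input, after checking that every parser
# returns the same structure as the first one (the baseline).
def benchmark_parsers(parsers, source, runs: 3)
  baseline = parsers.values.first.call(source)
  parsers.each do |name, parser|
    raise ArgumentError, "#{name} disagrees with baseline" unless parser.call(source) == baseline
  end
  parsers.transform_values do |parser|
    Benchmark.realtime { runs.times { parser.call(source) } }
  end
end

parsers = { "Psych" => ->(src) { YAML.safe_load(src) } }
# Assumption: the compiled extension exposes FastYaml.parse.
parsers["FastYaml"] = ->(src) { FastYaml.parse(src) } if defined?(FastYaml)

benchmark_parsers(parsers, sample_yaml(2_000)).each do |name, seconds|
  puts format("%-10s %.4fs", name, seconds)
end
```

Checking agreement before timing guards against a parser that looks fast only because it returns incomplete data.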
00:18:21 As we analyze various results from benchmarks, we discover that our implementation is efficient, and it reinforces our hypothesis that introducing a native extension can lead to performance improvements.
00:19:05 To make our gem production-ready, we must implement error handling properly. It is imperative that we handle any parsing errors elegantly and prevent crashing Ruby applications.
00:19:50 Instead of stopping the entire Ruby application when an error occurs, we can match the result of our parser and raise a Ruby error, which allows the program to handle exceptions gracefully.
00:20:32 Now, any errors caught can be transformed into a Ruby exception. This way, users will receive a friendly error message instead of the application crashing unexpectedly.
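From the Ruby side, the result of this translation looks roughly like the sketch below. Psych stands in for the Rust parser so that the example runs without a compiled extension, and the `YamlErrors` module and its `ParseError` class are hypothetical names, not the gem's actual API.

```ruby
require "yaml"

module YamlErrors
  # Hypothetical exception class a gem could expose to its users.
  class ParseError < StandardError; end

  # Stand-in for the native parse call: a parser failure becomes a
  # regular Ruby exception instead of terminating the process.
  def self.parse(source)
    YAML.safe_load(source)
  rescue Psych::SyntaxError => e
    raise ParseError, "invalid YAML: #{e.message}"
  end
end

begin
  YamlErrors.parse("key: [unclosed")
rescue YamlErrors::ParseError => e
  puts "recovered: #{e.message}"
end
```

Because the failure arrives as an ordinary exception, callers can rescue it like any other Ruby error and keep the application running.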
00:21:20 It's also crucial to handle memory management properly. Despite Rust's safety guarantees, improper use of memory can still lead to leaks, so staying mindful of memory allocations and garbage collection remains essential.
00:22:05 I utilized Valgrind to check for memory leaks in my gem. By performing tests with and without my native extension in Ruby, I can compare results and identify any memory issues stemming from my code.
00:22:57 Running multiple tests helps confirm whether the memory usage increases as expected. It provides a robust method for ensuring stability before release.
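A lightweight complement to Valgrind is a script that repeats the same workload and records the process's resident memory before and after. This is a rough sketch and not a replacement for a real leak checker: it assumes a `ps` command that accepts `-o rss= -p` (typical on Linux and macOS), and the `rss_kb` and `memory_growth` helpers are hypothetical. Psych is used as the workload here; the block body would be swapped for the native parse call when checking an extension.

```ruby
require "yaml"

# Resident set size of this process in kilobytes, or nil when `ps`
# is unavailable or prints nothing.
def rss_kb
  output = `ps -o rss= -p #{Process.pid}`
  output.strip.empty? ? nil : output.to_i
rescue SystemCallError
  nil
end

# Runs the workload repeatedly and records memory before and after.
def memory_growth(iterations)
  GC.start
  before = rss_kb
  iterations.times { yield }
  GC.start
  after = rss_kb
  { iterations: iterations, before_kb: before, after_kb: after }
end

source = "items:\n" + (1..100).map { |i| "  - entry #{i}\n" }.join
p memory_growth(1_000) { YAML.safe_load(source) }
p memory_growth(5_000) { YAML.safe_load(source) }
```

Memory that keeps climbing in proportion to the iteration count, even after garbage collection, is a hint that something outside Ruby's heap is not being freed.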
00:23:40 Now that our gem is free of memory leaks and handles errors properly, we can consider publishing it. However, users need to have a smooth installation process.
00:24:20 One important aspect is pre-compiling the gem for popular platforms. This means that when users install the gem, they do not have to worry about downloading Rust or compiling the extension.
00:25:10 Many popular gems have this feature, allowing users to install the version compatible with their system directly. This can make for a much more pleasant user experience.
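The idea of shipping one binary gem per platform can be illustrated with RubyGems' own `Gem::Platform` class. The platform list below is hypothetical, and the check is a simplification of how RubyGems actually selects a gem variant; it only shows that platform strings are compared against the machine running the install.

```ruby
require "rubygems"

# Hypothetical list of platforms a gem author chose to precompile for.
PRECOMPILED_PLATFORMS = %w[
  x86_64-linux aarch64-linux x86_64-darwin arm64-darwin x64-mingw-ucrt
].freeze

# True when one of the precompiled platforms matches the given platform,
# meaning a binary gem could be used without compiling locally.
def precompiled_available?(platform = Gem::Platform.local)
  PRECOMPILED_PLATFORMS.any? { |name| Gem::Platform.new(name) === platform }
end

puts "local platform: #{Gem::Platform.local}"
puts "precompiled gem usable: #{precompiled_available?}"
```

On a platform outside the list, installation would fall back to the source gem, which is the case where the user needs a Rust toolchain.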
00:26:00 After compiling the gem for multiple platforms and ensuring that it runs successfully, I opened the repository to share it with all of you.
00:26:45 I welcome you to explore and even contribute if you feel inclined. One of the tasks that need to be addressed involves converting data types effectively, which can be an opportunity for improvement.
00:27:30 While we should not rewrite every gem in Rust, embracing native extensions opens doors to advanced libraries and performance improvements in CPU-intensive tasks.
00:28:29 It's essential to understand that although this may be partially about performance, real usability should drive our exploration into Rust-native extensions for Ruby.
00:29:20 Thank you all for your attention! Please check my website for the talk slides, and you can find the Fast YAML gem on my GitHub.