Koichi Sasada

Write a Ruby interpreter in Ruby for Ruby 3

The Ruby interpreter known as MRI (Matz Ruby Interpreter) or CRuby is written in C. Writing an interpreter in C has several advantages, such as good performance early in development and extensibility in C. However, writing MRI in C now causes several issues. To overcome these issues, I propose rewriting some parts of MRI in Ruby, combined with C functions. This will be a basis for Ruby 3 (or Ruby 2.7). In this talk, I'll show the issues and how writing Ruby solves them, how to write MRI internals in Ruby, and how to build an interpreter with Ruby code.

RubyKaigi 2019 https://rubykaigi.org/2019/presentations/ko1.html#apr18

00:00:00 Hello everyone, my talk is about how to write a Ruby interpreter. So the title is 'Write a Ruby interpreter in Ruby for Ruby 3'.
00:00:06 To be clear up front, the interpreter itself stays in C. This talk is about a new proposal: writing built-in method definitions in Ruby, with the help of C functions.
00:00:15 As you know, MRI, the Matz Ruby Interpreter, is written in C. Additionally, some extension libraries are also written in C. This proposal aims to add a new framework for writing such libraries and extensions in Ruby.
00:00:36 Please note, this talk is not about writing a garbage collector, a virtual machine, or every other feature of the interpreter in Ruby. However, if you are interested in writing an interpreter in Ruby, this presentation aims to provide valuable insights.
00:01:04 Today, I will showcase the challenges I faced, why I propose this solution, and introduce the technical aspects to achieve this. The goal is to enhance performance and startup time.
00:01:25 I'm Koichi Sasada, and I’ve been involved with Ruby for more than 15 years. I'm also a member of a group called Cookpad, where we provide daily updates. Please feel free to visit our booth.
00:01:44 Let’s address the background of my presentation. MRI stands for Matz Ruby Interpreter, and it is predominantly written in C. Many C programmers contribute to the development of MRI.
00:02:01 Most Ruby methods, whether it's strings or arrays, are defined in C, with only a few methods written in Ruby. Unfortunately, we rarely utilize this feature.
00:02:21 This brings us to how built-in methods are implemented in C. A definition has two main parts, both written in C.
00:02:42 First we write a C function. Then we define a method using that function, registering it under a method name together with a few parameters.
00:03:08 For a String method, one of those parameters is the number of arguments the method expects. This is quite straightforward for C programmers.
00:03:32 The String length method, for instance, is defined this way. The C function is the method body, and it is called each time a length is computed.
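The argument count mentioned above is visible from the Ruby side through reflection. The following is only a small sketch; it inspects two String methods with `arity` and does not show the C registration code itself.

```ruby
# Arity is the number of required arguments a method expects.
length_arity  = String.instance_method(:length).arity    # takes no arguments
include_arity = String.instance_method(:include?).arity  # takes exactly one

puts "String#length arity:   #{length_arity}"
puts "String#include? arity: #{include_arity}"
```

A negative arity would indicate optional or variable arguments, which is how variable-length C methods typically report themselves.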
00:03:59 Every time we invoke the ruby binary, it defines roughly 500 functions and about 2,000 methods during startup. However, there are several issues with these techniques.
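To get a rough feeling for the scale described above, one can count what is defined in a running process. This is an approximate sketch, not the measurement used in the talk: the numbers depend on the Ruby version and on anything already loaded (RubyGems, for example), and the count may include singleton classes.

```ruby
# Walk every class and module currently alive in this process.
modules = ObjectSpace.each_object(Module).to_a

# Count instance methods defined directly on each one (inherited ones excluded).
method_count = modules.sum do |mod|
  mod.instance_methods(false).size + mod.private_instance_methods(false).size
end

puts "classes/modules:  #{modules.size}"
puts "instance methods: #{method_count}"
```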
00:04:23 I categorized these problems into four main areas. The first is annotation issues. Previous discussions highlighted that Ruby has several methods we often cannot fully understand.
00:04:50 For example, if we define a method in Ruby with some parameters, we can check its parameter names with the parameters method. This call returns a nested array of parameter kinds and names, enabling us to understand what each parameter signifies.
00:05:16 C methods, however, carry no such parameter names. They also lack easily accessible backtrace information, which can significantly hinder profiling tools like StackProf.
00:05:37 Consequently, understanding and analyzing the behavior of C methods becomes crucial. This leads us to consider how methods function and what side effects they have.
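The difference in parameter information can be seen with `parameters`. In this sketch, `Greeter` and `hello` are names made up for illustration, and `String#include?` is assumed to be C-defined in the Ruby being used; in that case it typically reports a kind without a name.

```ruby
class Greeter
  # A Ruby-defined method: every parameter has a kind and a name.
  def hello(msg, times = 1, *rest, name: "world", &blk)
  end
end

ruby_params = Greeter.instance_method(:hello).parameters
# => [[:req, :msg], [:opt, :times], [:rest, :rest], [:key, :name], [:block, :blk]]

# A method assumed to be C-defined: kinds are reported, names usually are not.
c_params = String.instance_method(:include?).parameters

p ruby_params
p c_params
```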
00:06:04 For example, if we call a method with a string literal in a loop, a new string object is created on every iteration, even when that is unnecessary. However, if we know the method does not modify its parameters, then in theory an optimization could avoid this excessive object creation.
00:06:44 By defining methods correctly, we would avoid creating new objects when unnecessary, which can significantly speed up execution.
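The allocation behaviour described above can be observed directly. This sketch assumes a default CRuby configuration with no frozen-string-literal magic comment or command-line option in effect; the explicit `freeze` is today's manual workaround, not the automatic optimization the talk has in mind.

```ruby
def literal_each_call
  "hello"          # each call allocates a fresh String object
end

def frozen_each_call
  "hello".freeze   # CRuby can hand back one shared frozen String here
end

# Keep references so the objects stay alive while we compare them.
plain  = Array.new(3) { literal_each_call }
frozen = Array.new(3) { frozen_each_call }

distinct_plain  = plain.map(&:object_id).uniq.size
distinct_frozen = frozen.map(&:object_id).uniq.size

puts "distinct objects from plain literal:  #{distinct_plain}"
puts "distinct objects from frozen literal: #{distinct_frozen}"
```

If the interpreter knew that a callee never mutates its argument, it could in principle treat the plain literal like the frozen one.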
00:07:05 Another issue is the metadata surrounding method sizes. We do not accurately know how many classes or methods are defined until later in the process, impacting memory allocation and startup time.
00:07:30 If we can estimate this information up front, we can pre-allocate memory tables and improve startup performance. The second aspect is performance itself.
00:07:51 It's a well-known fact that C generally runs faster than Ruby. However, this isn't always the case; for instance, calling a method implemented in C with keyword parameters can be slower.
00:08:11 C implementations of certain operations can introduce added complexity, adversely affecting performance, even while Ruby may optimize certain operations to run faster.
00:08:38 For example, exception handling in C is more cumbersome and slower than Ruby's implementation due to its complexity. Furthermore, building simple methods in C can sometimes be more straightforward than utilizing more advanced features.
00:09:01 This presents the case that writing certain methods in Ruby could be preferable if performance isn't heavily impacted.
00:09:26 Lastly, we need to address updating the C API to accommodate a more concurrent design. For example, Ruby can benefit from techniques utilized in other Ruby implementations.
00:09:55 However, the current MRI implementation does not allow for context pointer access, which is crucial for enhancing concurrency capabilities.
00:10:23 This context data is necessary for achieving concurrent interpreters. Thus, we need some reevaluation of the API to accommodate this new direction.
00:10:51 In summary, we need to consider these four factors: annotation, performance, productivity, and the context requirement.
00:11:05 I believe that Ruby is an excellent candidate for implementing a DSL because of its friendly syntax, which makes it well suited to this purpose.
00:11:37 Let me show you an example of how we can implement this using a Ruby-like syntax instead of C, making the calls similar to more common Ruby code.
00:12:11 For instance, in a string manipulation operation in Ruby, we can express this beautifully. Additionally, we can introduce foreign function interfaces with specific calls to C functions.
00:12:38 This method of calling functions allows a seamless transition between Ruby and C, as well as the necessary access to parameters and pointers.
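To make the shape of this idea concrete, here is a toy model in plain Ruby. `CFunctions`, `str_length`, and `TinyString` are hypothetical names invented for this sketch: ordinary lambdas play the role of C functions, and the syntax is not the one proposed in the talk.

```ruby
# Hypothetical stand-in for a table of C functions (plain lambdas here).
module CFunctions
  TABLE = {
    str_length: ->(bytes) { bytes.size },
  }.freeze

  # Look up a "C function" by name and call it with the given arguments.
  def self.call(name, *args)
    TABLE.fetch(name).call(*args)
  end
end

# The method definition itself is ordinary Ruby, so it keeps its
# parameter names and source location; only the body delegates.
class TinyString
  def initialize(text)
    @bytes = text.bytes
  end

  def length
    CFunctions.call(:str_length, @bytes)
  end
end

tiny_length = TinyString.new("ruby").length
puts tiny_length
```

The point of the split is that reflection and tooling see a normal Ruby method, while the low-level work still happens in the delegated function.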
00:13:06 I envision Ruby-method definitions being enhanced through this approach, preserving the language's elegance while accessing lower-level programming functionalities.
00:13:34 However, the implementation state is still evolving, and some keywords might not be fixed or finalized. It's important for Ruby core committers to ensure proper integration.
00:14:01 Moreover, the attributes of these methods need careful consideration, particularly regarding their side effects.
00:14:24 Annotation of methods can allow developers to indicate functions that are pure and devoid of side effects, facilitating easier debugging and function optimization.
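One way to picture such an annotation is a small class-level helper that records which methods are declared free of side effects. This is purely an illustrative sketch: `PureAnnotation`, `pure`, and `Vec` are hypothetical names, and nothing here is checked or enforced by the interpreter.

```ruby
# Hypothetical annotation helper: records which methods claim to be pure.
module PureAnnotation
  def pure(method_name)
    pure_methods << method_name
    method_name   # return the name so other annotations could be chained
  end

  def pure_methods
    @pure_methods ||= []
  end

  def pure?(method_name)
    pure_methods.include?(method_name)
  end
end

class Vec
  extend PureAnnotation

  def initialize(x, y)
    @x, @y = x, y
  end

  # `def` returns the method name as a Symbol, so it can be passed to `pure`.
  pure def norm2
    @x * @x + @y * @y
  end

  # Mutates the receiver, so it is deliberately left unannotated.
  def scale!(k)
    @x *= k
    @y *= k
    self
  end
end

puts Vec.pure?(:norm2)
puts Vec.pure?(:scale!)
```

An optimizer or analysis tool could consult such a record to decide, for instance, whether a call may be reordered or its arguments shared.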
00:14:54 We propose creating easier notations to apply functions written in Ruby and compiled alongside their C counterparts, thereby improving compatibility.
00:15:20 Let's iterate on defining methods in Ruby, combining well-structured Ruby source code with the efficiency of C implementation to realize our goal.
00:15:45 I'm proposing that we write Ruby code at an initial stage while later analyzing it to generate valid annotations and improve function usability according to best practices.
00:16:07 Introducing an array of new coding functions in C—functions that don't exert a particularly heavy impact on the MRI—can facilitate these goals.
00:16:31 Some solutions will be explored regarding performance improvements derived from reworking the API to build a concurrency-aware interpreter.
00:16:59 I have brought forth many ideas I believe can address the concurrent requirements of Ruby effectively while maintaining compatibility.
00:17:25 This proposal emphasizes mixing Ruby and C languages so that developers can opt for either approach based on their needs.
00:17:51 For cases demanding complex features, Ruby capabilities would be beneficial, but we can still tap into the efficiency of C code when performance mandates it.
00:18:21 However, we must remain mindful of potential pitfalls surrounding timing issues, such as garbage collection and method invocation.
00:18:50 Ultimately, achieving a balance in interpreter design prompts the careful and strategic implementation of Ruby methods.
00:19:16 In doing so, I plan to unveil various hacking techniques that may contribute to a more rapid Ruby interpreter implementation.
00:19:42 I worked on enhancing the compiled binary format to yield improved startup times in Ruby programs.
00:20:06 As I discussed the theoretical framework, time constraints prevented me from delving into more technical details, so I focused on the digestible key aspects.
00:20:35 The first technical improvement involved the implementation of foreign function interfaces using new instruction types.
00:20:55 In summary, by introducing new virtual machine instructions, we are better positioned to streamline function calls and reduce overhead.
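The new instructions themselves are internal to MRI and are not reproduced here. As a rough way to see what VM instructions for a method call look like, CRuby can disassemble a snippet; the exact instruction names in the listing vary between Ruby versions.

```ruby
# Compile a tiny program and print the VM instructions CRuby generates for it.
source  = "s = 'abc'; s.length"
iseq    = RubyVM::InstructionSequence.compile(source)
listing = iseq.disasm

puts listing
```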
00:21:18 This optimization results in faster execution times, which we can measure against both C implementations and previous Ruby implementations.
00:21:43 It’s critical that we also consider maintaining the essence of Ruby code while minimizing the complexity introduced by C code in the underlying implementation.
00:22:08 Optimizing parameters and supporting keyword arguments are areas that we are currently looking into for further enhancement.
00:22:33 The future agenda includes introducing overloading mechanisms in our designs to push performance further, and with that, seamless interaction between Ruby and C.
00:23:00 While I might not get to dive into the details of the compiled binary format today, I want to emphasize its role in improving Ruby's efficiency.
00:23:27 Currently, our framework is set to provide support for larger binaries and improve data handling through the shared use of compiled elements.
00:23:51 By working on several methods of compilation, we can effectively reduce the startup time for Ruby programs, ensuring performance sustainability.
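CRuby already exposes a way to serialize compiled code, which gives a feel for what a compiled binary format buys: the parse and compile steps happen once, and later runs only load the result. This sketch uses the public `RubyVM::InstructionSequence` API; it is not the improved format discussed in the talk.

```ruby
source = "1 + 2"

# Compile once and serialize the bytecode to a binary String.
iseq   = RubyVM::InstructionSequence.compile(source)
binary = iseq.to_binary

# Later (for example at the next startup) load it back without re-parsing.
loaded = RubyVM::InstructionSequence.load_from_binary(binary)
result = loaded.eval

puts "binary size: #{binary.bytesize} bytes"
puts "result: #{result}"
```

The binary is specific to the Ruby version that produced it, so it is only suitable as a cache, not as a portable distribution format.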
00:24:20 In the method of lazy loading, the focus is on optimizing the resource allocation based on actual need, avoiding unnecessary memory usage.
00:24:48 This technique has been beneficial, especially in cases where the class definitions are extensive but only a subset of the methods is actually invoked.
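The lazy-loading idea can be sketched in plain Ruby: keep only serialized bytecode around, and decode a body the first time it is called. `LazyMethodTable` and its methods are hypothetical names for this illustration and do not reflect how MRI implements it internally.

```ruby
class LazyMethodTable
  def initialize
    @binaries = {}   # name => serialized bytecode, kept undecoded
    @bodies   = {}   # name => callable, filled in on first use
  end

  # Compile the source of a lambda and store only its binary form.
  def register(name, lambda_source)
    iseq = RubyVM::InstructionSequence.compile(lambda_source)
    @binaries[name] = iseq.to_binary
  end

  # Decode the body on first call, then reuse the decoded callable.
  def call(name, *args)
    body = (@bodies[name] ||= decode(name))
    body.call(*args)
  end

  def loaded_names
    @bodies.keys
  end

  private

  def decode(name)
    RubyVM::InstructionSequence.load_from_binary(@binaries.fetch(name)).eval
  end
end

table = LazyMethodTable.new
table.register(:double, "->(x) { x * 2 }")
table.register(:square, "->(x) { x * x }")

doubled = table.call(:double, 21)
puts doubled
p table.loaded_names
```

Only `:double` has been decoded at the end; `:square` stays in binary form until something calls it.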
00:25:08 Through analyzing the performance improvements achieved via these optimizations, we've seen considerable reductions in load times.
00:25:32 Ultimately, I aim to demonstrate how our ideas can bind Ruby and C together while producing exceptional efficiency, thereby addressing modern programming demands.
00:26:06 Thank you so much for your attention during my talk. Your engagement and thoughts are greatly appreciated.