Koichi Sasada

Ractor Enhancements, 2024

RubyKaigi 2024

00:00:07.520 Hi, good afternoon. This talk is the last one in this hall on the first day.
00:00:14.960 Today, I want to present a progress report on the Ractor system for this year.
00:00:21.119 I'm Koichi Sasada from Stores.
00:00:27.359 I will discuss the design and implementation of some significant missing features in Ractor.
00:00:34.160 These features include require and timeout, and I will also present a survey analyzing memory management in the Ractor system.
00:00:41.200 I'll highlight some potential future pull requests as well.
00:00:49.079 Let me introduce myself again. My name is Koichi Sasada, and I work at Stores.
00:00:57.879 I am very happy to be here this year. I am the author of the YARV virtual machine, garbage collection in CRuby, the Ractor system, and the M:N thread scheduler.
00:01:06.320 Furthermore, I am the director of the Ruby Association.
00:01:13.600 So, let's start the talk. Ractor is a parallel execution mechanism introduced in Ruby 3.0, designed to enable parallel computing in Ruby.
00:01:22.560 Ruby has threads, but threads do not run in parallel on MRI (Matz's Ruby Interpreter), because parallel computing with threads can introduce critical bugs.
00:01:30.560 To prevent such issues, we designed the Ractor system.
00:01:35.759 This system is designed for robust concurrent programming.
00:01:42.720 It enables parallel computing without data-race bugs because objects are not shared between Ractors, or are shared only in very limited ways.
00:01:52.079 However, limiting object sharing between Ractors requires introducing strong restrictions into the Ruby language.
00:01:57.840 We separate all objects into unshareable objects and shareable objects. Most Ruby objects, such as strings, arrays, and instances of most user-defined classes, are unshareable.
00:02:13.959 We also have some special shareable objects: immutable objects, special objects such as classes and modules, and Ractor objects themselves.
00:02:22.840 Based on this separation, shareable objects can be shared between Ractors, while unshareable ones cannot.
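A quick way to see the distinction is Ractor.shareable? (return values shown as of Ruby 3.x):

    Ractor.shareable?("mutable string")   #=> false
    Ractor.shareable?("frozen".freeze)    #=> true  (deeply frozen objects are shareable)
    Ractor.shareable?(42)                 #=> true  (Integers are immutable)
    Ractor.shareable?([1, 2, 3])          #=> false
    Ractor.shareable?(String)             #=> true  (classes and modules are shareable)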
00:02:31.000 Although this is a simple rule, maintaining it necessitates introducing many strict regulations.
00:02:37.879 For example, child Ractors cannot use constants that refer to unshareable objects.
00:02:43.560 A common Ruby example would be 'C = large_string'; since String objects are unshareable, this is a problem.
00:02:49.440 It means that only the main Ractor can access the constant C.
00:02:53.280 Other non-main Ractors, which we call child Ractors, cannot access this constant.
00:03:06.760 The same restriction applies to global variables, which also cannot be accessed from child Ractors.
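A minimal illustration of both rules (a sketch; the exact error messages vary by Ruby version):

    C = "a large string"     # defining the constant on the main Ractor is fine

    Ractor.new { C }.take    # the child fails with Ractor::IsolationError:
                             # the String behind C is unshareable

    $gv = 42
    Ractor.new { $gv }.take  # fails too: global variables are accessible
                             # only from the main Ractor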
00:03:13.680 Due to these strict rules, we lack many important features such as require or timeout.
00:03:19.560 We also observe critical performance degradation in memory management and possibly other issues.
00:03:27.479 Today, I will show you how we can enable these important features on Ractors.
00:03:35.560 First, let's discuss the require issue.
00:03:42.600 Currently, child Ractors cannot require any feature, because require accesses global state.
00:03:55.800 Requiring a feature manipulates many kinds of global state, which conflicts with the rules that limit sharing.
00:04:00.680 As a result, we have prohibited require on child Ractors.
00:04:06.960 However, in many situations, we need the ability to require on child Ractors. For instance, some methods depend on features that are required lazily.
00:04:21.200 Take the pp method, for example; it requires the pp library on its first call.
00:04:30.680 Unfortunately, due to this limitation, we cannot call pp on a child Ractor.
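In code, the failure looks roughly like this (a sketch; the exact error depends on the Ruby version):

    Ractor.new do
      pp 1   # Kernel#pp requires "pp" on its first call,
             # and that require fails inside a non-main Ractor
    end.take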
00:04:39.880 This limitation affects the usability of our programming environment.
00:04:45.680 Additionally, autoload ties require to constant access: touching an autoloaded constant triggers a require.
00:04:53.280 If we aim to support autoload, we need to allow require from child Ractors.
00:05:06.960 So, the conclusion is that we need to support require from child Ractors while ensuring that all require processing runs on the main Ractor.
00:05:17.120 To enable the require method on child Ractors, we will introduce a new method, Ractor#interrupt_exec.
00:05:21.200 This method invokes an expression on the main Ractor.
00:05:26.360 For example, calling it on Ractor.main executes the given block on the main Ractor.
00:05:35.200 The method interrupts the main Ractor's main thread and runs the block there; in this example, an assignment to a global variable.
00:05:41.360 This method processes the expression asynchronously.
00:05:48.079 That means the caller does not wait for the result of the given block.
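In code, the intended usage looks roughly like this (a sketch of the proposed API; the exact name and signature are still under discussion):

    Ractor.main.interrupt_exec do
      $result = :done   # evaluated on the main Ractor's main thread
    end
    # the calling Ractor continues immediately; the block's result is not awaited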
00:06:02.960 This method is a powerful feature, enabling various systems to be built around it.
00:06:10.520 However, this method also carries risks, much like handling traps and sending signals.
00:06:15.680 The interrupt mechanism can disrupt blocking calls, such as a read method waiting on a network socket.
00:06:22.400 When an interrupting signal arrives, it might wake up that read method.
00:06:31.000 The Ractor#interrupt_exec method behaves similarly, so it needs the same care.
00:06:40.160 This figure illustrates how Ractor#interrupt_exec works.
00:06:47.520 First, the child Ractor calls this method, which interrupts the main Ractor's main thread.
00:06:54.720 The main thread runs the expression, and the caller does not wait for its result.
00:07:01.680 After calling this method, the rest of the child Ractor's logic continues to run.
00:07:05.680 Using this Ractor#interrupt_exec method, we can implement require for Ractors.
00:07:15.120 By calling Ractor#interrupt_exec, we create a new thread on the main Ractor and perform the require on that thread.
00:07:21.200 The child Ractor then waits for the result of this require.
00:07:27.040 Running the require on a fresh thread avoids potential deadlock scenarios.
00:07:33.680 The diagram demonstrates this Ractor require process.
00:07:40.000 The child Ractor calls this method, the require runs on the main Ractor, and the child waits for the required feature.
00:07:48.520 Most of the time, the require will succeed, returning true or false.
00:07:55.600 However, it can sometimes raise a LoadError or another exception, which we need to deliver back to the child Ractor.
00:08:04.480 Thus, we need to handle the various kinds of outcomes that a require may produce.
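Putting these pieces together, a require helper built on interrupt_exec could look roughly like this (a sketch only; the helper name is illustrative, and block isolation and cross-Ractor copying details are glossed over):

    def require_on_main(feature)
      caller_ractor = Ractor.current
      Ractor.main.interrupt_exec do
        # a fresh thread, so the main Ractor's main thread is not blocked
        Thread.new do
          result =
            begin
              [:ok, require(feature)]   # true or false on success
            rescue Exception => e
              [:raised, e]              # LoadError and friends travel back
            end
          caller_ractor.send(result)
        end
      end
      tag, value = Ractor.receive       # the child waits for the outcome
      tag == :raised ? raise(value) : value
    end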
00:08:09.680 Now let's look at how we can define such a Ractor-aware require method.
00:08:15.120 A require issued on a child Ractor can succeed by delegating the work to the main Ractor.
00:08:21.200 If the current Ractor is not the main Ractor, we only need to add one line at the top of the require method to delegate.
00:08:30.360 There shouldn't be an issue with the logic we outlined for this process.
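In pseudo-Ruby, the check could sit at the top of require like this (a sketch; the real change lives in the interpreter, and require_on_main is the illustrative helper from above):

    module Kernel
      alias_method :__builtin_require, :require

      def require(feature)
        if Ractor.current == Ractor.main
          __builtin_require(feature)   # the ordinary path on the main Ractor
        else
          require_on_main(feature)     # delegate to the main Ractor
        end
      end
    end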
00:08:37.239 However, we need to consider libraries that override the require method.
00:08:47.960 Libraries like RubyGems or others may override require to provide custom functionality.
00:09:04.200 This means if we change the require method, custom libraries might not behave as expected.
00:09:11.600 Therefore, each library overriding require needs to call the Ractor-aware require method.
00:09:18.560 To achieve this, it is crucial to communicate such requirements to library developers.
00:09:24.840 Another approach is to introduce a module that performs the Ractor check in front of any overrides, ensuring no conflicts arise.
00:09:32.960 This would allow us to create a 'Ractor-aware require' module ensuring consistent behavior.
00:09:42.960 However, the challenge lies in ensuring that the ancestor chain always contains this Ractor-aware module in front of every override.
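One shape such a module could take (a sketch; the module name is hypothetical, and require_on_main is the illustrative helper from above):

    module RactorAwareRequire
      def require(feature)
        if Ractor.current == Ractor.main
          super                      # fall through to RubyGems' or Ruby's require
        else
          require_on_main(feature)
        end
      end
    end

    Kernel.prepend(RactorAwareRequire)

Prepending keeps the check ahead of ordinary method redefinitions such as the RubyGems one, but nothing stops another library from prepending in front of it, which is exactly the ancestor-chain concern above.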
00:09:50.720 I welcome any discussion about how to merge this feature effectively.
00:10:00.400 Now, shifting gears from require, I want to discuss the timeout feature.
00:10:05.040 The current timeout mechanism creates a single timeout monitor thread.
00:10:13.040 Other threads ask this monitor to raise an exception in them if, say, a one-second timeout is exceeded.
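For reference, this is the familiar stdlib usage being described:

    require "timeout"

    Timeout.timeout(1) do
      sleep 10   # Timeout::Error is raised after about one second
    end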
00:10:22.360 However, this communication flow currently works only within a single Ractor's thread system.
00:10:30.280 Thus, child Ractors cannot communicate with a timeout monitor living in another Ractor.
00:10:38.560 The existing timeout method is, therefore, not supported on child Ractors.
00:10:46.720 A simple solution would be to create a timeout monitor thread for each Ractor.
00:10:56.760 This means two Ractors would each have their own timeout monitor thread.
00:11:03.880 This is relatively easy and should take about thirty minutes to implement, but...
00:11:14.080 if we scale this approach to thousands of Ractors, we end up with thousands of timeout monitor threads, which is not ideal.
00:11:24.520 Alternatively, we could create a new communication path that allows child Ractors to reach the main Ractor's timeout monitor thread.
00:11:32.240 However, implementing this is quite challenging.
00:11:41.120 In my last presentation two years ago, I discussed how to introduce a timer thread.
00:11:48.720 This thread would manage timer events like I/O interrupts.
00:11:56.000 I propose using a native thread for this timer management.
00:12:03.920 The main Ractor and other Ractors can request this timer thread to register or unregister timeout events.
00:12:12.760 This design is still up for discussion, but it’s a starting point for timeout management.
00:12:20.840 The new timeout_exec method accepts a duration in seconds.
00:12:29.840 We also need to define what happens when the timeout fires.
00:12:36.480 With this feature, we could implement timeout management that works across Ractors.
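A rough sketch of how Timeout could sit on top of such an API (the name timeout_exec comes from this proposal; the receiver, return value, and cancel interface here are my assumptions, not the final design):

    def timeout(sec)
      target = Thread.current
      # register: ask the native timer thread to fire after sec seconds
      event = Thread.timeout_exec(sec) do
        target.raise(Timeout::Error, "execution expired")  # the rare path
      end
      begin
        yield
      ensure
        event.cancel  # unregister: the common path, when the block finishes in time
      end
    end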
00:12:45.680 Most uses of timeout never actually raise a timeout error; the common path is simply registering and then unregistering a timeout event.
00:12:53.560 I performed some benchmarks, registering a timeout around a task that finishes immediately.
00:13:01.960 Repeating this process a million times took five minutes on the current thread-based system, whereas the native timer thread approach required only three seconds.
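The benchmark had roughly this shape (a reconstruction, not the exact script):

    require "timeout"

    1_000_000.times do
      Timeout.timeout(1) { nil }  # the block finishes immediately,
    end                           # so the measured cost is register plus unregister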
00:13:09.560 Even so, three seconds is not as fast as I hoped; it is an improvement, but there is more to gain.
00:13:16.840 The remaining slowdown stems from how we interact with the hardware clock.
00:13:24.040 Switching to another clock API that performs better yielded a further speedup.
00:13:32.120 This API allows some error tolerance, up to four milliseconds, which suffices for timeout purposes.
00:13:39.200 The result is an approximate two-times improvement in performance.
00:13:45.360 In the final five minutes, I want to discuss performance issues we've encountered.
00:13:55.560 I usually demonstrate Ractor performance with an example that creates 50,000 Ractors linked in a ring and sends a message from each one to the next.
00:14:01.840 This allows us to measure how much time a message needs to travel around this linked structure.
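The benchmark looks roughly like this (a sketch of the ring, using the Ruby 3.4-era Ractor API):

    # build a ring: each new Ractor forwards whatever it receives
    # to the previously created Ractor
    head = 50_000.times.inject(Ractor.current) do |succ, _|
      Ractor.new(succ) do |next_ractor|
        next_ractor.send(Ractor.receive)
      end
    end

    head.send(:token)  # inject the message at the head of the chain
    Ractor.receive     #=> :token, after passing through all 50,000 Ractors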
00:14:10.840 Using M:N threads provides significant time savings here.
00:14:17.840 We have seen performance improvements of 10 to 70 times, depending on whether garbage collection is enabled or not.
00:14:24.200 However, the creation time for 50,000 Ractors also poses a performance challenge.
00:14:32.840 Last time we noticed that, with garbage collection enabled, creation slows down significantly.
00:14:38.560 Comparing runs with and without garbage collection shows how detrimental GC can be here.
00:14:45.680 Currently, a single garbage collection cycle is slower with many Ractors than in a corresponding single-Ractor program.
00:14:51.760 After running some additional benchmarks, we narrowed down the problem.
00:14:57.760 Removing the extra layers, we examined the performance of the Ractor system itself.
00:15:05.960 The task was simply to create many arrays.
00:15:12.000 The expectation was that it should scale with the number of Ractors, but instead we noticed excessive garbage collection.
00:15:20.520 GC counts were not proportional; in fact, they were higher on the Ractor version.
00:15:26.040 We must understand how these GC counts interact with the system as a whole.
00:15:32.160 In particular, every Ractor that allocates memory can trigger garbage collection across the entire system.
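The shape of that experiment, roughly (a reconstruction):

    ractors = 4.times.map do
      Ractor.new do
        # allocate many short-lived arrays; ideally N Ractors in parallel
        # would take about as long as one Ractor doing the same work
        1_000_000.times { Array.new(8) }
      end
    end
    ractors.each(&:take)

    p GC.count  # grows faster with more Ractors than simple scaling predicts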
00:15:39.760 In conclusion, we observe that increasing the number of Ractors currently leads to more garbage collection.
00:15:47.360 With the increased memory-management load that many Ractors create, sustaining efficiency is more complex.
00:15:54.520 We must focus on keeping garbage collection manageable, with monitoring and mitigation strategies.
00:16:00.920 This presentation proposed new approaches to implement the require and timeout features and surveyed memory management in Ractor.
00:16:07.760 I hope you will support us as we work towards these improvements.
00:16:14.320 Thank you very much.