Rails and the Ruby Garbage Collector: How to Speed Up Your Rails App

by Peter Zhu

In his talk at Rails World 2023, Peter Zhu, Senior Developer at Shopify and member of the Ruby Core team, explores how to optimize Ruby's garbage collector (GC) to enhance the performance of Rails applications. He begins by defining the garbage collector's function, emphasizing that it doesn’t just reclaim data but manages the entire lifecycle and memory allocation of objects. Zhu provides a detailed explanation of how Ruby's marking garbage collector operates, including its various phases: marking, sweeping, and compaction.

Key points discussed include:

  • Garbage Collector Mechanism: Ruby uses a marking and sweeping technique to identify and reclaim unused objects. The compaction phase, though not enabled by default, helps reduce fragmentation and optimize memory usage.
  • Generational Garbage Collection: Ruby follows a generational hypothesis where objects are categorized based on their longevity. Young objects are monitored closely, while old objects, which have survived multiple GC cycles, are treated differently during collection cycles.
  • Write Barriers: These are mechanisms to ensure that the garbage collector is informed when references are created between old and young objects, helping to prevent issues like dangling references.
  • Metric Collection: Zhu discusses how to use GC.stat to monitor garbage collection metrics within Rails applications, providing insights that can highlight potential performance optimization areas.
  • Tuning Strategies: To enhance performance, he advises reducing object allocations, which subsequently minimizes the frequency of garbage collection cycles. Specific environment variables can be adjusted to control GC behavior.
  • Case Study: Zhu highlights the performance improvements from tuning the garbage collector in Shopify’s Storefront Renderer application, achieving a 13% improvement in average response times by reducing GC duration significantly.
  • Autotuner Gem: A new tool called Autotuner is introduced as a means for Rails developers to analyze their garbage collector settings and receive recommendations for optimizations.

In conclusion, Zhu emphasizes that effective garbage collector tuning can lead to noticeable performance improvements in Rails applications. The session encourages developers to utilize tools and metrics to better manage memory and optimize application performance.

The Ruby garbage collector is a highly configurable component of Ruby. However, it’s a black box to most Rails developers.

In this talk, @shopify Senior Developer & member of the Ruby Core team Peter Zhu shows you how Ruby's garbage collector works, ways to collect garbage collector metrics, how Shopify sped up their highest traffic app, Storefront Renderer, by 13%, and finally, introduces Autotuner, a new tool designed to help you tune the garbage collector of your Rails app.

Slides available at: https://blog.peterzhu.ca/assets/rails_world_2023_slides.pdf

Links:
https://rubyonrails.org/
https://railsatscale.com/2023-08-08-two-garbage-collection-improvements-made-our-storefronts-8-faster/
https://github.com/shopify/autotuner
https://blog.peterzhu.ca/notes-on-ruby-gc/
https://shopify.engineering/adventures-in-garbage-collection

#RailsWorld #RubyonRails #rails #rubygarbagecollection #autotuner

Rails World 2023

00:00:15.160 Hello, Rails World! My name is Peter, and I'm a senior developer at Shopify on the Ruby infrastructure team, as well as a member of the Ruby core team. In this talk, I'll discuss the configuration and tuning options you can use to optimize Ruby's garbage collector for your Rails app.
00:00:26.640 You can find the slide deck at this URL or by scanning this QR code. Don't worry; I'll keep this URL visible for a few slides.
00:00:34.079 First, let's discuss what a garbage collector is. Contrary to its name, garbage collectors do more than just reclaim data objects; they manage the entire lifecycle of objects. Garbage collectors are responsible for allocating memory when an object is first created and recycling that memory once the object is no longer needed.
00:00:46.840 Garbage collectors determine when an object is no longer alive by tracking the lifetimes of objects. There are various techniques for this, such as reference counting and marking. Ruby employs a marking garbage collector, and we'll delve into what that means later in this discussion.
00:01:09.040 As you may know, Ruby is a garbage-collected language, meaning that you are not required to manually allocate memory for objects and subsequently free that memory when you are done using it. Let’s take a closer look at Ruby objects under the hood.
00:01:28.680 Every Ruby object resides in a region of memory called a slot, allocated specifically for that object. These slots are obtained from the garbage collector, and each slot has a fixed size that cannot be resized. Sometimes, we may need to allocate more memory than the slot provides, and in that case, we'll allocate memory externally, such as through the system using malloc.
00:01:42.320 The garbage collector does not allocate slots individually from the system; instead, it allocates in units of heap pages. In Ruby, heap pages are 64 kilobytes in size, and all slots within the page are of the same size. This design choice promotes simplicity and helps avoid external fragmentation.
00:02:01.799 External fragmentation occurs when smaller objects are allocated between larger objects, leading to gaps that larger objects can no longer utilize. Ruby 3.2 introduced a new feature called variable-width allocation, which allows for the allocation of dynamically sized objects through the garbage collector. Prior to this feature, all slots were fixed at 40 bytes, which forced objects requiring more memory to allocate externally.
00:02:28.519 This allocation method was detrimental to memory efficiency and overall performance. Variable-width allocation enables dynamic-sized allocations and introduces a new concept known as size pools. Each size pool holds a collection of pages, and every page within a size pool has the same slot size. Currently, Ruby has five size pools with slot sizes of 40, 80, 160, 320, and 640 bytes.
00:02:49.280 These sizes were selected to balance memory efficiency and performance. To recap the data structures involved in garbage collection: slots hold the data for individual objects, slots reside in heap pages, and pages whose slots share the same size are grouped into size pools.
00:03:32.360 Now that we’ve examined the data structures inside the garbage collector, let's discuss how these objects are allocated. Each heap page maintains a linked list of empty slots, referred to as the free list. To allocate an object, we simply pop an element from this free list. Here’s a diagram illustrating how object allocation works.
00:04:02.360 This heap page shows a few allocated objects along with a free list that connects all the free slots on this page. If we want to allocate a new object, we do so at the head of the free list, then move the free list pointer to the next element. We can continue to allocate objects until we exhaust the free slots.
00:04:50.240 If we reach a state where no more free slots are available, we must either allocate a new page or initiate a garbage collection cycle to free up slots. Next, let’s examine what occurs when Ruby's garbage collector runs.
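As a rough illustration of this allocation path, here is a small conceptual model in Ruby; the real free list lives in C inside the VM, and the class name and slot count below are invented for the example.

    # Conceptual model of a heap page handing out slots from its free list.
    # This is illustrative Ruby, not the VM's actual C implementation.
    class HeapPage
      SLOT_COUNT = (64 * 1024) / 40   # hypothetical: a 64 KB page of 40-byte slots

      def initialize
        @slots = Array.new(SLOT_COUNT)        # nil means the slot is empty
        @free_list = (0...SLOT_COUNT).to_a    # indices of free slots
      end

      # Pop a slot off the free list; returns nil when the page is full,
      # which is when the GC must grow the heap or run a collection cycle.
      def allocate(object)
        index = @free_list.pop
        return nil if index.nil?
        @slots[index] = object
        index
      end

      # Sweeping an unmarked slot pushes it back onto the free list.
      def free(index)
        @slots[index] = nil
        @free_list.push(index)
      end
    end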
00:05:15.199 Ruby employs a mark and sweep garbage collector, which consists of two phases. In the mark phase, we traverse the object graph and mark the objects we encounter. In the sweeping phase, all unmarked objects are deemed dead and can be reclaimed by the garbage collector.
00:05:39.599 Technically, there is a phase between marking and sweeping, called the compaction phase; however, it is crucial to note that compaction is not enabled by default. If enabled, compaction moves objects around in Ruby's heap to minimize fragmentation, which can lead to reduced memory usage and improved runtime performance.
00:06:06.599 Ruby utilizes what's known as a stop-the-world garbage collector. This means that the execution of Ruby code is paused while the garbage collector runs. This simplification of implementation avoids the need for synchronization and the risk of race conditions. However, the longer the garbage collector runs, the longer the application pauses.
00:06:44.240 In a Rails app, if this pause occurs during a request, it can lead to delays in response times. The garbage collection cycle begins with the marking phase, which involves determining which objects are alive by traversing the object graph and reading the references to the objects using a breadth-first traversal algorithm.
00:07:10.360 For example, when dealing with an array, the references will consist of the array's contents; in the case of a hash, the references will be the keys and values. For an object, the references are the object's instance variables. Ruby uses a color marking system for tracking objects, with three colors.
00:07:44.280 All objects begin as white, indicating they are unmarked. When an object is marked, it is colored gray, signifying that the object has been marked but its references have not yet been traversed. After traversing the references of gray objects, they are marked black. At the end of the marking phase, all unmarked objects are no longer reachable and can be reclaimed.
00:08:21.840 Here’s an example of a Ruby heap with several objects, and object references are illustrated with arrows. For instance, object A refers to object F, which in turn refers to object I. We start marking from the root objects, which include objects on the stack, global variables, and top-level constants.
00:09:06.880 Root objects are colored gray to indicate that we have visited them but not yet their references. We then select gray object A, visit its references, and color object A black, while traversing reference object F and coloring it gray. We repeat this process for other gray objects, leading us to the conclusion that white objects represent dead objects that can be reclaimed.
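To make the coloring concrete, here is a minimal Ruby sketch of tri-color marking over a toy object graph; the objects, their :refs arrays, and the mark function are invented for illustration and are not how the VM represents the heap.

    # Toy tri-color mark phase: objects are plain hashes with a :refs array.
    def mark(roots)
      color = Hash.new(:white).compare_by_identity   # unmarked objects default to white
      gray  = roots.dup
      gray.each { |obj| color[obj] = :gray }

      until gray.empty?
        obj = gray.shift                  # marked, but references not yet traversed
        obj[:refs].each do |child|
          next unless color[child] == :white
          color[child] = :gray            # mark the child; traverse it later
          gray << child
        end
        color[obj] = :black               # all of obj's references have been visited
      end

      color                               # anything still :white is unreachable
    end

    # Mirroring the example above: A -> F -> I are reachable, Z is not.
    i = { refs: [] }
    f = { refs: [i] }
    a = { refs: [f] }
    z = { refs: [] }
    colors = mark([a])
    colors[z]   # => :white, so Z would be reclaimed during sweeping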
00:09:44.760 Now I'll add some complexity to the marking phase. Ruby employs a generational garbage collector based on the generational hypothesis, which suggests that objects either live for a long time and remain essentially immortal, or live for a short duration. In a Rails app, most objects created during boot are immortal, such as classes, constants, and database connections.
00:10:24.560 During requests, the objects created typically exist only for the duration of that request, like ActiveModel objects, logging entries, and other supportive objects. Many objects either last forever or only for a very short time.
00:10:59.440 In a generational garbage collector, newly created objects are categorized in the Young Generation. In Ruby, if they survive for three garbage collection cycles, they are promoted to the Old Generation. We leverage this categorization by splitting marking into two types: minor garbage collection cycles, which only mark young objects, assuming all old objects are already marked.
00:11:31.560 This approach makes garbage collection cycles quicker because it processes significantly fewer objects. However, sometimes we need to mark all objects to reclaim old objects, which leads to a major garbage collection cycle, marking both young and old objects. Major cycles can be ten to twenty times longer than minor cycles and should be minimized in Rails applications.
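You can observe the two kinds of cycles from Ruby itself: GC.start accepts a full_mark keyword, and GC.stat exposes separate minor and major counters. A small sketch:

    before = GC.stat.slice(:minor_gc_count, :major_gc_count)

    GC.start(full_mark: false, immediate_sweep: true)   # minor cycle: young objects only
    GC.start(full_mark: true,  immediate_sweep: true)   # major cycle: young and old objects

    after = GC.stat.slice(:minor_gc_count, :major_gc_count)
    puts "minor cycles: #{after[:minor_gc_count] - before[:minor_gc_count]}"
    puts "major cycles: #{after[:major_gc_count] - before[:major_gc_count]}"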
00:12:07.680 Let's look at an example of a minor garbage collection cycle using a heap with generational data, where some objects are young and others are old. During a minor cycle, all old objects are treated as already marked black, so we only process young objects, starting from the roots.
00:12:59.680 We mark the root objects gray and traverse the references of object A, skipping object F since it is already marked black. We then check the references of object B, marking objects G and H gray, and continue working through the references. The only white object left is object Z; when sweeping runs later, that object is reclaimed. Dead old objects, however, can only be reclaimed by a major garbage collection cycle.
00:13:42.200 In Ruby, you can create references from an old object to a young one, but since old objects are not traversed during minor garbage collection cycles, the young object may remain unmarked and appear dead to the garbage collector. This bug, known as a dangling reference, can cause issues, leading to crashes or unexpected behavior in applications.
00:14:12.480 This situation is addressed through write barriers. Write barriers act as callbacks to inform the garbage collector when a reference is established from one object to another. For references of the same age, write barriers have no effect, but when an old object points to a young object, the write barrier places the old object in a remember set.
00:14:58.360 The remember set contains objects that will be traversed during every minor garbage collection cycle. However, write barriers are not mandatory, and some types in Ruby and certain native extensions do not support them. Objects that support write barriers are labeled as write barrier-protected, while those that do not are categorized as write barrier-unprotected.
00:15:38.880 So, what do we do about write barrier-unprotected objects? They lack support for write barriers, making it difficult to know when a reference is added to them. This is managed by always placing write barrier-unprotected objects in the remember set. As these objects are marked in every minor garbage collection cycle, we effectively lose the benefits of generational garbage collection for them.
00:16:25.360 Let’s explore an example. Suppose we have an old object referencing a young object and another write barrier-unprotected object. Note that the write barrier-unprotected object doesn't have an age classification since it doesn't participate in the generational garbage collection process.
00:16:57.360 During a minor garbage collection cycle, every old object is initially colored black, while every object in the remember set is marked gray to ensure its references are inspected. We then proceed with marking as usual, picking gray objects and marking the objects they reference as gray.
00:17:30.720 Now that we’ve established the liveness of every object, the sweeping phase can reclaim the resources of dead objects. Aside from reclaiming the slot for the dead object, we also free any externally allocated resources, such as closing file descriptors, sockets, or terminating threads.
00:18:21.440 Here's an example of the sweeping phase. We inspect one object at a time, starting with object A. If it’s marked, we skip over it. However, if it’s unmarked, we reclaim it and thus create an empty slot for future allocations. We continue this process, going through the rest of the slots on the page until all objects have been assessed.
00:19:20.240 The compaction phase in Ruby's garbage collector serves to reduce fragmentation within the Ruby heap. It organizes objects so that all live objects are positioned contiguously, which can lead to decreased memory usage and improved performance. Compaction helps save memory, as it results in more completely empty heap pages that can be returned to the system by the garbage collector.
00:20:14.160 In the context of forking web servers, such as Unicorn or Puma in clustered mode, compaction reduces copy-on-write page invalidations, resulting in a smaller memory footprint for your Rails app. Additionally, it improves runtime performance through better CPU cache locality and can speed up garbage collection cycles.
00:20:56.160 Variable-width allocation leverages compaction by moving objects into more optimal sizes. Since objects can vary in size during runtime, this mechanism can lead to further reductions in memory usage and performance improvements. However, keep in mind that compaction is not enabled by default.
00:21:41.440 To manually run compaction, you can invoke GC.compact, or set GC.auto_compact = true. Enabling auto-compaction causes compaction to run on every major garbage collection cycle.
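For reference, triggering compaction from Ruby looks like this; GC.latest_compact_info reports what was considered and moved, though its exact contents vary by Ruby version.

    GC.compact                   # runs a full GC with compaction
    p GC.latest_compact_info     # statistics about considered and moved objects

    # Or compact automatically on every major GC cycle (off by default):
    GC.auto_compact = true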
00:22:12.000 I won't walk through a live demo of compaction today, but that covers a brief introduction to how Ruby's garbage collector works. If you’d like a more thorough article discussing the garbage collector with examples, check out my blog post linked at this URL.
00:23:41.200 Now that we understand the internal workings of the garbage collector, let's apply this knowledge to collecting metrics from it. These metrics provide insight into your Rails app and highlight possible areas for performance optimization. The method for retrieving basic statistics from the garbage collector is GC.stat, which reports information such as the total number of garbage collection cycles executed.
00:24:48.840 Minimizing garbage collection cycles is crucial; more cycles mean longer and more frequent pauses during requests. Ideally, we want to avoid executing more than one garbage collection cycle per request. The longer the cycle, the more likely it is that an application will encounter problematic delays.
00:25:43.440 Using GC.stat, we can also measure the amount of time spent inside the garbage collector (in milliseconds). This reveals the proportion of each request taken up by the garbage collector. If this proportion is significant, tuning the garbage collector may result in noticeable performance improvements; a low proportion indicates that tuning is unlikely to yield substantial gains.
00:26:20.240 GC.stat also breaks down time spent in the marking and sweeping phases, allowing you to determine where the longest pauses happen. It also reports the total number of heap pages allocated, which closely correlates with how much memory your Rails app is consuming. Increasing the number of pages can decrease the frequency of garbage collection, but it is a memory-versus-performance trade-off.
00:26:59.360 The number of live objects provides insight into the count of currently active objects, while the free slots count indicates how many slots are available for future object allocations. When the count of free slots reaches zero, a garbage collection cycle will run, although the collector may also execute for other reasons.
00:27:43.360 We also track the total number of objects allocated since Ruby started, which can help identify the number of objects allocated during a specific request. If this number is disproportionately high, you may want to optimize your app to reduce object allocations. GC.stat also reports the number of major and minor garbage collection cycles.
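As a hedged sketch of how you might collect these numbers per request, the Rack middleware below logs GC.stat deltas around each call; the middleware name is made up, some keys (such as :time) only exist on newer Rubies, and in a threaded server the counters are process-wide, so the numbers are approximate.

    # Logs the GC work observed during each request via GC.stat deltas.
    class GcStatsMiddleware
      KEYS = %i[count minor_gc_count major_gc_count total_allocated_objects time].freeze

      def initialize(app)
        @app = app
      end

      def call(env)
        before = GC.stat.slice(*KEYS)
        response = @app.call(env)
        after = GC.stat.slice(*KEYS)

        deltas = KEYS.to_h { |k| [k, after[k].to_i - before[k].to_i] }
        Rails.logger.info("gc_per_request=#{deltas.inspect}")
        response
      end
    end

    # config/application.rb:
    #   config.middleware.insert(0, GcStatsMiddleware)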
00:28:30.000 In a well-optimized Rails application, the majority of garbage collection cycles should be minor. If major cycles occur too frequently, your setup might be poorly tuned, or you could have a Ruby-level memory leak. For instance, if a logging buffer retains entries and is never flushed, it will lead to such a leak.
00:29:33.440 The number of old objects alive is another important metric. With Ruby's generational garbage collection system, objects are initially labeled young, becoming old after surviving a specified duration. If the number of old objects increases post-boot, this might indicate poor tuning or a memory leak in your Ruby application.
00:30:14.320 In Ruby 3.2, a new method, GC.stat_heap, provides statistics about each size pool. There are five size pools, numbered 0 through 4, and the returned hash contains the statistics for each pool, including the slot size in bytes and the number of pages and total slots in each pool.
00:31:05.840 With this information, you can see the distribution of object sizes in your app, which can inform decisions about which size pools are consuming the most memory in your Rails application. Observing how many times each size pool has forced a major garbage collection cycle also helps assess whether the pools are sized appropriately.
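A quick way to look at the size pools (Ruby 3.2+); exact key names can vary between Ruby versions, so this just prints whatever each pool reports.

    GC.stat_heap.each do |pool, stats|
      puts "size pool #{pool}: slot_size=#{stats[:slot_size]} bytes"
      puts "  #{stats.inspect}"
    end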
00:32:10.240 Now that we've explored how to gather and interpret garbage collector metrics, let's discuss some tuning strategies tailored for your Rails application to enhance performance.
00:32:26.560 It’s imperative to allocate fewer objects overall, as this reduces the pressure on the garbage collector. Fewer allocations lead to less frequent garbage collection cycles, and each cycle entails fewer objects to scan during the marking phase and fewer for sweeping.
00:33:36.560 A recommended strategy is to identify controllers that allocate large numbers of objects and look for ways to reduce those allocations. Fewer garbage collection cycles means less time spent inside the garbage collector.
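One hedged way to attribute allocations to controllers is an around_action that diffs GC.stat(:total_allocated_objects); the callback name is invented for the example, and with a threaded server the counter is process-wide, so treat the numbers as approximate.

    class ApplicationController < ActionController::Base
      around_action :log_allocations

      private

      # Logs how many objects were allocated while the action ran.
      def log_allocations
        before = GC.stat(:total_allocated_objects)
        yield
      ensure
        allocated = GC.stat(:total_allocated_objects) - before
        Rails.logger.info(
          "allocations controller=#{controller_name} action=#{action_name} count=#{allocated}"
        )
      end
    end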
00:34:21.840 This is especially true for major garbage collection cycles, which can often consume ten to twenty times more time than minor cycles. So how can you actually reduce the frequency of major garbage collection cycles?
00:35:04.000 One approach is to set the RUBY_GC_HEAP_INIT_SLOTS environment variable, which configures the initial number of slots in the garbage collector’s heap. This lets the garbage collector skip growing the heap during boot and start at the configured size immediately.
00:36:04.880 To find optimal values for this variable, measure the number of slots in the heap after your app has achieved peak performance in production. However, be cautious: if this value is set too high, your app may consume significantly more memory.
00:36:47.120 Also worth considering are the RUBY_GC_OLDMALLOC_LIMIT and RUBY_GC_OLDMALLOC_LIMIT_MAX environment variables, which control the threshold of memory allocated by old objects before a major garbage collection cycle is triggered; the cap defaults to 128 megabytes. At Shopify, we set these to a very high value to essentially disable this trigger, resulting in a significant performance boost.
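These variables are read at process start, for example via the environment of your server process; the values below are placeholders, not recommendations, and the boot-time check is just a sketch to confirm the setting took effect.

    # Set at process start, e.g.:
    #   RUBY_GC_HEAP_INIT_SLOTS=600000 RUBY_GC_OLDMALLOC_LIMIT_MAX=128000000000 bundle exec puma
    #
    # A small check in a Rails initializer to confirm the heap started near the requested size:
    requested = ENV["RUBY_GC_HEAP_INIT_SLOTS"].to_i
    if requested > 0
      Rails.logger.info(
        "GC slots at boot: #{GC.stat(:heap_available_slots)} (requested #{requested})"
      )
    end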
00:37:16.400 We also implement an out-of-band garbage collector, which runs the collector between requests while the Rails app is idle. The goal is to minimize the execution of the garbage collector during incoming requests. However, this approach can be challenging to execute with threaded web servers like Puma due to multiple threads operating concurrently.
00:38:16.200 For instance, if one thread is running the garbage collector while others are serving requests, this could lead to potential stalls. While synchronization could help prevent this issue, it may also reduce server capacity as some threads wait for others to finish.
00:39:25.560 The out-of-band garbage collector is best suited to forking web servers, such as Unicorn or Puma in clustered mode. Implementing it well is still tricky, mainly because we can't predict when the garbage collector would otherwise be triggered.
00:40:21.200 If we run it between requests too often, we waste server capacity doing work that isn't needed; if we don't run it often enough, the garbage collector will still trigger during requests, bringing back the original problem.
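For a forking server, a very small out-of-band GC can be sketched as a Rack middleware that triggers a major collection every N requests; the class name and interval are made up here, and production implementations (such as Unicorn's OobGC module) are considerably more careful about when they run.

    # Runs a major GC every `interval` requests, ideally after the response is flushed.
    class OutOfBandGc
      def initialize(app, interval: 16)
        @app = app
        @interval = interval
        @requests = 0
      end

      def call(env)
        response = @app.call(env)
        @requests += 1
        if (@requests % @interval).zero?
          run_gc = -> { GC.start(full_mark: true, immediate_sweep: true) }
          if env["rack.after_reply"].is_a?(Array)   # supported by some servers (e.g. Unicorn)
            env["rack.after_reply"] << run_gc       # defer until the response has been sent
          else
            run_gc.call
          end
        end
        response
      end
    end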
00:41:14.960 Let's delve into the impact of tuning the garbage collector on one of Shopify's highest traffic applications, known as Storefront Renderer. This application is responsible for rendering the homepages and product pages that our buyers interact with, crucial for ensuring quick response times.
00:42:06.360 After tuning the garbage collector, we successfully reduced the average time spent within the garbage collector by 50%, and the 99th percentile duration dropped by 80%. What does this mean in terms of response times?
00:43:03.200 The average response time improved by approximately 13%, while the 99th percentile response time saw a 25% decrease. These performance enhancements are particularly significant, especially given the time spent on I/O operations during requests, such as interactions with the database or caching.
00:43:52.880 Importantly, we didn't stop at tuning; analyzing the garbage collector's behavior also led to improvements in Ruby 3.3 itself. The first is a new configuration option, the remembered write-barrier-unprotected objects limit ratio, which caps the number of write-barrier-unprotected objects allowed in the remember set.
00:44:45.440 When the limit is reached, it initiates a major garbage collection cycle. Previously, this limit was determined as twice the total number of write barrier unprotected objects present post-garbage collection. However, if only a few survived, this limit would be unrealistically low, leading to frequent triggers for major cycles.
00:45:38.880 The new limit is instead calculated as a percentage of old objects, defaulting to 1%. Enabling this feature in Storefront Renderer resulted in a 33% improvement in average garbage collection time and halved the 99th percentile.
00:46:36.080 This change translated into a 4% decrease in average response times and a 14% reduction in 99th percentile response times. The second improvement reworked the algorithm governing references from old objects to young objects.
00:47:28.240 Previously, a young object referenced by an old object was immediately promoted to the old generation, so temporary objects frequently ended up in the old generation. Logging buffers, for instance, can trigger this behavior: a long-lived buffer stores log entries but is flushed regularly.
00:48:17.880 The new algorithm only places the old parent object in the remember set when a reference to a young object is added. This permits the young object to age typically and only promote it to the old generation if it survives a sufficient duration. This adjustment improved our average garbage collection time by 19%, and the 99th percentile by 12%.
00:48:52.960 The effects of this shift resulted in a 4% decrease in average response times and an 8% decrease in 99th percentile response times. For additional insights into these upcoming Ruby 3.3 features, feel free to refer to my blog post, where I discuss the details more thoroughly.
00:49:56.560 In conclusion, we covered a range of topics in this talk—ranging from an introductory overview of the garbage collector to methods for collecting metrics, tuning strategies, and exciting new features in Ruby 3.3. There’s a chance you might have left with more questions than before, but I am here to share good news.
00:50:46.480 I'm excited to announce the launch of the autotuner gem! You are among the first people to hear about this. I've been diligently working on this gem over the past few months, and it analyzes the garbage collector between requests, offering tuning suggestions for your app.
00:51:30.560 You can find the source code for this gem on GitHub at github.com/Shopify/autotuner. Setting it up is straightforward: just follow the two steps in the README to start receiving garbage collector tuning recommendations.
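For orientation, the setup is roughly as follows; this is reconstructed from memory of the README, so treat the class and option names as approximate and check the repository for the current API.

    # Gemfile
    #   gem "autotuner"

    # config/initializers/autotuner.rb (names approximate; see the README)
    Rails.application.config.middleware.insert(0, Autotuner::RackPlugin)

    Autotuner.enabled = true

    # Reports with tuning suggestions are delivered to this callback; here we just log them.
    Autotuner.reporter = proc do |report|
      Rails.logger.info(report.to_s)
    end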
00:52:12.480 After setup, you will receive tuning suggestions that describe the issue addressed and its potential impact on your app. For instance, a suggestion might entail reducing the frequency of minor garbage collection cycles to improve response times, but it could lead to increased memory usage.
00:52:57.840 The suggestions include the environment variables necessary to implement the changes, as well as advice on conducting A/B testing to evaluate the impacts of these modifications.
00:53:39.760 Now, let’s discuss how we conduct tuning experiments and compare their impacts at Shopify. Initially, all servers start untuned, meaning no adjustments are applied. Once autotuner provides a suggestion, we designate a small random portion of servers as the experimental group. If your app experiences high traffic, this might represent about 5%; for lower traffic apps, it may range around 30%.
00:54:39.680 This strategy keeps the experimental group small in case of significant performance degradation due to tuning while remaining large enough to yield meaningful comparative results.
00:55:22.240 After we've applied the tuning, we compare the effects to the untuned group. Request response times fluctuate due to various factors including the type of request, server load, and external variables like database response times, which makes measuring tuning impacts over separate time intervals challenging.
00:56:08.000 To ensure more reliable comparisons, we prefer matching performance impacts over the same time period. If tuning yields a positive performance result, we create a stable tuning group that mirrors the experimental group size, but only incorporates tunings that produced better performance.
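One hedged way to make that same-time-period comparison possible is to assign each server to a group deterministically and tag its metrics with the group name; everything below (the percentage, constant names) is illustrative.

    require "digest"
    require "socket"

    EXPERIMENT_PERCENT = 5   # e.g. ~5% of servers for a high-traffic app

    # Hash the hostname so the assignment is stable across restarts and deploys.
    bucket = Digest::MD5.hexdigest(Socket.gethostname).to_i(16) % 100
    TUNING_GROUP = bucket < EXPERIMENT_PERCENT ? "experimental" : "untuned"

    # Tag request metrics (StatsD, Datadog, etc.) with TUNING_GROUP so dashboards
    # can compare the groups over the same time window.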
00:56:54.080 Following this, we replicate the tuning settings across both groups and move forward with the next suggestion for the experimental group as we evaluate its effectiveness against the stable group.
00:58:00.240 If the new tuning doesn’t provide positive results or leads to unwanted trade-offs—such as improvements in average response times but degradation in extremes like 99th or 99.9th percentile performance—we can easily discard it.
00:58:15.760 We continue this experimentation process with new tuning suggestions one at a time in the experimental group, consistently applying those that yield positive performance to the stable group and disregarding those that do not.
00:59:10.000 When we’re satisfied with the results from our tuning experiments, we then analyze the overall benefits against the untuned group. After collecting sufficient data, it’s time to roll out the tuning adjustments across all servers.
00:59:54.000 If everything goes smoothly, you should observe performance improvements in your app. You can find the slides for this talk at the link or QR code displayed. I encourage you to check out the autotuner gem at its GitHub repository, and feel free to reach out to me on Twitter, where my handle is @peterzhu2118.
01:00:36.760 You can also connect on Mastodon via ruby.social, or reach out to me via email at [email protected]. I'm happy to take your questions today during the conference. Thank you so much for listening!