Demystifying the Ruby package ecosystem

by Jenny Shen

In this talk, titled Demystifying the Ruby package ecosystem, Senior Developer Jenny Shen from Shopify explores the inner workings of the Ruby package ecosystem, specifically focusing on RubyGems and Bundler, essential tools for managing dependencies in Ruby applications. The presentation serves as a comprehensive guide to understanding how gem installation works and how to effectively manage and debug gems within Rails applications.

Key Points Discussed:

Introduction to Gems: The discussion begins with a light-hearted welcome and a quick audience check on their familiarity with bundle install. Shen emphasizes the importance of gems in Rails applications and aims to shed light on the complexities of gem management.
Understanding gem install: Shen breaks down the process involved in installing a gem using the gem install command, explaining how version requirements, dependency resolution, and downloading of gem files from RubyGems.org work behind the scenes.
Bundler's Role: The importance of Bundler is highlighted, which ensures consistent gem versions across various environments through the use of a Gemfile. The talk covers how Bundler evaluates the Gemfile, resolves dependencies using the PubGrub resolver, and creates a lock file to manage gem versions.
Rails Integration: Shen explains how Rails integrates with these gem management processes, mentioning features such as bin stubs for proper version handling and the lack of explicit require statements due to Bundler's automatic handling in application.rb.
Debugging Gems: Tips on debugging within Rails using commands like bundle show and bundle open are provided, illustrating practical ways to examine and modify gem code while advising caution about unintended changes.
Security Risks: The latter part of the presentation addresses potential security risks when using Ruby gems, including typo-squatting, gem account takeovers, and the importance of Multi-Factor Authentication (MFA) for gem maintainers. Shen emphasizes the community's efforts in securing the RubyGems ecosystem, outlining best practices such as minimizing gem usage and verifying gem credibility.

Conclusions and Takeaways:

Understanding the intricate processes behind gem installation and management in Rails applications empowers developers to use these tools more effectively and securely.
Adopting best practices for gem selection and contributing to a secure ecosystem are crucial for maintaining robust and safe Rails applications. Shen concludes by encouraging an informed approach to gem management, highlighting the balance between leveraging open-source contributions and ensuring security.

RubyGems is the Ruby community’s go to package manager. It hosts over 175 thousand gems – one of which is Rails and others that we use to customize our applications. RubyGems and Bundler do an excellent job in removing the complexities of gem resolution and installation so developers can focus on building great software.

In this talk, @shopify Senior Developer Jenny Shen takes a look at the inner workings of the Ruby package ecosystem, including:
- The processes involved in installing gems from a Gemfile
- Insights into debugging gems within a Rails application
- Ensuring you’re selecting the right gems to avoid security risks

Slides available at: https://github.com/jenshenny/demystifying-ruby-ecosystem

Links:
https://rubyonrails.org/
https://rubygems.org/

#RailsWorld #RubyonRails #rails #opensource #OSS #community #Rubygems

Rails World 2023

00:00:15.599 Hey everyone, welcome! I hope everyone’s having a great time at Rails World so far. I can’t believe we’re already halfway through! I’m pretty excited about what is for lunch today; the sandwiches yesterday were really good. Hopefully, you’re excited to learn a little more about the foundations of what Rails applications are made from: gems. If you’re not excited, I don’t really know what to say to you! Get excited.

00:00:34.280 To get us started, raise your hand if you have ever run `bundle install` before. Okay, great! A lot of people have run `bundle install`. Then you have probably seen a long list of gems being fetched, installed, or being used on your machine. If you’re running `bundle install` in a Rails application, you can boot it up, run the tests, and it's working all fine. Yay! You're on Rails! It’s so simple, right? But do you really know what’s going on when you run `bundle install`? How can gems be used right out of the box in a Rails application? These are questions that I have been pondering and wanted to explore. This is a classic example of talk-driven development. Hopefully, by the end of this talk, you’ll have a better understanding of the ins and outs of Ruby dependencies and how they work in Rails.

00:01:21.520 Hi, I’m Jenny. I work at Shopify on the Rails infrastructure team. I have been mostly focused on working on rubygems.org over the past few years, primarily to add security features and policies to make our dependency ecosystem more secure. I’ll touch a bit more on that later in this talk. Rubygems.org is the community’s gem hosting service that hosts over 175,000 gems, and it’s also a Rails app where you can view the available gems and the information about each of them. Additionally, it provides an API to manage gems.

00:02:06.799 The tool that actually manages these gems is RubyGems, which comes bundled with Ruby. It is used to manage gems using the gem command. Bundler is also a gem that is mentioned a lot with RubyGems. Its main purpose is to resolve and standardize the gems used in a Ruby project across all machines and environments, ensuring they all work together well. Fun fact: Bundler and RubyGems live in the same GitHub repository, even though each tool is released separately.

00:02:39.959 Today, we’ll go through how gem installation works under the hood. We’ll first go through how `gem install` works for installing a single gem. Then we’ll see how Bundler tackles installing the correct dependencies for your Rails application. After that, I’ll talk about how dependencies work seamlessly with Rails. We’ll discuss dependency groups, bin stubs, and debugging a gem within a Rails application. During the last few minutes, I’ll share some evil things that can happen when you install gems.

00:03:05.799 If people want a copy of the slides and references to follow along, feel free to scan this QR code. I’ll also have this up at the end of the presentation. Okay, cool. Let’s get started!

00:04:08.720 How does `gem install` work? What happens when you run `gem install rails`? When you run it, it shows that it installed the most recent version of Rails. This might be a bit outdated since Rails 7.1 came out yesterday, but bear with me. What is actually happening when the command is run? It accepts the gem name, like `rails`, along with the version requirements. This could be an exact version or a version range. If you don’t define a version requirement, it defaults to a lower bound.

00:04:41.000 Each command that RubyGems supports has a corresponding file in the commands directory with an execute method. The name and the version that gets passed into the command will go through the `install` method. From there, the version will be parsed into a requirement object. If you specify an invalid requirement, RubyGems will yell at you and throw an error saying you did something wrong. It will also initialize something called the dependency installer, which is responsible for installing a gem along with its dependencies.
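The requirement parsing described above can be tried directly with the `Gem::Requirement` class that ships with RubyGems. This is a minimal sketch of that one step, not the code path `gem install` itself runs; the helper `valid_requirement?` is a hypothetical name introduced for illustration.

```ruby
require "rubygems"

# With no version given, RubyGems falls back to its default requirement,
# which is a plain lower bound.
default_req = Gem::Requirement.default

# A pessimistic range: any 7.x release is allowed, 8.0 is not.
range_req = Gem::Requirement.new("~> 7.0")

# Hypothetical helper: reports whether RubyGems accepts the requirement string.
def valid_requirement?(string)
  Gem::Requirement.new(string)
  true
rescue Gem::Requirement::BadRequirementError
  false
end

puts default_req
puts range_req.satisfied_by?(Gem::Version.new("7.0.8"))
puts range_req.satisfied_by?(Gem::Version.new("8.0.0"))
puts valid_requirement?("definitely not a version")
```

An invalid string raises `Gem::Requirement::BadRequirementError`, which is the kind of error surfaced when the command is given a malformed version.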

00:05:07.120 By calling `resolve_dependencies` on the dependency installer with the name and version, it will return something called the request set. A request set represents the list of gem information, or activation requests, needed to determine how to download and install a gem. To resolve dependencies, it parses the gem name and version into a dependency object, initializes a request set, and passes the dependency into that set before calling `resolve` on it. After `resolve` is called, the request set holds these activation requests.
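The two objects mentioned here can be built by hand. The sketch below only constructs a `Gem::Dependency` and a `Gem::RequestSet`; it deliberately stops before `resolve`, since that step would contact the gem source over the network.

```ruby
require "rubygems"

# The name and requirement from the command line become a dependency object.
dependency = Gem::Dependency.new("rails", "~> 7.0")

# A request set starts out holding just the dependencies it was asked for.
# Resolution (not performed here) is what turns them into activation requests.
request_set = Gem::RequestSet.new(dependency)

puts dependency.name
puts dependency.requirement
puts request_set.dependencies.map(&:name).inspect
puts dependency.match?("rails", "7.0.8")
```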

00:06:07.000 RubyGems currently uses the Molinillo resolver to create a dependency graph to determine what versions of the gem and its dependencies would work. The TL;DR of this algorithm, since it’s quite complex, is that given a gem, it fetches all of the available possibilities to install based on their requirements, chooses the best one (usually the most recent one, like Rails 7.0.8), and adds it to the current state of the dependency graph. It then continues by finding the possibilities for that gem’s dependencies.

00:06:47.000 If there is a situation where there are no possibilities present, it will need to rewind to a state where the conflict can be avoided and then select the next best version. To find the version information, the fetcher retrieves specifications of Rails from the RubyGems index, which is just a separate instance of rubygems.org to serve this information. The fetcher parses each line, which has the version and platform at the front, the dependencies of the gem in the middle, and some of the requirements, like the Ruby version, at the end.
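The "pick the newest candidate, rewind on conflict" idea can be illustrated with a naive backtracking search. This is a toy sketch, not the resolver RubyGems ships: the `INDEX` data, gem names, and method names are all hypothetical, and real resolvers track far more state.

```ruby
require "rubygems"

# Hypothetical index: gem name => { version => { dependency name => requirement } }
INDEX = {
  "cool"  => { "1.3.0" => { "beans" => ">= 2.1" }, "1.1.0" => { "beans" => ">= 1.0" } },
  "beans" => { "2.1.0" => {}, "2.0.1" => {} }
}.freeze

# True when every already-chosen version still satisfies every constraint.
def consistent?(constraints, chosen)
  constraints.all? do |name, req|
    !chosen.key?(name) || Gem::Requirement.new(req).satisfied_by?(chosen[name])
  end
end

# constraints: array of [gem name, requirement string] pairs.
# Returns { name => Gem::Version } or nil when no combination works.
def resolve(constraints, chosen = {})
  name, _req = constraints.find { |n, _| !chosen.key?(n) }
  return chosen if name.nil? # every constrained gem has a version

  reqs = constraints.select { |n, _| n == name }.map { |_, r| Gem::Requirement.new(r) }
  candidates = INDEX.fetch(name).keys.map { |v| Gem::Version.new(v) }.sort.reverse

  candidates.each do |version| # newest first
    next unless reqs.all? { |r| r.satisfied_by?(version) }

    next_constraints = constraints + INDEX[name][version.to_s].to_a
    next_chosen = chosen.merge(name => version)
    next unless consistent?(next_constraints, next_chosen)

    result = resolve(next_constraints, next_chosen)
    return result if result
  end

  nil # no candidate worked: the caller rewinds and tries its next-best version
end

result = resolve([["cool", ">= 1.0"], ["beans", "<= 2.0.1"]])
puts result.transform_values(&:to_s).inspect
```

In this run the newest `cool` is tried first, fails because its `beans` requirement clashes with the top-level one, and the search falls back to the older `cool`.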

00:07:33.200 So we are now back at the top level in the install command. We now have a request set with all of the information needed to install the gems that we need. Now it’s time to actually download these gems. In `install`, all the gems that aren’t already cached on the machine are downloaded concurrently from the RubyGems S3 bucket.

00:08:07.080 Each gem is stored as a gem file with the name and version in the file name. When a gem maintainer wants to push a new version of the gem, they would run `rake release` with the gem spec, if they have the bundler gem task included. `rake release` runs two tasks: `gem build` and `gem push`. `gem build` takes the gem spec of the gem that’s going to be published and creates a tarball file with the gem extension. `gem push` takes a gem file that was built and posts it to the RubyGems API.

00:08:58.760 If everything is good, like if it has sufficient permissions, then the file will be written to the S3 bucket. When we download the gem from RubyGems.org, we get the gem file. We unpack it to get a folder with more compressed files. Checksums provide hash values for other files to act as a signal if they have been tampered with or corrupted. We also store a compressed file of some of the metadata. The data folder actually contains the actual gem contents, like the executables included and the libraries. You can also run `gem unpack` with the gem name to easily view the gem contents of a specific gem.
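To make the archive layout concrete, the sketch below builds a small in-memory tar archive with the three entry names a `.gem` file typically contains and lists them back, using the tar classes bundled with RubyGems. The entry bodies are placeholders (a real gem stores gzipped data there), and `build_archive`, `list_entries`, and `tampered?` are hypothetical helper names.

```ruby
require "rubygems/package"
require "stringio"
require "digest"

# Placeholder bodies; a real .gem holds gzipped YAML and a gzipped tarball.
ENTRIES = {
  "metadata.gz"       => "gemspec metadata (placeholder)",
  "data.tar.gz"       => "library code and executables (placeholder)",
  "checksums.yaml.gz" => "digests of the other entries (placeholder)"
}.freeze

# Writes the entries into an in-memory tar archive and returns its bytes.
def build_archive(entries)
  io = StringIO.new("".b)
  Gem::Package::TarWriter.new(io) do |tar|
    entries.each do |name, body|
      tar.add_file_simple(name, 0o444, body.bytesize) { |file| file.write(body) }
    end
  end
  io.string
end

# Reads the archive back and returns the entry names in order.
def list_entries(archive)
  Gem::Package::TarReader.new(StringIO.new(archive)).map(&:full_name)
end

# A checksum mismatch signals that the content changed after it was recorded.
def tampered?(body, expected_sha256)
  Digest::SHA256.hexdigest(body) != expected_sha256
end

archive = build_archive(ENTRIES)
puts list_entries(archive).inspect

recorded = Digest::SHA256.hexdigest(ENTRIES["data.tar.gz"])
puts tampered?(ENTRIES["data.tar.gz"], recorded)
puts tampered?(ENTRIES["data.tar.gz"] + "extra", recorded)
```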

00:09:57.000 Once you run `gem unpack`, it creates a folder with all the contents of the gem. Once it receives the binary from the S3 bucket, it will store the data on your machine under the gems folder of your specific Ruby installation. It will also store the gem file in the cache in case you want to reinstall at any time, along with the gem spec in the specifications folder. It also installs the executables specified in the spec under the bin directory, so you can run executables easily (like `rails new`, etc.).
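The folders mentioned above can be located for the current Ruby installation with `Gem.dir` and `Gem.bindir`. This sketch only prints paths; the `layout` hash and its labels are for illustration, and the exact locations depend on how Ruby was installed.

```ruby
require "rubygems"

install_root = Gem.dir # default gem installation directory for this Ruby

layout = {
  "unpacked gem contents" => File.join(install_root, "gems"),
  "cached .gem files"     => File.join(install_root, "cache"),
  "gemspecs"              => File.join(install_root, "specifications"),
  "executables"           => Gem.bindir
}

layout.each { |what, path| puts "#{what}: #{path}" }
```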

00:10:34.560 To actually use your gem in your Ruby project, you probably have to require it. This adds the gem path to the load path variable in Ruby so Ruby can run its code. For example, if we pull up an IRB and try to call ActiveSupport's `blank?` without requiring it first, it will not work. However, after we require it, we can see that ActiveSupport's gem path is now included in the load path variable. So, that’s how `gem install` works in a nutshell.
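The role of the load path can be shown without installing anything. The sketch below creates a throwaway directory standing in for a gem's `lib` folder; `tiny_demo_gem` and `require_succeeds?` are hypothetical names, and adding the directory to `$LOAD_PATH` by hand is only an approximation of what RubyGems does when a gem is required.

```ruby
require "tmpdir"

# Hypothetical helper: true when Ruby can find and load the feature.
def require_succeeds?(feature)
  require feature
  true
rescue LoadError
  false
end

# A temporary directory standing in for a gem's lib folder.
lib_dir = Dir.mktmpdir("fake_gem_lib")
File.write(File.join(lib_dir, "tiny_demo_gem.rb"),
           "module TinyDemoGem; VERSION = '0.1.0'; end\n")

before = require_succeeds?("tiny_demo_gem") # not on the load path yet
$LOAD_PATH.unshift(lib_dir)                 # make the directory visible to require
after = require_succeeds?("tiny_demo_gem")

puts before
puts after
puts TinyDemoGem::VERSION
```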

00:11:16.600 Now that we’ve learned how `gem install` works, how does `bundle install` work? At the beginning of this presentation, I mentioned that Bundler ensures that all of the gems in a Ruby application stay consistent across all machines. It accomplishes this by using something called the Gemfile. When `bundle install` is run, Bundler will create a definition object that represents the information in the application’s Gemfile or lock file.

00:12:03.320 It does this by reading the Gemfile and evaluating it like Ruby code. The Gemfile is a Ruby DSL (Domain Specific Language), a programming construct used specifically in the context of defining what gems to install. The evaluation will create a DSL instance in the DSL class, which then calls `eval_gemfile`. In `eval_gemfile`, there’s a line that calls `instance_eval` to grab the contents of the Gemfile and evaluate them in the context of that DSL instance.

00:12:50.400 For example, a common line you see in the Gemfile is `gem 'example_gem', '~> 2.0'`, defining the gem name and version requirements. This actually calls the `gem` method in the DSL when you run `instance_eval`. It will take the name, optional version, and other options as hash arguments, creating dependency objects and adding them to a list. Another method you probably have seen before is the `source` in a Gemfile, which is usually `https://rubygems.org`. The DSL adds the string representing the source to the global RubyGems source, throwing an error if you try to define more than one.

00:13:47.520 You can also have a source block if you want some gems to be installed from another source. The context of this block will override the global source until the end of the block. There are many other methods that I won’t get into, but hopefully, you get a good idea of how that works. After the DSL object has been built with the dependency sources and data, `Definition.new` is called, which accepts these values and initializes a definition object. Now we’re at part two, installing the definition.
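The Gemfile evaluation described above can be imitated with a very small DSL. This is a toy stand-in, not Bundler's DSL class: `MiniGemfileDsl`, its `Dependency` struct, and the `gems.example.com` source are hypothetical, and the behavior is simplified to the three ideas from the talk (the `gem` method, a single global `source`, and a `source` block that overrides it).

```ruby
# Toy stand-in for a Gemfile DSL; names and behavior are simplified assumptions.
class MiniGemfileDsl
  Dependency = Struct.new(:name, :requirements, :source, :options)

  attr_reader :dependencies, :global_source

  def self.evaluate(gemfile_contents)
    dsl = new
    # Inside instance_eval, bare calls to `gem` and `source` hit the methods below.
    dsl.instance_eval(gemfile_contents)
    dsl
  end

  def initialize
    @dependencies = []
    @global_source = nil
    @current_source = nil
  end

  def source(url)
    if block_given?
      # Block form: override the source only until the block ends.
      previous = @current_source
      @current_source = url
      begin
        yield
      ensure
        @current_source = previous
      end
    else
      raise ArgumentError, "only one global source is allowed" if @global_source
      @global_source = url
    end
  end

  def gem(name, *requirements, **options)
    @dependencies << Dependency.new(name, requirements, @current_source || @global_source, options)
  end
end

GEMFILE = <<~RUBY
  source "https://rubygems.org"
  gem "rails", "~> 7.0"
  gem "debug", group: :development
  source "https://gems.example.com" do
    gem "internal_tool"
  end
RUBY

dsl = MiniGemfileDsl.evaluate(GEMFILE)
dsl.dependencies.each do |dep|
  puts "#{dep.name} #{dep.requirements.inspect} from #{dep.source}"
end
```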

00:14:52.160 Resolution is done by the PubGrub dependency resolver. This algorithm was originally created by Natalie Weizenbaum for the Dart programming language and was ported over to Ruby by John Hawthorn. PubGrub is known to be faster than the traditional resolver by introducing something called conflict-driven clause learning.

00:15:39.480 But what does that term actually mean? Well, before, when there was a version conflict, the Molinillo resolver would need to go up the path to find a dependency that wouldn’t introduce a conflict. However, it doesn’t do a good job of remembering past conflicts. You can hit the same failure path multiple times. PubGrub introduces a concept called terms, which are just version ranges of a gem that either work or don’t work, and they can be used to determine incompatibilities.

00:16:49.520 For example, we can see that we need to install a gem, say ‘cool’, which is greater than 1.10 and ‘beans’ at version 2.0.1 or below. The initial run through will determine that you cannot install ‘cool’, meaning it must be less than 1.1 and ‘beans’ must be greater than 2.0.1. While going through the dependencies, incompatibilities are tracked so that the versions we know won’t work will be avoided.

00:17:38.720 So we find that picking a version of the ‘cool’ gem that’s greater than 1.2 requires ‘beans’ to be 2.1 or above. This might sound complex, and it is, but you can read more on the references I shared at the end of the presentation. Since there’s a lot of information needed to resolve these dependencies, Bundler uses something called the compact index to retrieve version information.
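One way to picture a recorded incompatibility is as a set of version-range statements (PubGrub's vocabulary calls them terms) that must never all hold at once. The sketch below encodes only the rule from this paragraph, that `cool` above 1.2 requires `beans` 2.1 or above. The `Term` struct and `violated?` method are hypothetical names, and this is a data-structure illustration rather than the PubGrub algorithm.

```ruby
require "rubygems"

# A term: a statement about one gem's version range, either asserted or negated.
Term = Struct.new(:gem_name, :requirement, :positive) do
  def satisfied_by?(versions)
    matches = requirement.satisfied_by?(versions.fetch(gem_name))
    positive ? matches : !matches
  end
end

# An incompatibility is violated when every one of its terms holds together.
def violated?(incompatibility, versions)
  incompatibility.all? { |term| term.satisfied_by?(versions) }
end

# "cool > 1.2 requires beans >= 2.1" is stored as:
# never (cool > 1.2) together with (beans NOT >= 2.1).
learned = [
  Term.new("cool", Gem::Requirement.new("> 1.2"), true),
  Term.new("beans", Gem::Requirement.new(">= 2.1"), false)
]

bad  = { "cool" => Gem::Version.new("1.3.0"), "beans" => Gem::Version.new("2.0.1") }
good = { "cool" => Gem::Version.new("1.3.0"), "beans" => Gem::Version.new("2.1.0") }

puts violated?(learned, bad)
puts violated?(learned, good)
```

Because the rule is kept around, any later candidate that pairs a newer `cool` with an older `beans` can be rejected without exploring that path again.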

00:18:41.200 There are quite a few endpoints, including the versions endpoint, which returns the versions available for all gems, and the info endpoint, which gives you more information about each gem. They are also cached on your machine and updated if outdated by checking the version of when they were created. Because we’re tracking these conflicts using PubGrub, Bundler can give better error messages to help the user understand what the problem is, rather than just providing a backtrace.
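The info endpoint returns one line per gem version, laid out as described earlier in the talk: version and platform first, then dependencies, then requirements. The parser below is a sketch under the assumption that the fields are separated by a space, commas, `&`, and a `|`; the sample line is abbreviated and illustrative, with a placeholder checksum, and `InfoLine` and `parse_info_line` are hypothetical names.

```ruby
require "rubygems"

InfoLine = Struct.new(:version, :platform, :dependencies, :requirements)

# Assumed layout: "VERSION[-PLATFORM] DEP:REQ&REQ,DEP:REQ|key:value,key:value"
def parse_info_line(line)
  front, back = line.strip.split("|", 2)
  version_part, deps_part = front.split(" ", 2)
  version, platform = version_part.split("-", 2)

  dependencies = deps_part.to_s.split(",").to_h do |dep|
    name, reqs = dep.split(":", 2)
    [name, reqs.split("&")]
  end

  requirements = back.to_s.split(",").to_h { |pair| pair.split(":", 2) }

  InfoLine.new(Gem::Version.new(version), platform || "ruby", dependencies, requirements)
end

# Illustrative sample only; the checksum is a placeholder.
sample = "7.0.8 actionpack:= 7.0.8,activesupport:= 7.0.8|checksum:abc123,ruby:>= 2.7.0"
parsed = parse_info_line(sample)

puts parsed.version
puts parsed.platform
puts parsed.dependencies.inspect
puts parsed.requirements.inspect
```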

00:19:12.880 Users can then try to update or downgrade some gems once they have figured out the problems. After resolving the conflicts, Bundler downloads and installs all the resolved gems and generates a new lock file with all the specific versions of the gems and their dependencies per source, platform, and the specifications you put in the Gemfile.

00:20:11.480 Now, let’s move on to how gems work in a Rails application. There are some nifty features that Rails has to manage these dependencies smoothly. The first thing I thought is, where are all of the requires in a Rails application? You occasionally see them in scripts, but there are none in Rails applications like this.

00:21:02.880 In the `application.rb` of a Rails app, the `Bundler.require` method requires all of the gems in the application’s Gemfile based on its groups. In the Gemfile, the default group is always included, and test, staging, and production gems are included depending on what the `RAILS_ENV` variable is set to. Rails also has something called bin stubs, which you might have heard of before. Bin stubs are executables that help set up your environment to the right version of the gem executable that you want.
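In a generated Rails app the line in question is `Bundler.require(*Rails.groups)`. The sketch below imitates only the group selection, the default group plus the current environment, using a hypothetical list of Gemfile entries; `GEMFILE_ENTRIES`, `groups_for`, and `gems_to_require` are illustrative names, not Bundler or Rails APIs.

```ruby
# Hypothetical Gemfile entries paired with the groups they were declared in.
GEMFILE_ENTRIES = {
  "rails"    => [:default],
  "debug"    => [:development, :test],
  "capybara" => [:test],
  "some_apm" => [:production]
}.freeze

# The idea behind the groups passed to Bundler.require:
# always the default group, plus the current environment.
def groups_for(rails_env)
  [:default, rails_env.to_sym]
end

# Gems whose declared groups overlap with the wanted groups get required.
def gems_to_require(rails_env)
  wanted = groups_for(rails_env)
  GEMFILE_ENTRIES.select { |_, groups| (groups & wanted).any? }.keys
end

puts gems_to_require("test").inspect
puts gems_to_require("production").inspect
```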

00:22:06.680 For example, you could do this by running `bundle exec rails s`, which picks the version of Rails you have in your Gemfile and runs that version. Without running `bundle exec`, it will just use the most recent version on your machine. So instead of using `bundle exec rails s`, we typically use something called bin stubs. This is why people say to always run `bin/rails s` and not just `rails s`.

00:23:00.240 The custom bin stubs for Rails will include the commands for the current version of Rails that gets installed. You can generate bin stubs with the `--binstubs` flag or `bundle binstubs`. Bundler will create a generic bin file to ensure that when you run the binary, it will run the gem executable specified in the Gemfile and not the most recent version on your machine.

00:24:07.640 The last thing I’ll talk about is how someone can debug or work with gems in a Rails application. One helpful command is `bundle show`, which shows the path of the gem that the Rails application is using. One step above that, you can use `bundle open` to actually open the code in your code editor. This will open it in VS Code, allowing you to make edits, and once saved, your changes will be visible in your application.

00:25:02.960 However, if your application uses Spring, the gem will probably be cached, and you need to stop Spring for the Rails application to reload the files with your changes. Modifying your gems directly is quite simple, but if you don’t revert your changes, they might unintentionally break your code in other projects. This has happened to me before. The command `gem pristine` can reset your gems by reinstalling them to their initial state. But to avoid this problem altogether, I like to clone a version and link it to the Gemfile using the path option.
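For reference, pointing a Gemfile entry at a local clone with the path option looks like the fragment below. The gem name and the relative path are hypothetical placeholders, and the line belongs in a Gemfile rather than in a standalone script.

```ruby
# Gemfile (fragment)
# "example_gem" and "../example_gem" are placeholders for the gem being debugged
# and the directory it was cloned into.
gem "example_gem", path: "../example_gem"
```

Edits made in the cloned directory then affect only the project whose Gemfile points at it, leaving the installed copy of the gem untouched.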

00:26:18.200 Now that we’re a bit more familiar with installing and working with dependencies, it’s becoming clear that Bundler and Rails make it easy to use open-source libraries and other people’s code. You can add a line to the Gemfile, run `bundle install`, and bam! You can run other people's code. However, it isn’t all sunshine and rainbows. I will be touching upon some somewhat evil things that can happen in the Ruby ecosystem.

00:27:24.720 Firstly, it’s too easy to install the wrong gem. For instance, say you want to install Rails, but due to a simple typographical slip, you run `gem install rils`. If you haven’t noticed it and run `rails new`, it won’t work. You will never get your project running, and you will be let down. Fortunately, instead of your computer's voice letting you down, you’ve got my voice to guide you!

00:28:17.440 Even though installing the wrong gem may seem harmless, it might contain code to grab application secrets, insert back doors, or anything malicious you can imagine. Fortunately, RubyGems performs basic checks to mitigate these attacks, known as typo-squatting, by checking the names of the gems being published. It uses Levenshtein distance to measure how close a new gem’s name is to a popular gem’s name, and it blocks the publishing of that new gem when the two are too similar.
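Levenshtein distance counts the single-character insertions, deletions, and substitutions needed to turn one string into another. The sketch below is a textbook implementation with a hypothetical protected list and threshold; it illustrates the kind of check described, not the actual rules rubygems.org applies.

```ruby
# Classic dynamic-programming edit distance, keeping only two rows at a time.
def levenshtein(a, b)
  previous = (0..b.length).to_a
  a.each_char.with_index(1) do |char_a, i|
    current = [i]
    b.each_char.with_index(1) do |char_b, j|
      cost = char_a == char_b ? 0 : 1
      current << [
        previous[j] + 1,       # deletion
        current[j - 1] + 1,    # insertion
        previous[j - 1] + cost # substitution (or match)
      ].min
    end
    previous = current
  end
  previous.last
end

POPULAR_GEMS = %w[rails rake bundler].freeze # hypothetical protected names
MAX_DISTANCE = 1                             # hypothetical threshold

# Returns the popular gem the name is suspiciously close to, or nil.
def too_similar_to(name)
  POPULAR_GEMS.find do |popular|
    popular != name && levenshtein(name, popular) <= MAX_DISTANCE
  end
end

puts levenshtein("rils", "rails")
puts too_similar_to("rils").inspect
puts too_similar_to("sinatra").inspect
```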

00:28:46.640 Each gem version that gets released undergoes dynamic and static analysis checks to determine if it’s malicious. Maciej is part of the RubyGems security team that aids in many of these efforts, which is great. Speaking of security, he messaged me this morning about another kind of squatting attack. A lot of companies maintain private gem registries or have gems served without registering the name on RubyGems.org. Scanners comb through GitHub repositories for names that aren’t registered on RubyGems.org and push a malicious version, hoping someone will accidentally install the public version.

00:29:43.920 Yesterday at DHH’s keynote, he mentioned some gems used internally by a company that were not registered. Fortunately, Maciej noticed this and reserved those two gems on RubyGems.org. Something else that can happen is that gems are pushed by RubyGems accounts, and these accounts can be taken over. If someone gets control of a maintainer account, they can publish a malicious version of a gem. Thankfully, this hasn’t occurred yet, but securing your account is crucial.

00:30:44.680 When I say secure, I mean using multiple forms of security keys. Multi-Factor Authentication (MFA) is the best way to prevent account takeover. We currently require that the most popular gem maintainers have MFA enabled. If you want to learn more about how this policy came about, I gave a talk at RubyConf Mini last year. We also introduced WebAuthn support, which is more convenient and secure than time-based counterparts.

00:31:49.120 You can use Touch ID or a security key in the web UI, and also on the command line, which surfaces a custom link for authenticating with a security key. However, popular accounts should not be the only ones with MFA enabled. If you own an open-source gem, please enable it. The community will thank you!

00:32:43.200 Samuel from RubyGems is also working on a flow to publish gems safely through CI/CD, using something called OpenID Connect (OIDC). This is a great initiative for securely publishing gems automatically through GitHub Actions. While this feature is currently in Alpha, it’s working towards release for the public.

00:33:24.240 That being said, the Ruby team is actively working on making sure that the RubyGems ecosystem is stable and safe for everyone to use. But what does this mean for you? How do you ensure that the gems you’re installing are safe? While you can never be fully secure, when using gems in a project, remember, less is more. This doesn’t mean not to use gems at all, but avoid having ten gems in your Gemfile that do the exact same thing.

00:34:33.280 This reduces the entry points for malicious code to enter your system and makes it easier to maintain. Also, consider: is this a reputable gem? Are the maintainers reputable? Do they have MFA enabled? How many users or downloads does the gem have? I would rather choose the gem ‘Rails’ with almost half a billion downloads rather than a gem with only 14,000 downloads, even if it’s also popular.

00:35:16.120 That's it for now. Hopefully, you can take at least one thing from this talk about how gem installation works, how they operate in a Rails application, and how to select the right gems for your Rails applications.

00:35:39.240 Thank you! Enjoy your lunch! I will be around to answer any questions and be available to chat afterward.

00:36:24.800 Thank you.