RubyKaigi 2024

Finding and fixing memory safety bugs in C with ASAN

00:00:06.040 All right. Hello, everyone! My name's KJ, and I work at Zendesk as a Principal Engineer. I'm also a committer on the Ruby core team. I'm here today to talk to you about AddressSanitizer, also known as ASAN. I was a bit terrified to give this speech in such a big hall because this can be a pretty niche topic. But I'm glad to see that at least a few people are interested. Thank you for coming.
00:00:22.519 Apologies in advance if this topic is a bit difficult to pitch at the right level. If this all goes over your head, I'm sorry, and if this is too obvious for you, I'm also sorry. Hopefully, there's something in it for everyone. Let me start by framing the problem that this talk addresses. I'm sure many of you have experienced the misfortune of seeing a CRuby crash dump on the screen. If you see that, it means Ruby has crashed, which indicates either a bug in Ruby or a bug in a native extension gem that you're using. Ruby code is never supposed to crash the Ruby interpreter like that.
00:01:01.239 So, I started looking into ASAN and worked on integrating AddressSanitizer into Ruby in hopes that it could help with these types of crashes. This is what this talk is about. We'll start by discussing what ASAN is, why people use it, and how it works, and then shift gears to talk about the work I did to get ASAN functioning in CRuby and what we can do next.
00:01:47.200 First, let's address the question: What is AddressSanitizer? The C programming language has numerous rules, such as not writing past the end of buffers and ensuring proper memory management. The compiler is sometimes helpful, but often it is not very accommodating. If you violate these rules, you can encounter undefined behavior, which can lead to various outcomes; everything can appear fine, or your program can crash unexpectedly. Most frustratingly, your program might seem to work correctly for a while before crashing later.
00:03:05.080 The purpose of ASAN is to act as a strict linter, enforcing C's rules more rigorously than the language itself. You can enable it by using a compiler flag, `-fsanitize=address`. When ASAN is enabled, it makes your application crash immediately when it breaks one of these rules, rather than later when something mysterious happens. For example, I created a simple C program that attempts to copy 16 bytes of 'hello' into an 8-byte array. This is a clear violation of C's rules and results in undefined behavior. When compiled and run without ASAN, the program crashes, but it may run some of the code correctly before encountering the error.
00:04:30.959 With ASAN, the same program will crash immediately when it tries to write past the end of the allocated memory. ASAN provides a backtrace of the crash and details about the memory rule that was violated, allowing developers to instantly identify the source of the problem. This is infinitely more useful for debugging than the typical crash behavior.
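The program's source is not included in this transcript, so the following is only a minimal sketch of what such an example might look like. The names `GREETING`, `DEST_SIZE`, and `copy_greeting` are hypothetical, and the use of a heap allocation is an assumption. Calling `copy_greeting(16)` writes 16 bytes into an 8-byte allocation, which is the rule violation described above; `copy_greeting(8)` stays within bounds.

```c
/* Sketch only. To try it under ASAN with GCC or Clang, build with:
 *   cc -g -fsanitize=address overflow.c -o overflow
 */
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define DEST_SIZE 8

/* 15 visible characters plus the terminating NUL: exactly 16 bytes. */
static const char GREETING[16] = "hello, hello!!!";

/*
 * Copies the first n bytes of GREETING (n must be <= 16) into a freshly
 * allocated DEST_SIZE-byte buffer and returns the first byte copied,
 * or -1 if allocation fails or n is 0.
 *
 * Deliberately missing a bounds check: any n > DEST_SIZE writes past the
 * end of the allocation, which is undefined behavior.
 */
static int copy_greeting(size_t n)
{
    char *buf = malloc(DEST_SIZE);
    int first;

    if (buf == NULL || n == 0) {
        free(buf);
        return -1;
    }
    memcpy(buf, GREETING, n);   /* BUG when n > DEST_SIZE */
    first = buf[0];
    free(buf);
    return first;
}
```

In an ASAN build, a call such as `copy_greeting(16)` should stop at the `memcpy` with a `heap-buffer-overflow` report and a backtrace. Without ASAN, the same call may appear to succeed or may crash somewhere unrelated later.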
00:05:29.600 So, why is ASAN significant for developers? It adds some overhead: typically around 2x, but in practice, I found it could be as much as five times slower. This level of overhead makes ASAN unsuitable for production use, where performance is crucial. However, it's perfectly viable to use during development and can be fast enough for continuous integration (CI) testing.
00:06:43.680 In the development workflow, crashes that occur due to memory issues can be detected immediately rather than after releasing code. This can save a lot of time and frustration. ASAN can also assist in diagnosing crash reports. For instance, I handled a case where a user reported that a particular program crashed the Ruby interpreter. By running ASAN with the reproduction script, I received a detailed backtrace that pointed quite clearly to the source of the bug.
00:08:57.239 ASAN has also been effectively utilized in CI. Many projects, such as Chromium, compile their code with ASAN enabled and run their entire suite of unit tests to catch memory errors before they can affect users.
00:09:51.000 Now that we've covered what ASAN is and why it’s valuable, let's delve into how it works under the hood. AddressSanitizer is integrated into the compiler, which modifies the generated machine code, so the regular build and the ASAN-enabled build differ significantly. ASAN uses shadow memory to maintain a map of valid versus invalid memory addresses throughout the program's execution.
00:11:03.360 At runtime, ASAN keeps track of which memory is valid. It rewrites memory access within the program, so each memory access checks the shadow memory first. In addition, ASAN introduces red zones, which are areas of invalid memory placed around allocated objects to further catch errors when memory overflows. If a buffer overflows into a red zone, ASAN will immediately catch it and crash.
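The shadow-memory and red-zone idea described above can be illustrated with a toy model. This is not ASAN's implementation: it is a sketch over an imaginary 256-byte address space, and every name in it (`toy_init`, `toy_alloc`, `toy_access_ok`, `REDZONE`) is made up for illustration. The shadow encoding is modeled on the one ASAN documents publicly, where one shadow byte describes 8 application bytes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define ARENA_SIZE 256  /* toy "address space", in bytes */
#define GRANULE    8    /* application bytes described by one shadow byte */
#define REDZONE    16   /* poisoned padding between objects, in bytes */

/* Shadow encoding used by this toy:
 *   0     -> all 8 bytes of the granule are addressable
 *   1..7  -> only the first k bytes are addressable
 *   < 0   -> the whole granule is poisoned (red zone or unallocated)
 */
static signed char shadow[ARENA_SIZE / GRANULE];
static size_t next_free;

static void toy_init(void)
{
    memset(shadow, 0xFF, sizeof shadow);  /* every granule starts out poisoned */
    next_free = 0;
}

/* Reserves `size` bytes and returns the object's offset in the arena,
 * or -1 if it does not fit. */
static long toy_alloc(size_t size)
{
    size_t start = next_free + REDZONE;   /* leave a poisoned red zone in front */
    size_t rounded = (size + GRANULE - 1) / GRANULE * GRANULE;
    size_t first = start / GRANULE;
    size_t i;

    if (size == 0 || size > ARENA_SIZE || start + rounded + REDZONE > ARENA_SIZE)
        return -1;

    for (i = 0; i < size / GRANULE; i++)
        shadow[first + i] = 0;            /* fully addressable granules */
    if (size % GRANULE != 0)
        shadow[first + size / GRANULE] = (signed char)(size % GRANULE);

    next_free = start + rounded;          /* what follows stays poisoned */
    return (long)start;
}

/* The check conceptually inserted before a one-byte load or store at `addr`. */
static bool toy_access_ok(size_t addr)
{
    signed char s;

    if (addr >= ARENA_SIZE)
        return false;
    s = shadow[addr / GRANULE];
    if (s == 0)
        return true;
    if (s < 0)
        return false;
    return addr % GRANULE < (size_t)s;
}
```

After `toy_init()` and `toy_alloc(13)`, the 13 bytes of the object pass `toy_access_ok`, while the byte just before it and the byte just past its end both fail, which is the moment a real ASAN build would report an error.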
00:12:40.320 Now, let's talk about the role ASAN plays in CRuby, and the work I did to get it working. The good news is that if you check the `building_ruby.md` document in the source tree, you'll find instructions on how to build Ruby with AddressSanitizer enabled. I can confirm that it compiles and passes the test suite on my machine.
00:13:29.680 To get to this point, there were some requirements. For example, the garbage collector needed to accommodate ASAN's peculiarities. In C, local variables ordinarily live on the function's stack, but when ASAN is enabled, they can instead be allocated on a separate "fake stack" so that ASAN can track them. The garbage collector has to scan that memory too; otherwise, objects referenced only from those locals would be collected prematurely.
00:15:27.720 I explored how the garbage collector could identify these fake stacks and scan them, so that the objects referenced from them were not inadvertently collected. Most of this knowledge came from studying how other large-scale projects, like browsers, manage similar scenarios. Another consideration was ASAN’s interaction with Ruby's custom memory management. ASAN overrides some memory allocation routines to keep track of Ruby's objects as they are created and destroyed.
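As a rough illustration of the fake-stack problem, a conservative stack scanner can ask the ASAN runtime whether a word it found points into a fake frame, and if so scan that frame as well. The sketch below uses `__asan_get_current_fake_stack` and `__asan_addr_is_in_fake_stack` from `<sanitizer/asan_interface.h>`; the helper name `fake_frame_for` and the `SKETCH_HAVE_ASAN` macro are hypothetical, and this is not taken from CRuby's source.

```c
#include <stddef.h>

/* Detect an ASAN build under Clang (__has_feature) or GCC (__SANITIZE_ADDRESS__). */
#if defined(__has_feature)
#  if __has_feature(address_sanitizer)
#    define SKETCH_HAVE_ASAN 1
#  endif
#endif
#if defined(__SANITIZE_ADDRESS__) && !defined(SKETCH_HAVE_ASAN)
#  define SKETCH_HAVE_ASAN 1
#endif

#ifdef SKETCH_HAVE_ASAN
#  include <sanitizer/asan_interface.h>
#endif

/*
 * Given one word found while scanning the machine stack, reports whether it
 * points into an ASAN fake-stack frame of the current thread. If so, stores
 * the frame's bounds in *beg and *end so the caller can scan that range as
 * well, and returns 1. Returns 0 otherwise (always 0 when built without ASAN),
 * leaving *beg and *end untouched.
 */
static int fake_frame_for(void *word, void **beg, void **end)
{
#ifdef SKETCH_HAVE_ASAN
    void *fake_stack = __asan_get_current_fake_stack();

    if (fake_stack != NULL &&
        __asan_addr_is_in_fake_stack(fake_stack, word, beg, end) != NULL)
        return 1;
#else
    (void)word;
    (void)beg;
    (void)end;
#endif
    return 0;
}
```

A scanner would call this helper for each word on the real stack and, on a hit, treat the range from `*beg` to `*end` as additional root memory.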
00:16:44.400 Additionally, ASAN must know how to recognize an attempt to access freed memory that was previously occupied by Ruby objects. I found that several of the required calls to ASAN functions for marking objects as invalid (poisoning) and valid again (unpoisoning) were already spread throughout the codebase, which helped streamline the process.
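To show what such marking calls look like in a custom allocator, here is a minimal sketch of a fixed-size slot pool that poisons free slots and unpoisons them on allocation. It uses `__asan_poison_memory_region` and `__asan_unpoison_memory_region` from `<sanitizer/asan_interface.h>`. The pool itself (`pool_init`, `pool_alloc`, `pool_free`, `SLOT_SIZE`) is hypothetical and far simpler than Ruby's object heap.

```c
#include <stddef.h>

/* Detect an ASAN build under Clang (__has_feature) or GCC (__SANITIZE_ADDRESS__). */
#if defined(__has_feature)
#  if __has_feature(address_sanitizer)
#    define SKETCH_HAVE_ASAN 1
#  endif
#endif
#if defined(__SANITIZE_ADDRESS__) && !defined(SKETCH_HAVE_ASAN)
#  define SKETCH_HAVE_ASAN 1
#endif

#ifdef SKETCH_HAVE_ASAN
#  include <sanitizer/asan_interface.h>
#  define SKETCH_POISON(p, n)   __asan_poison_memory_region((p), (n))
#  define SKETCH_UNPOISON(p, n) __asan_unpoison_memory_region((p), (n))
#else
/* Without ASAN the marking calls compile away to nothing. */
#  define SKETCH_POISON(p, n)   ((void)(p), (void)(n))
#  define SKETCH_UNPOISON(p, n) ((void)(p), (void)(n))
#endif

#define SLOT_SIZE  40  /* multiple of 8, matching ASAN's poisoning granularity */
#define SLOT_COUNT 4

static _Alignas(8) unsigned char pool[SLOT_COUNT][SLOT_SIZE];
static int in_use[SLOT_COUNT];  /* bookkeeping lives outside the poisoned area */

static void pool_init(void)
{
    size_t i;

    for (i = 0; i < SLOT_COUNT; i++)
        in_use[i] = 0;
    SKETCH_POISON(pool, sizeof pool);  /* nothing is allocated yet */
}

static void *pool_alloc(void)
{
    size_t i;

    for (i = 0; i < SLOT_COUNT; i++) {
        if (!in_use[i]) {
            in_use[i] = 1;
            SKETCH_UNPOISON(pool[i], SLOT_SIZE);  /* slot becomes valid */
            return pool[i];
        }
    }
    return NULL;  /* pool exhausted */
}

static void pool_free(void *p)
{
    size_t i;

    for (i = 0; i < SLOT_COUNT; i++) {
        if (p == (void *)pool[i] && in_use[i]) {
            in_use[i] = 0;
            SKETCH_POISON(pool[i], SLOT_SIZE);    /* later accesses get reported */
            return;
        }
    }
}
```

In an ASAN build, reading or writing a slot after `pool_free` should produce a use-after-poison report, even though the underlying memory was never returned to the system allocator.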
00:18:31.840 However, one area where ASAN needs to take a firm stance is on unsupported constructs like Ruby's `callcc` (call/cc) feature, which relies on behavior that is undefined according to the C standard. So I decided that if ASAN is enabled, `callcc` would be disabled, because it is too likely to cause chaos.
00:19:32.520 Moving on to the next topic: I anticipated that running Ruby’s test suite with ASAN would shake out many hidden bugs. However, only two minor issues surfaced, one of which involved an invalid octal literal that could lead to a buffer overflow. I had hoped for a longer list of bugs and was somewhat disappointed to find so few.
00:20:57.920 Looking ahead, I want to ensure ASAN consistently works with Ruby’s CI pipelines and that we can regularly run ASAN checks against the codebase. It would be ideal to establish a recurring schedule to enable ASAN as an option during nightly or newly merged builds.
00:22:39.600 Moreover, it would be beneficial to include ASAN setup in the Ruby GitHub Actions, enabling developers to test their gems against various Ruby versions while using ASAN. There's potential for discovering memory safety bugs within popular gem extensions, which could lead to valuable fixes.
00:23:44.720 Finally, while ASAN focuses on enforcing C's memory rules, there's room to encourage CRuby to implement practices and checks to ensure compliance with Ruby's specific memory management guidelines. I’m considering researching LLVM compiler plugins to help enforce these additional rules in CRuby.
00:24:41.679 In closing, I hope you leave with three main takeaways: if you work regularly on CRuby, please integrate ASAN into your development workflow; if you write native extension gems, consider incorporating ASAN into your CI pipelines to catch memory safety errors; and if you work with Ruby, try compiling your own Ruby with ASAN enabled to aid in debugging and reporting issues.
00:27:05.000 Thank you for your attention, and if you have any questions about the relationship between ASAN and Valgrind or anything else, I’ll be glad to take them.