Parsing Ruby

by Kevin Newton

In the keynote speech titled 'Parsing Ruby' at RubyConf 2021, Kevin Newton explores the various tools and methodologies used to parse Ruby code and how these concepts can be applied to individual projects using the Ripper standard library. The presentation starts with a warm welcome and an acknowledgment of the audience, before diving into the complexities of parsing Ruby.

Key points discussed include:

- Foundation of Parsing: Newton emphasizes the importance of understanding the theoretical aspects of parsing, starting with defining a grammar, which is a syntactical representation of what is allowed in a language. He constructs a simple grammar that can handle numerical expressions and demonstrates how to expand it with operations such as addition and parentheses.

- Building a Parser: He illustrates the process of creating a parser that can tokenize input strings using lexical analysis and how these tokens are processed semantically through a series of shifts and reductions to create a valid syntax tree.

- Ruby's Parsing History: A significant portion of the talk focuses on the evolution of Ruby's parsing mechanisms, starting with early implementations and transitioning to the current parser generator systems. He details the transition from using Yacc to Bison in Ruby's core implementation.

- Introduction to Ripper: Newton provides insight into the Ripper standard library, designed for easy access to parsing events. He explains how Ripper allows developers to hook into tokens and rule reductions, making it easier to build comment-extraction and syntax tree tools.

- Community Tools: The speaker highlights various parsing tools and libraries available for Ruby, such as the parser gem and ruby_parser, discussing their functionalities, limitations, and community support.

- Future Implications: Nearing the conclusion, Newton suggests that Ruby's pace of introducing new syntax may slow down. He advocates for a standardized parser that accommodates various Ruby implementations to maintain compatibility and ease of use.

Through the exploration of these topics, Newton aims to inspire developers to engage more deeply with parsing concepts to create innovative tools and applications within the Ruby ecosystem. The presentation concludes with a call to action for the community to rally around building more development tools, facilitated by a robust understanding of parsing techniques.

Since Ruby's inception, there have been many different projects that parse Ruby code. This includes everything from development tools to Ruby implementations themselves. This talk dives into the technical details and tradeoffs of how each of these tools parses and subsequently understands your applications. After, we'll discuss how you can do the same with your own projects using the Ripper standard library. You'll see just how far we can take this library toward building useful development tools.

RubyConf 2021

00:00:11.200 oh okay so we're gonna get this party started my name is kevin newton um and i would
00:00:17.039 like to formally welcome you all to the um
00:00:22.560 if this is going to turn on hold on a second aha welcome to the live keynote yeah
00:00:28.400 i want to thank matz for the introduction um since we're in the keynote room and i'm the first person speaking here i feel like i have this
00:00:34.000 privilege to just say welcome to the live keynote apparently that joke's not landing so that's okay you can laugh if you want
00:00:40.800 just a little pity laugh would be appreciated okay cool anyway my name is kevin newton uh i work at shopify uh i
00:00:47.600 work on the yjit team along with aaron allen noah and maxime uh
00:00:52.960 if you want to talk about that or any other thing come and find me at the booth we're the nerds with the green
00:00:58.559 background so yeah i'm gonna open up with a quick
00:01:04.559 oh this thing is dying i'm gonna open it up with a quick warning um so i talk pretty quickly when i'm nervous i am
00:01:10.080 nervous i've had a lot of caffeine we're gonna talk pretty quickly um i'm also going to say if you're a junior
00:01:15.119 developer please don't leave the room um this is a somewhat complicated topic i've spent hours agonizing over this
00:01:21.920 trying to make it accessible for everyone so if you're a junior developer don't leave the room if you're a senior developer
00:01:27.439 please stay with me i promise there's content in here for you it's not the very beginning but it will be there
00:01:34.079 uh so i want to talk to you about parsing ruby i want to talk to you about how ruby has been parsed over time how we
00:01:41.040 get uh from plain source text into a structure that we can deal with uh but
00:01:46.640 in order to do that i need to back up and talk about the fundamentals that underlie those concepts in order to give you an understanding of how these things
00:01:52.640 work i need to give you the the theory before i can give you the practice so
00:01:58.240 here here's the game plan we're going to build a grammar for a simple language i'll explain what a grammar is in a
00:02:04.399 second we're going to build a parser for a simple language we're going to look at the history of the ruby parser and how
00:02:09.759 that has evolved over time and how it's used and finally we are going to look at
00:02:14.879 how ripper works which is an internal library standard library that is used to
00:02:20.879 gather information from the parser as it is parsing so first step
00:02:26.319 building a grammar a grammar is a syntactical representation of what is allowed in a
00:02:32.000 language language here is used kind of loosely it's not english it's not ruby it's it's just a language it's any kind
00:02:38.239 of concept that puts together a series of tokens so if we look and we just wanted a language that only accepted one
00:02:44.480 single number this is what the grammar would look like it's actually overly complicated for this because it could
00:02:49.920 just say program points to number but i'm doing this for illustrative purposes a program is going to be our root node
00:02:55.360 it's going to point to our overall grammar it says the only thing that it accepts in this grammar is a number the
00:03:01.040 number is a non-terminal token as opposed to a terminal token and it accepts only a single number token
00:03:08.560 i realize i just said token a whole bunch of times this will accept something like 1 or 2
00:03:14.879 or 7 or any number but it's not extensive enough for us to do anything with so i'm going to extend
00:03:19.920 it a little bit we're going to add the ability to do addition
00:03:25.680 now in when we're doing addition we are now accepting a number plus a number or just an individual number so
00:03:32.319 now we can accept a couple things we can accept one we can accept one plus two but we can't accept one plus two plus
00:03:38.159 three why there's no recursion here all right so we need to extend it a little bit further
00:03:43.599 this is now if i clicked the right thing this is now a little bit of recursive so
00:03:48.959 right so this is now left recursive is what we say in the theory this is pointing at itself the expression node
00:03:56.080 in the tree is pointing at itself it can go and accept more and more information it accepts one accepts one plus two now
00:04:02.480 accepts one plus two plus three we can extend it for subtraction and go all the way as far as we want it's infinite
00:04:08.239 right this is left recursive we can continue on and make another one
00:04:14.400 we can make another rule that includes terms this is the reason we're splitting this up and not making it expressions is
00:04:20.400 something i'm going to explain in a second but that's called operator precedence we'll get into that but the function of this is to accept another
00:04:27.280 set of rules and if we put it back into our grammar so that it will understand itself it can recurse down all the way
00:04:34.320 understand plus minus times divide the last step the last thing you want to add
00:04:40.160 to this is going to be parentheses this kind of language can accept one times two
00:04:45.680 you have to do a little bit of substitution in your head you say okay one times two is the entire language
00:04:50.880 right okay our language is an expression we go down to the expression we say an expression is
00:04:56.960 just a term a term is a term times a number a term is a number so it's a number
00:05:03.120 times a number stay with me i promise this is going to make sense in a second the last thing we want to add is
00:05:09.280 parentheses this one's a little fun this is uh any single token or
00:05:16.400 symbol in this language can be made into an overall program
00:05:22.400 because you see a program is equal to expression by wrapping it in parentheses so you can put it anywhere in our language all of a sudden we support
00:05:28.320 parentheses this is our grammar this is what we're going to use throughout this presentation to understand this language
00:05:34.320 to build on top of it all right so we've got we've got our language
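The grammar built up over the preceding steps can be written down as plain data. This is a sketch reconstructed from the spoken description, so the rule names (`program`, `expression`, `term`, `factor`) and the `:NUMBER` token name are assumptions and not a copy of the slides.

```ruby
# Each rule maps to its alternatives. Symbols are other rules (or the
# :NUMBER token); strings are literal tokens.
GRAMMAR = {
  program:    [[:expression]],
  expression: [[:expression, "+", :term], [:expression, "-", :term], [:term]],
  term:       [[:term, "*", :factor], [:term, "/", :factor], [:factor]],
  factor:     [["(", :expression, ")"], [:NUMBER]]
}.freeze

# Print the rules in a compact "name: alt | alt" form.
GRAMMAR.each do |name, alternatives|
  puts "#{name}: #{alternatives.map { |symbols| symbols.join(' ') }.join(' | ')}"
end

# A rule is left recursive when one of its alternatives starts with itself.
LEFT_RECURSIVE = GRAMMAR.select do |name, alternatives|
  alternatives.any? { |symbols| symbols.first == name }
end.keys
```

Splitting `expression` and `term` into separate rules is what gives multiplication and division a tighter binding than addition and subtraction.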
00:05:40.000 the next step is going to be building a parser that understands this language the grammar was a
00:05:46.000 uh abstract concept we're going to now implement that so this is our source we're going to
00:05:52.639 take this source file this is your dot numbers or whatever uh you know language you want to call it
00:05:59.120 we're going to loop over it in ruby it's going to be a language implemented in ruby we're going to say until the
00:06:04.479 input string is empty we are going to uh we're going to click our mouse is what
00:06:10.479 we're going to do come on um we are going to switch over the input we're going to
00:06:15.840 skip over white space and the little dollar sign uh apostrophe
00:06:21.360 there means skip over the last match uh we're going to
00:06:27.039 go and take our numbers anything that matches that regex
00:06:32.319 and we're going to yield out a number token same thing with these operators we're
00:06:37.520 going to yield that operator token that dollar sign ampersand means take the the last match string
00:06:43.120 and finally we're going to throw a parse error if we don't understand any remaining syntax
00:06:48.639 the important part is here the this is called lexical analysis tokenization any kind of thing like that there's a lot of
00:06:54.400 names for it the point is that we are taking these segments of the file and yielding out individual tokens
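The loop described above can be sketched roughly as follows. The method name `tokenize` and the `[type, value]` token shape are assumptions made for illustration; only the use of `$&` (the matched text) and `$'` (the text after the match) comes from the talk.

```ruby
# Hypothetical tokenizer for the arithmetic language; yields [type, value] pairs.
def tokenize(source)
  return enum_for(:tokenize, source) unless block_given?

  input = source.dup
  until input.empty?
    case input
    when /\A\s+/
      # Whitespace carries no meaning here, so no token is emitted.
    when /\A\d+/
      yield [:NUMBER, $&.to_i]
    when /\A[-+*\/()]/
      yield [$&, $&]
    else
      raise ArgumentError, "parse error near #{input.inspect}"
    end
    # $' holds everything after the last successful match.
    input = $'
  end
end

tokenize("1 + 2 * (3 - 4)").each { |token| p token }
```

Returning an enumerator when no block is given lets the caller treat the result as a token stream without building the whole list up front.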
00:07:01.680 okay so if we go when we we parse this and we execute over this source string we're going to find a number
00:07:07.440 plus all these different tokens as we build them up and now we have our list
00:07:13.120 this is what was parsed by that file everyone with me i see some nods
00:07:18.240 yes oh more nods excellent great okay so this is called a token stream
00:07:24.319 or a token buffer or just a list of things and we're going to take this token
00:07:29.919 stream and we're going to pass it through our grammar so at this point we have tokens we have the concept of
00:07:35.199 lexical analysis we don't have semantic meaning yet so we're going to add that so it it's called accepting the input
00:07:41.919 we're going to run through how it works this is our grammar we already went over this we're going to add a stack and this
00:07:48.639 stack is going to be a token stack and it's also called the semantic stack
00:07:53.680 semantic meaning stack semantic token stack people really don't like having one name for things um but what it's going to do is it's
00:08:00.319 going to take the first token and this is called shifting we are shifting a token
00:08:05.440 we have shifted it off of the list of input tokens and then we're going to say okay we have our token what are we going
00:08:11.039 to do with it we need something equivalent to it in the terms of our grammar okay well we can see from our grammar
00:08:17.680 that a number is equal to a factor so we're going to replace it with factor
00:08:23.039 on our stack that thing we just did is called reducing shift and reduce those are the two terms
00:08:28.479 i'm going to add to your idiolect today so we're going to shift more tokens
00:08:34.080 shift more tokens continue shifting until we get to something else we can reduce okay we've got a number we can
00:08:39.599 reduce that again using our grammar down to a factor continue shifting continue shifting
00:08:46.160 again we get a factor and finally we get to something we can actually do now this our this stack is looking a little bit
00:08:51.200 large and i've been working for aaron for a little bit so i couldn't resist just showing that this is in fact overflowing
00:08:56.959 um so we can go and take the factor and we can say okay a factor is actually
00:09:02.240 a term a term minus a term can
00:09:07.279 just be an expression uh as it gets reduced further we can go
00:09:13.920 through this process and just keep substituting things if we ever get to a point where we can no longer substitute something or shift
00:09:20.880 a token then we throw a syntax error right we continue down this process we look at
00:09:27.519 the various rules we continue to shift overall we finally get to the end of our input we substitute we substitute we
00:09:34.480 substitute we substitute we substitute all the way down to program at this point we have accepted our input and
00:09:40.640 this is a valid syntax for our grammar the thing that we just did
00:09:46.240 in taking those tokens and passing it through this process of semantic analysis of going through that stack and
00:09:53.040 replacing them over the course of time with shifting and reducing it turns out that that's very repetitive it turns out
00:09:58.560 that that's language agnostic it doesn't have specific meaning it's just something that we can do over and
00:10:04.160 over again and in the late 80s early 90s the
00:10:09.360 thing in vogue to do if you were building a language was to use a parser generator a parser generator is a
00:10:14.480 program that takes that kind of grammar and those kinds of actions and does that for you
00:10:19.839 not to say it does everything but it it takes the shifting and reducing part out of the need of your
00:10:24.959 head using very unfortunately large integer arrays anyway
00:10:30.880 what this would look like with a parser generator so ruby has a parser generator in the standard library it's called racc
00:10:36.399 there's a reason for that i'll show you in a minute and it looks something like this this is a parser generator using racc um
00:10:43.040 the thing up top that is that we're looking at is the operator precedence up where it says left uh left is
00:10:49.519 associativity that i'm not gonna get into today but that operator precedence tells you if you have to determine
00:10:55.120 between a shift and a reduce that you're gonna go with the operator with the higher precedence
00:11:01.120 the expressions down there are actually an equivalent grammar to the grammar we already have
00:11:06.399 uh when you when you pass a grammar into a parser generator it's going to take that and do all that shifting and reducing for you and generate it for you
00:11:13.600 the last thing you can do on top of this is you can execute actions when rules are reduced
00:11:19.440 so when rules are reduced you can do something this one is going to evaluate it immediately this is going to take whatever the input
00:11:25.279 is and just evaluate it as soon as we find it so if you find an expression plus an expression you're gonna do the value plus the value
00:11:31.360 you can also do something like this which is building up a syntax tree this is what was done in ruby before one
00:11:38.240 nine we built up a syntax tree and then in order to execute ruby it walked over that tree and understood what it was
00:11:43.600 doing over time this is as opposed to building a bytecode interpreter which is what yarv is in 1.9
00:11:49.680 if we take this file and we pass it through racc and we build our file and we go into irb
00:11:56.800 and we require it and we parse it you get this
00:12:02.079 this may not look like much but it's kind of interesting because if we blow it up a little bit
00:12:07.760 this is actually a tree this is a syntax tree this is what we have built using our parser generator
00:12:14.079 if you look back at the source you can see how that relates to it and it takes care of precedence for us
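The shifting, reducing, and tree building described above can be approximated by hand. This is not what Racc generates (Racc emits table-driven code); it is a small operator-precedence shift/reduce loop. The node shapes (`[:number, n]`, `[:binary, left, op, right]`), the `ParseError` class, and the `[type, value]` token format are all assumptions made for illustration.

```ruby
# Illustrative hand-written shift/reduce loop, not Racc output.
class ParseError < StandardError; end

PRECEDENCE = { "+" => 1, "-" => 1, "*" => 2, "/" => 2 }.freeze

def parse(tokens)
  stack = []
  node = ->(item) { item.is_a?(Array) }

  # Reduce `node operator node` on top of the stack while that operator binds
  # at least as tightly as the incoming one (this gives left associativity).
  reduce = lambda do |incoming|
    while stack.length >= 3 && node.(stack[-1]) && node.(stack[-3]) &&
          PRECEDENCE.key?(stack[-2]) && PRECEDENCE[stack[-2]] >= incoming
      right, operator, left = stack.pop, stack.pop, stack.pop
      stack.push([:binary, left, operator, right])
    end
  end

  tokens.each do |type, value|
    case type
    when :NUMBER
      raise ParseError, "unexpected number #{value}" if node.(stack.last)
      stack.push([:number, value])   # shift the token as a leaf node
    when "("
      raise ParseError, "unexpected (" if node.(stack.last)
      stack.push(type)               # shift
    when ")"
      reduce.(0)
      inner = stack.pop
      raise ParseError, "unexpected )" unless node.(inner) && stack.pop == "("
      stack.push(inner)              # reduce ( expression ) to a single node
    else
      raise ParseError, "unexpected #{type}" unless PRECEDENCE.key?(type) && node.(stack.last)
      reduce.(PRECEDENCE[type])
      stack.push(type)               # shift the operator
    end
  end

  reduce.(0)
  raise ParseError, "incomplete expression" unless stack.length == 1 && node.(stack.first)
  stack.first
end

p parse([[:NUMBER, 1], ["+", "+"], [:NUMBER, 2], ["*", "*"], [:NUMBER, 3]])
```

The precedence table plays the role of the operator precedence declarations at the top of the grammar file: when both a shift and a reduce are possible, the comparison decides between them.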
00:12:20.000 um right this was our original grammar and and this can build that kind of thing and we're going to come back to this okay
00:12:26.000 okay we're actually on track here for time so it's it's good good progress so far
00:12:34.000 so the next thing i want to talk about is the history of the ruby parser now the reason that it's called racc the
00:12:39.519 parser generator that standard library uses is because yacc was the original one yacc
00:12:46.000 um yet another compiler compiler i believe it's a parser generator that was built
00:12:52.480 way back when and it was in vogue in 1993 when matz
00:12:57.519 started using it for building out the ruby parser so in the very very early days this is
00:13:03.040 the earliest changelog entry i could find ruby 0.06 in 1994. um
00:13:09.760 it's i had to go through the wayback machine and find a tarball that had a change log that was entirely japanese
00:13:16.639 but i found it so it's there um for the first uh pre
00:13:21.760 1.0 all of the change log entries are in japanese so i had some fun with google translate um there's some fun things in
00:13:28.560 here ruby didn't used to look like ruby ruby used to look like both python and c plus plus um this the top one there is saying
00:13:36.320 that uh dicts are hash literals and he added like braces for hash literal syntax
00:13:42.880 which didn't used to exist um it's a backwards incompatibility because that used to be array syntax
00:13:48.000 back in you know whenever it was um rescue was misspelled for about 90
00:13:53.680 versions it got renamed to rescue like the correct one um that in japanese that's
00:14:00.800 saying that it was embarrassing so i feel badly sorry matz but i couldn't help including that
00:14:06.800 uh after a while i guess we get up to ruby 1.0 uh ruby 1.0 started looking more and
00:14:12.320 more like ruby the super class syntax went from a colon like c plus plus to a less than that we recognize today the
00:14:18.000 continue keyword went from being continue to next which we also recognize from ruby today they added syntax to access the
00:14:23.680 singleton classes we started getting regex flags that were specific to encoding which was kind of an
00:14:28.720 interesting thing because at the time a lot of stuff was very western-centric um i i love the fact that ruby was
00:14:34.959 written in japan because we have encoding kind of as a first-class citizen especially ruby 1.9
00:14:40.160 ruby 1.3 comes out the day before ruby 1.2 i know that's confusing uh the odd
00:14:45.279 versions were developer releases and were used to be development branches effectively this was svn or actually i
00:14:50.880 don't know if it was svn yet um but we get a couple of interesting things like body statements with begin rescue else ensure
00:14:57.680 clauses we get indented heredocs uh the next day ruby 1.2.0 is released
00:15:04.160 and uh you know more more and more things the true and false keywords were added for the first time which is
00:15:10.160 kind of a funny thing to add to a language in ruby 1.2 um percent w array literals stuff like that
00:15:16.959 the next year we get ruby 1.4 um we get binary number literals a couple of other interesting things multibyte character
00:15:23.120 identifiers again with the multi-byte strings this is something that ruby was ahead of its time on
00:15:29.120 uh 1.5 had compile time string concatenation i don't know if you know this but in a ruby file if you put a
00:15:34.320 string and then a space and then another string it becomes one string when ruby parses it kind of a weird
00:15:39.519 thing came from c uh it the reason i have it in here is because it was on the to-do list since
00:15:44.720 ruby 0.06 i don't know why i mean it's great i guess if you like that kind of thing
00:15:50.720 uh the next year we get the rescue modifier form modifier means inline so like foo rescue bar
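Both pieces of syntax just mentioned, adjacent string literals and the rescue modifier, still work in current Ruby. A minimal sketch, with variable names chosen only for illustration:

```ruby
# Adjacent string literals are joined by the parser into a single string.
greeting = "parsing " "ruby"

# Modifier rescue: if the left-hand side raises a StandardError,
# the right-hand side becomes the value instead.
value = Integer("not a number") rescue 0

puts greeting
puts value
```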
00:15:57.120 um and finally the next year we get nodedump
00:16:02.240 nodedump is not a ruby version nodedump is a project that came out of the pragmatic programmers and it's a c extension to
00:16:07.680 ruby and it's written in english in the us and this is kind of interesting because
00:16:13.839 ruby is starting to pick up steam and starting to get some popularity nodedump was an extension so at the time ruby
00:16:19.279 before 1.9 was a tree walk interpreter meaning it took the ast that we already built up like we looked at it walked
00:16:25.519 over it and interpreted it as it went nodedump took that yeah excuse me ast and printed it out in
00:16:32.399 a human readable format so you can understand it this was the first attempt to my knowledge to take the ruby ast and
00:16:38.480 do something with it that was not execute ruby ruby 1.7.1 comes out
00:16:44.079 this was right around the time of the first rubyconf and we get some interesting things break and next now accept values
00:16:50.560 um rescue and singleton method bodies more and more things are getting added around this time jruby gets created
00:17:02.160 file the grammar file that we looked at you remember the racc file from earlier it takes that grammar file takes all of
00:17:07.679 the action blocks which were the things in the in the braces and rewrites them all and and i love
00:17:12.880 this i love just thinking about this like what would make this language better if it were written in java that would be better
00:17:19.280 and at the time it was i mean jvm is still one of the greatest inventions we've ever done as programmers it's
00:17:24.559 incredibly powerful and it's achieved better peak performance numbers than you are at the time that it was introduced
00:17:30.320 but the interesting thing about this is to take this file to take these things
00:17:36.000 okay a one-time translation is one thing but how are you going to stay up to date
00:17:41.520 you're not it's freaking hard you have to watch the
00:17:46.799 commits on parse.y and just stay up to date you got to keep translating and that's a hard thing to do and you know
00:17:53.440 tons of props to the jruby team for for keeping up with this because this is not an easy thing to do and the reason i'm
00:17:58.480 harping on this and taking time here is because this is what you have to do in order to maintain
00:18:04.480 another parser in ruby you have to watch the commits on this file and just
00:18:09.919 make it work um so i'm gonna come back to that point later around this time ripper 0.0.1 is
00:18:16.880 released minero aoki uh published it publishes it on his website it takes the grammar file rewrites it to dispatch
00:18:24.000 parser events and scanner events instead of building the ast that ruby 1.7 would build
00:18:29.840 still to this day in the documentation it says ripper is still early alpha version that's what it says in ruby 3.1.0
00:18:36.799 it's been 20 years still early alpha maybe we'll get to beta someday
00:18:43.360 ruby one eight comes out two years later and we get a couple of more interesting things i'm gonna start skipping through this a lot faster so if you thought i
00:18:49.520 was going fast now um around the same time parsetree gets released parsetree is a project by ryan
00:18:55.039 davis that builds out an ast from the source file using a c extension it relies on ruby 1.8
00:19:01.760 internals so unfortunately ryan had to deprecate this when ruby 1.9 came out
00:19:06.799 because it completely changed to a bytecode interpreter rubinius comes out about this time rubinius is a very fascinating project
00:19:13.440 rewriting the standard library and rewriting ruby in ruby it bootstraps ruby
00:19:18.480 unfortunately it is not at this point what it was at the time it's it's it's
00:19:23.520 no longer kind of up to date it's it's a different project at this point um but there are a couple of interesting things
00:19:29.120 like having to rewrite again those action blocks in in ruby i included cardinal because it's a very
00:19:35.120 interesting thing it's a rewrite of the ruby thing except this time instead of taking the parse.y
00:19:40.880 file it actually took the grammar matz had published a grammar the the abstract grammar that we looked at on a website
00:19:47.840 way back when this was a fork of the ruby 1.4 grammar and they rewrote it in order to put it on the parrot vm iron
00:19:54.799 ruby was written around this time for the .net framework finally ruby_parser comes out ryan davis
00:20:00.080 rewrites his parser so that you can use it with yarv in 1.9 uh completely had to rewrite
00:20:07.039 it and used racc to generate it around this time yarv comes out yarv is a
00:20:12.400 phd thesis that is a bytecode interpreter for ruby it upgrades a couple things it changes it so that instead of using uh
00:20:19.679 yacc it uses bison which is just a successor that has a lot of the same functionality ripper is merged into the
00:20:25.840 standard library we'll show how that works and a couple other of uh controversial things are done at the
00:20:32.720 time including symbol hash keys and lambda literals ruby 1.9 we keep moving along we get
00:20:38.799 the ruby intermediate language this is a research project that rebuilds the ruby parser in ocaml
00:20:43.840 if you were wondering of the list of if you had your list of bingo languages of of places that ruby had been rewritten
00:20:50.159 in i doubt ocaml would be on your list but but there you go uh ruby 1.9.3 is released it's the last of
00:20:55.520 the 1.0 series and these two standards are recognized by the international
00:21:00.799 committee as uh international standards for this language we get into the ruby 2.0 series
00:21:08.159 and we get all kinds of fun things we get refinements we get uh percent i symbol lists keyword arguments ruby 2.x
00:21:14.640 keyword arguments not ruby 3.x keyword arguments and finally we get the parser gem from whitequark
00:21:20.159 it is a published gem with a parser api that uses ragel to generate using another parser generator to build out an
00:21:26.960 ast this one has stayed up to date miraculously over time and you can see
00:21:32.559 this is a very short list of all a very short abbreviated list of all the different tools that are built on top of this gem
00:21:39.039 truffle ruby comes out around this time i'm going to start skipping through these ruby 2.1 we get required keyword arguments 2.2 dynamic symbol hash keys
00:21:46.480 2.3 we get heredocs the frozen string literal pragma around this time everyone starts freaking out about memory and they start opening prs to
00:21:52.880 rails saying dot freeze dot freeze dot freeze dot freeze the entire pull request is just dot freeze finally we
00:21:58.480 get the frozen string literal pragma and people can calm down with that we get the to_proc so you can call
00:22:04.080 refinements with to_proc we get top level return multiple assignment in conditionals around this time a project comes out
00:22:10.320 called tree-sitter tree-sitter is a very interesting parser project a parser generator
00:22:15.520 library that is used mostly for ides
00:22:21.039 and it has a plugin for the ruby grammar that is not all the way there but it's enough that you can use it for like go
00:22:27.200 to definition on github typedruby comes out around this time which is a type system written in rust the parser is written in c plus plus i included
00:22:34.400 this because it's kind of interesting they basically forked the grammar the parse.y from ruby they
00:22:39.440 took the lexer from ruby parser and then eventually sorbet when they wanted to have a ruby parser they just used this
00:22:45.679 one as well two five comes out with uh rescue and ensure at the block level two six comes
00:22:51.120 out ruby vm abstract syntax tree is introduced ruby vm abstract syntax tree was not
00:22:56.320 what i would call an intentional release it was part of a test suite for another feature um and it's a very very
00:23:03.360 interesting thing i encourage you to go to the ruby issue tracker and go read about it it's another
00:23:08.400 form of a parser that can be used though it doesn't it's unclear if it's going to be supported in
00:23:14.960 the long term or how it's going to be supported it might do some optimizations before it hands things back to you not
00:23:20.320 entirely clear flip-flop is deprecated much to the chagrin of the flip-flop fans everywhere
00:23:26.559 2.7 flip-flop is undeprecated so it's okay yeah we get a lot of introduction of
00:23:33.039 syntax in ruby 2.7 as you saw in matz's keynote uh method reference operator was added method reference operator was
00:23:38.960 removed um we get some interesting syntax for like star star nil for saying this method
00:23:44.320 accepts no keywords uh rightward assignment numbered parameters argument forwarding all these different things
00:23:49.600 finally ruby three comes out happy about ruby three we get keyword arguments that are different keyword
00:23:54.880 arguments previously the keyword arguments were based on allocating a hash and then there was a whole bunch of syntax errors it was very confusing now
00:24:01.200 we get some actual separated keyword arguments we get endless method definitions you can do def foo equals bar
00:24:07.760 and that is all working we get some interesting pragmas for ractors and we get
00:24:14.080 keyword pattern matching in single line ruby 3.1 preview one came out today
00:24:21.760 there you go yeah all right uh we get some hash literal syntax uh hash literal shorthand so that you don't have
00:24:27.840 to write the value in a hash this is somewhat like javascript um and we get
00:24:33.200 in pattern matching you can pin expressions not just variables okay
00:24:38.320 that that was the abridged history believe it or not i have a website i will share with you a link at the end that has
00:24:43.919 way more information with way more implementations of ruby and you can go and click around your heart's desire
00:24:49.840 i got five minutes left i want to tell you how ripper works
00:24:55.919 ripper is a standard library it hooks into the parser it gives you events
00:25:01.760 we go back to our grammar we go back to our parser generator we see this is right this is our lexer down
00:25:08.320 at the bottom it's our grammar up at the top ripper is going to hook in here and it's going to hook in here
00:25:14.799 what that is to say is when ripper finds a token say a token initially it is going to
00:25:20.080 fire a scanner event using this in quotes because that doesn't really mean anything to a lot of people but it's a
00:25:25.360 scanner event up top anytime a rule is reduced you remember when i showed you what reduced meant at the beginning it's
00:25:31.440 going to fire an event whenever a rule is reduced so you're going to get events for these different things i'll show you the syntax for what that means
00:25:38.559 internally to ruby in the source this is in parse.y you can see how this kind of works it found a comment
00:25:45.520 so it's going to dispatch a scan event for that comment okay so we can go and we can build a subclass of ripper
00:25:51.840 we can say on comment we're going to build the subclass we're gonna add an attr reader for comments
00:25:58.320 we're gonna have this on comment this is how you handle an event in ripper you define this on method there are 200 some
00:26:03.679 odd methods i will show you the link to the docs later if we go and we build a parser with this
00:26:10.559 source we tell it to go and parse itself and then we pull the comments back out you will get a list of comments this is
00:26:17.600 how you can use ripper in your own thing if you want to to go and pull tokens out it's a lot
00:26:25.120 easier on the token side than it is on the node side and i'll show you why
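
The comment handler described above can be sketched roughly as follows. This is a minimal illustration, not the speaker's slide code: the class name `CommentCollector` and the sample source are assumptions made for this example, while `Ripper`, `on_comment`, and `parse` come from the standard library.

```ruby
require "ripper"

# Hypothetical class name, chosen for this sketch.
class CommentCollector < Ripper
  attr_reader :comments

  def initialize(*)
    super
    @comments = []
  end

  # Scanner event handler: ripper calls this with the text of each comment
  # token. The text may carry a trailing newline, so it is chomped here.
  def on_comment(value)
    @comments << value.chomp
  end
end

source = <<~RUBY
  # first comment
  x = 1 # second comment
RUBY

collector = CommentCollector.new(source)
collector.parse
p collector.comments
```

Only the one scanner event is overridden here; every other event falls through to ripper's default handlers, which is why pulling tokens out needs so little code.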
00:26:30.480 remember that i said it hooked into these different spots up top it's hooking into the reductions of the rules
00:26:35.600 so this is also in parse.y in ruby/ruby and this is the part of the grammar
00:26:41.200 that is handling the super calls whenever you call super you can call super with no
00:26:46.559 arguments or you can call them with arguments and you might see in the faint faint font that weird comment this is
00:26:52.640 how ripper works ripper has comments all over the parse.y file ripper then builds
00:26:57.760 its own parse.y file based on this file using those comments using macros yeah
00:27:03.600 it takes those comments and then builds its own parse.y file which it can then dispatch different events the important part is right here
00:27:10.640 that's a dsl it's a tiny little language that's baked into the language parser generator that is used
00:27:16.480 by ruby but then gets generated by makefile which then gets generated by ripper
00:27:21.600 i couldn't tell you that again if i tried um if we go we try to build a
00:27:27.600 parser using ripper to handle this and we go when we pass in this stuff we build these handlers for these events
00:27:34.720 zsuper took no arguments super took one argument and we tell it to go parse we'll get
00:27:40.240 this super called without arguments great zsuper worked just fine super called with arguments and we get
00:27:45.679 nothing why did we get nothing if we go back to our grammar we can see
00:27:51.039 okay actually this is being passed an argument what argument is it being passed it's being
00:27:56.799 passed the paren args argument remember the way that this works is as things are getting reduced whatever value you have
00:28:03.360 for that node gets passed up the tree we didn't implement a handler for those events so we get nil
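
A rough sketch of that experiment, assuming a hypothetical `SuperWatcher` class and sample source invented for this example. `on_zsuper` and `on_super` are real ripper parser events. Because no handlers are defined for the nodes underneath the argument list, the value that reaches `on_super` is expected to be nil on a typical CRuby, as described in the talk.

```ruby
require "ripper"

# Hypothetical class name, chosen for this sketch.
class SuperWatcher < Ripper
  attr_reader :log

  def initialize(*)
    super
    @log = []
  end

  # Parser event: bare `super` with no arguments and no parentheses.
  def on_zsuper
    @log << [:zsuper]
  end

  # Parser event: `super` with arguments. `args` is whatever the handlers
  # for the child nodes returned while they were being reduced.
  def on_super(args)
    @log << [:super, args]
  end
end

watcher = SuperWatcher.new("def a; super; end\ndef b; super(1); end")
watcher.parse
p watcher.log
```

To see a useful value in `args`, handlers for the child events (the paren args and the argument list beneath it) would have to be defined as well, each returning something for its parent to receive.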
00:28:09.360 so okay we have a problem what are we going to do ripper ships with two subclasses
00:28:15.120 there is SexpBuilder s-expression builder and SexpBuilderPP for
00:28:20.240 pretty printer it has an implemented handler for every single one using method missing it's a
00:28:26.559 whole thing if we go and we call it with this
00:28:31.919 subclass we actually will get something that looks a lot like the ast we built earlier right so
00:28:37.760 what are your options if you want to use ripper you can implement every method handler yourself which is fine you can do it
00:28:45.279 you can inherit from Ripper::SexpBuilder or Ripper::SexpBuilderPP or you can do some combination of both
00:28:51.520 as we saw with the comment handler earlier it just worked you can just get your tokens out it's fine
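
For the built-in subclasses, a small sketch of what they hand back. `Ripper.sexp` is the standard-library shortcut that parses with `Ripper::SexpBuilderPP`. The helper `node_types` is hypothetical, written only for this example to list the node names in the resulting tree. The exact shape of the nested arrays can differ between Ruby versions, so no output is shown.

```ruby
require "ripper"
require "pp"

# Parse with the pretty-printing s-expression builder.
tree = Ripper.sexp("def b; super(1); end")
pp tree

# Hypothetical helper: walk the nested arrays and collect every symbol
# that appears in the first position of an array (the node type).
def node_types(node, found = [])
  return found unless node.is_a?(Array)
  found << node.first if node.first.is_a?(Symbol)
  node.each { |child| node_types(child, found) }
  found
end

types = node_types(tree)
p types
```

Since every event has a handler in these builders, the argument node that came back as nothing in the hand-written subclass shows up here as part of the tree.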
00:28:57.200 uh i have implemented one for you it's here it's in prettier um
00:29:02.720 and it's you know it's there it happens uh there are 200 some odd tokens there is a hash for each of them
00:29:09.679 i'm actively working on upstreaming it so that you can have one for yourself if you want to build a syntax tree what
00:29:16.080 are your options to this day ryan is still maintaining ruby_parser this is an option for you it
00:29:22.080 has some community adoption it's not 100% compatible though he did just bring it up to ruby 3.0 new stuff may break
00:29:27.360 because it's not shipped with core the parser gem tons of community adoption it backs rubocop it backs
00:29:33.120 standard which is a wrapper around rubocop it's very well documented not 100% compatible necessarily new stuff
00:29:38.799 may break every time new syntax comes out they have to play catch up it's just part of the name of the game it doesn't ship with or test with core so i mean
00:29:46.559 that's just what it is RubyVM::AbstractSyntaxTree it's still too early to tell what's actually happening
00:29:51.760 with this it's on the issue tracker you can check it out it's not implemented on any other ruby implementation so if you're interested in portability don't
00:29:58.000 choose that option finally there's ripper as we talked about it's built into the parser generator it's well tested in core
00:30:03.600 ships with ruby there's no documentation uh in the last hack days for shopify
00:30:09.279 thank you shopify i uh spent a fair amount of time uh and there is now documentation for
00:30:15.200 every event and with an example for every single one that will trigger all those events you can go and find it i
00:30:20.320 will send you the link uh it is here all of that is to say
00:30:27.360 ruby/ruby uses a parser generator it originally used yacc it now uses bison
00:30:32.720 parser generators are complicated technologies that use shift and reduce
00:30:37.840 operations to build up syntax trees parser generators are difficult to maintain across implementations of
00:30:43.840 languages they're not the most intuitive of technologies and it's difficult to maintain upstream compatibility it's a
00:30:51.200 good thing that ruby is going to slow down on syntax and future development because it's going to give an opportunity for all the other
00:30:56.799 implementations to catch up but this is not something that we can solve as a community without the help of
00:31:02.240 core this is not something that we can really fix without having a standard
00:31:07.440 library parser because as you saw with the massive list of tools that i went through
00:31:13.200 all of those went away when the money dried up you can only keep up with core for so
00:31:18.320 long um it's exhausting and you you have to have
00:31:23.360 the parser ship with ruby that's really the only option so ruby vm ast or ripper
00:31:28.720 those are the options i'm pushing for ripper but i'd be happy if we just had one that was standard that rubocop used
00:31:34.480 so that we could all get used to that and use it and yeah i hope this inspires
00:31:40.480 you to learn more about syntax trees i hope this inspires you to build more tools on ruby you know we're here to answer the call
00:31:45.760 that matz just put out to build more tools and uh yeah that's all i got thank you very much