Zeitwerk Internals

Zeitwerk is the Ruby gem responsible for autoloading code in modern Rails applications.

Rails Core member Xavier Noria walks you through how Zeitwerk works, from a conceptual overview down to implementation details and interface design.

This talk is geared towards both seasoned Ruby developers looking to have a deep understanding of Zeitwerk, as well as Rails beginners curious to know how this aspect of Rails works.

Slides available at: https://www.slideshare.net/fxn/zeitwerk-internals

Other links:
https://rubyonrails.org/
https://rubyonrails.org/community
https://github.com/rails
https://hashref.com/
https://github.com/fxn/zeitwerk

#RailsWorld #RubyonRails #rails #Rails7 #opensource #oss #community #RailsCore #Zeitwerk #rubygem

Rails World 2023

00:00:04.330 [Music]
00:00:15.480 yeah okay so Zeitwerk is a Ruby library that provides autoloading
00:00:23.560 reloading and eager loading for Ruby projects and in case your OCD is suffering a lot right now because autoloading
00:00:30.480 is one word and eager loading is two words I feel your pain because that's the way it
00:00:38.960 is but that is how current Rails
00:00:44.360 applications do these things using Zeitwerk behind the scenes and perhaps if
00:00:51.320 you are new to Rails you do not even notice because transparency has been
00:00:57.000 a key design goal of this library what that means is that
00:01:03.320 your Ruby code has no trace of Zeitwerk so you write your normal application code Zeitwerk
00:01:09.520 is nowhere to be seen even more there are several hundred gems nowadays
00:01:16.880 loading with Zeitwerk so perhaps some of your dependencies are using Zeitwerk as
00:01:21.920 well and you do not even notice because transparency also is translated to the
00:01:27.320 user of the code in this case of the library so it does not even belong to the public interface the gem may say
00:01:35.479 to use this gem you know load this entry point it doesn't even publish that it is using Zeitwerk internally all right so
00:01:43.520 transparency has been a goal it's nowhere to be seen however this does not
00:01:52.439 have to be magic I don't like the word magic okay so I want this user
00:01:58.840 experience but when you use the library if you
00:02:04.240 know what it's doing instead of magic I like to talk about enhancing
00:02:10.239 something accelerating something about catalysts all right that's the point
00:02:15.280 that the library is going to do something that you could perhaps manually do but it does it for you it's a
00:02:21.239 catalyst okay so your user experience I
00:02:26.760 believe is better if you are interested in that if you know in addition to the
00:02:32.000 features that you get how it is doing that all right so this is what the talk
00:02:37.680 is about now Zeitwerk you can
00:02:43.400 think of that library as a code loader all right but technically it is a
00:02:49.120 constant loader all right so in order to understand how Zeitwerk works we need to
00:02:55.319 be more or less on the same page about certain key aspects of constants in Ruby so we
00:03:01.280 are going to have two sections in this talk the first section is that we put in
00:03:07.239 common these observations and then the second section is the proper thing about understanding how Zeitwerk is
00:03:15.480 implemented but we need the first one first so the first thing is that the class and module keywords create
00:03:24.519 constants they store the classes and modules that get defined in constants
00:03:30.959 this is a very unique thing in the Ruby programming language so for instance we
00:03:36.200 have here two chunks of code the first one the normal one class C you define a class call it C but the class
00:03:44.280 keyword here is doing what the second chunk is doing you create a class object
00:03:50.120 Class.new and assign that object to the C constant that is what is happening behind
00:03:56.599 the scenes in the class keyword same for modules if we define a module called M
00:04:02.159 that's the same thing as creating a module object and assigning that object to a constant the M
00:04:10.079 constant all right so this is linking to the second remark very important Ruby
00:04:16.000 does not have syntax for types there's no syntax for types so let's just study
00:04:21.759 for a moment this slide right simple slide we store 0 into X the X constant
00:04:30.520 and after that we ask are you even yes okay let's see what's happening in
00:04:37.600 detail here we are storing zero into an X constant and then we ask that X
00:04:44.199 constant if it's even what is happening there is that X is an expression the expression evaluates to
00:04:50.800 the value that it contains like it would happen with a variable same thing right so X evaluates to zero okay
00:04:58.639 zero responds to even? so we get the thing this slide I believe is
00:05:04.919 not surprising to anyone it's what we expect now the next slide the next slide
00:05:11.120 is very important in the presentation which is that when we write something like this Project.find something that
00:05:19.039 Project is a constant that's the point okay when we define this Project let's assume it is a class
00:05:26.319 okay when we define it with the class keyword there's a constant Project created as we saw before and a
00:05:33.479 class object is stored in the constant so that is a constant it is not a type
00:05:39.199 this is a very important technical detail to understand how Zeitwerk works so that Project is exactly the same as this
00:05:45.319 X it is a constant a regular constant there's nothing special about
00:05:50.639 that all right the next remark is that constants
00:05:56.880 belong to class and module objects and this is the way Ruby kind of
00:06:03.479 emulates the concept of a namespace so class and module objects have
00:06:09.319 internally you can think conceptually of a hash table a constant table that maps
00:06:15.280 constant names to their values and in the case of top level
00:06:21.080 constants in which class or module object are they stored in Object all
00:06:28.000 right so the Module class has this API to
00:06:33.720 manipulate this constant table you can set constants you can get constants remove constants list constants okay
00:06:39.680 there's an API that reflects this model okay the model in which these things have this collection
00:06:47.479 inside for instance when we define C the class C we saw before that we are
00:06:54.400 creating a constant if we list the constants of Object we see the C
00:06:59.599 constant included there as a symbol that is the name all right
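The constant table model described above can be checked directly from Ruby. This is a minimal sketch rather than anything shown in the talk: the constant names `C`, `D`, `LISTED`, `FETCHED` and `STILL_THERE` are made up for illustration, while `constants`, `const_set`, `const_get`, `remove_const` and `const_defined?` are the reflection methods that `Module` provides.

```ruby
# The class keyword creates a class object and stores it in a constant.
class C
end

# Top-level constants live in Object, so C shows up in its listing,
# next to String and everything else that looks like a "type".
LISTED = Object.constants.include?(:C) && Object.constants.include?(:String)

# Module exposes the constant table through a reflection API.
Object.const_set(:D, Class.new)   # same effect as `D = Class.new` at top level
FETCHED = Object.const_get(:D)    # the class object that was just stored
Object.send(:remove_const, :D)    # remove_const is private, hence send
STILL_THERE = Object.const_defined?(:D)
```

After the last line `D` is gone from the table of `Object`, although `FETCHED` still holds the class object, which is the distinction between a constant and the value stored in it.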
00:07:04.960 now this C is listed among other things why do we have ellipses there because
00:07:11.759 there are no types so when we write String for instance capital S that is
00:07:19.199 not a type that's a constant it's a top level constant therefore it belongs to
00:07:25.240 Object same thing all right so if we see the actual listing here it
00:07:33.319 would include all these things all right we only highlighted C
00:07:38.400 here now I want us to see the same thing but now using the API which is the third
00:07:45.479 chunk in the slide so these three things basically as far as this
00:07:53.159 talk is concerned they are doing the same thing they are creating a class object and storing that class object in
00:07:58.319 a C constant in Object because it is top level okay so in the last one const_set is
00:08:05.479 creating a constant in the receiver so the second argument is the value so the
00:08:10.800 second argument is Class.new a new class object stored in the C constant that
00:08:16.360 belongs to Object the receiver that is it the three things are doing the
00:08:22.280 same now let's go and introduce nesting here all right we have a top level
00:08:27.840 module M and nested we have class C this is the emulation of namespaces because
00:08:34.039 that class C is in the M namespace we think about so if we list the constants
00:08:40.479 in this slide in Object we have M that's the top
00:08:45.480 level one but C is not there C is in the constants of M okay we have created a C
00:08:52.760 constant in the M module so whenever we find a constant in a Ruby listing
00:08:58.920 we have always to think this constant okay where is this constant
00:09:05.200 stored in which class or module object which namespace we could think all
00:09:10.959 right since there are lookups Ruby searches you know in certain places
00:09:16.959 that we are not going to see because it is not relevant to this presentation so again let's expose ourselves to
00:09:24.240 this API so this is the same thing using the API all right we create a module
00:09:29.959 object in the first line assign that module object to the M constant in Object which means a top level M in the
00:09:38.480 second line we already can refer to M how is that possible we do not have a module keyword we do not do
00:09:46.600 a constant assignment with the equal sign yes we can refer to that because
00:09:51.720 there's nothing else constants are stored in class and module objects so that M
00:09:57.399 Ruby says well I have a constant here I am going to look up this constant
00:10:03.920 in certain places one of them is Object since we just created that constant in
00:10:09.320 Object it is found so we can call const_set on that thing and create the class so
00:10:17.240 this code is doing the same thing that the first chunk is doing the same thing
00:10:22.760 only using the API now the last remark is a method
00:10:29.320 called autoload in Module all right this allows you to load
00:10:35.440 constants on demand so let's see an example this is a real example from
00:10:41.399 backround if you open the entry point of background you will see that it
00:10:46.760 defines a namespace background and then it says autoload action with this
00:10:53.959 string the thing here is that whenever you use background colon colon
00:11:00.440 action or you want to access that constant somewhere in you know in background or in client code if the
00:11:07.079 constant is already in place normal but if it's not it's going to trigger this
00:11:13.160 autoload and it's going to issue a require on the second argument so it's
00:11:18.279 going to do require background slash action and when that require returns
00:11:24.279 if everything is normal the constant is going to be defined and you know the point that triggered the
00:11:30.120 autoload resumes and continues execution this is done on the fly by Ruby itself
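The on-demand behaviour described here can be reproduced in a few lines. The sketch below does not use the gem from the slide; `Toolkit`, `Greeter` and the temporary file are hypothetical names invented for this example, and only `Module#autoload` and `Module#autoload?` are real Ruby API.

```ruby
require "tmpdir"

# Write a file that defines Toolkit::Greeter (hypothetical names for this sketch).
DEMO_DIR = Dir.mktmpdir
GREETER_PATH = File.join(DEMO_DIR, "greeter.rb")
File.write(GREETER_PATH, <<~RUBY)
  module Toolkit
    class Greeter
      def hello
        "hello"
      end
    end
  end
RUBY

module Toolkit
  # Nothing is loaded here. Ruby only records: "if Greeter is missing
  # in Toolkit, require this path".
  autoload :Greeter, GREETER_PATH
end

BEFORE = Toolkit.autoload?(:Greeter)    # the path, while the autoload is pending
GREETING = Toolkit::Greeter.new.hello   # first reference triggers the require
AFTER = Toolkit.autoload?(:Greeter)     # nil once the constant is defined
```

The reference to `Toolkit::Greeter` is ordinary Ruby code with no `require` in sight, which is the property the rest of the talk builds on.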
00:11:37.480 so why does Baron do this well one of the benefits of this is that you no
00:11:43.880 longer have to put requires in your code because Ruby is going to autoload on demand okay and that is why we do
00:11:53.079 not need to put requires in Rails applications because Zeitwerk is based
00:11:59.880 precisely on this API now again let's look at the
00:12:06.240 same thing from a different perspective this is doing the same thing but explicitly all right because in the
00:12:12.560 previous slide autoload is a method a method that is receiving two arguments and
00:12:19.880 is invoked on which receiver who is self in the body of a module the module
00:12:26.560 all right so this is the same thing the module autoload the name of the constant
00:12:33.240 and a string to be required all right okay so we are
00:12:40.560 ready to see how Zeitwerk works now the next slides the code
00:12:49.199 that we are going to see is heavily heavily edited okay because well the
00:12:55.560 library is not big it's like 1,000 lines or something like that but there's way more
00:13:01.480 stuff than what we are going to see more details but the
00:13:06.800 essential ideas is what we are going to know and that's the context that I would like people to have in mind when
00:13:13.639 they use the library if they want to know how it's implemented these
00:13:18.680 are the key ideas all right so this is the way you create a
00:13:24.040 loader using the generic API the generic API is very simple you
00:13:30.360 instantiate an object and then you say push these directories those
00:13:36.240 directories represent the top level namespace by default so in this case this is handmade okay we are pushing
00:13:43.480 app/controllers app/models nothing else only two right and that is saying okay
00:13:51.759 please track these directories and they represent the top level namespace what does that mean it means that if you have
00:13:58.240 app/models/user.rb since app/models represents the top level namespace that
00:14:05.720 means that user.rb represents the top level constant User with capital U that's what it
00:14:12.279 means now once you have configured these root directories you call setup
00:14:18.920 and in the next line you can use anything in your project there's nothing else to
00:14:24.120 do so what does setup do
00:14:29.440 let's see first an example we are going to see what we are going to do first with a concrete example okay so let's
00:14:36.320 imagine that our project has app/controllers users controller and then there's an admin namespace
00:14:42.920 with a roles controller then in models we have user and admin
00:14:49.360 role let's work with this particular example all right now what Zeitwerk is going to do for
00:14:57.160 us is super simple it is going to define autoloads for these
00:15:02.519 things only one level only one level so it does not descend into the admin namespace
00:15:10.399 it sets three autoloads one for users controller one for user and one for
00:15:17.720 admin so whenever you refer to users controller if it's not loaded it is going
00:15:23.560 to be loaded thanks to this thing that Zeitwerk has done for you
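Written by hand, the state that setup leaves behind for this example would look roughly like the sketch below. The temporary project tree and the file contents are assumptions made up for illustration; the three `Object.autoload` calls mirror the three autoloads described above, and `Module#autoload?` is used only to inspect them.

```ruby
require "tmpdir"
require "fileutils"

# Build the example tree from the talk in a temporary directory.
APP_ROOT = Dir.mktmpdir
{
  "app/controllers/users_controller.rb" => "class UsersController; end\n",
  "app/controllers/admin/roles_controller.rb" => "module Admin; class RolesController; end; end\n",
  "app/models/user.rb" => "class User; end\n",
  "app/models/admin/role.rb" => "module Admin; class Role; end; end\n"
}.each do |relative, source|
  path = File.join(APP_ROOT, relative)
  FileUtils.mkdir_p(File.dirname(path))
  File.write(path, source)
end

# One level only: two files and one directory, all hanging from Object.
Object.autoload(:UsersController, File.join(APP_ROOT, "app/controllers/users_controller.rb"))
Object.autoload(:User, File.join(APP_ROOT, "app/models/user.rb"))
Object.autoload(:Admin, File.join(APP_ROOT, "app/controllers/admin")) # a directory, not a file

# Nothing below admin has been visited, and nothing has been loaded yet.
PENDING = [:UsersController, :User, :Admin].map { |cname| Object.autoload?(cname) }
```

The autoload for `Admin` is deliberately not triggered in this sketch: plain `require` cannot load a directory, so that case only works once requires are intercepted, as explained later in the talk.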
00:15:29.560 how is that done it's simple we iterate through these root directories this is
00:15:36.560 the implementation and we call this method that says basically please define
00:15:43.560 the necessary autoloads in this directory taking into account that we are now
00:15:50.160 in this namespace so this directory represents this namespace a namespace
00:15:55.399 is a class or module object okay now the root namespace we can assume is
00:16:02.120 Object could be something else but at this point we are going to iterate through app/controllers and app/models
00:16:08.519 there are two iterations and the namespace the second argument is Object
00:16:15.800 right now we are going to this is what happens with one of them right one
00:16:22.199 iteration we have this ls internally there's this ls there's a lot of private APIs here it doesn't matter this ls
00:16:29.720 utility basically yields to the block only things that are of the concern of the loader it's going to be either Ruby
00:16:37.880 files with .rb extension or directories anything else is ignored so for instance
00:16:42.959 you can have JavaScript files in the same directory if you want you can have .DS_Store on macOS it's going to be
00:16:50.560 ignored right it's fine
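A filter of this kind can be sketched in a few lines. `sketch_ls` below is a hypothetical stand-in written for this example, not the library's private helper, and skipping every entry that starts with a dot is an assumption of the sketch.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical stand-in for the loader's private `ls` helper: yields only
# Ruby files and directories, one level deep, and ignores everything else.
def sketch_ls(dir)
  Dir.children(dir).sort.each do |basename|
    next if basename.start_with?(".") # assumption: hidden entries are skipped

    abspath = File.join(dir, basename)
    if File.directory?(abspath)
      yield basename, abspath, :directory
    elsif basename.end_with?(".rb")
      yield basename, abspath, :file
    end
  end
end

# A directory with one Ruby file, one subdirectory and some noise.
SCAN_DIR = Dir.mktmpdir
FileUtils.mkdir_p(File.join(SCAN_DIR, "admin"))
%w[user.rb widget.js .DS_Store].each { |name| File.write(File.join(SCAN_DIR, name), "") }

ENTRIES = []
sketch_ls(SCAN_DIR) { |basename, _abspath, kind| ENTRIES << [basename, kind] }
```

In this run only `admin` and `user.rb` reach the block, so the code that sets autoloads never has to think about JavaScript files or editor leftovers.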
00:16:56.319 all the things that are not interesting if we get into the block we are in two situations either we got a file or we
00:17:03.519 got a directory now let's bran let's see what happens with files and what happens
00:17:09.280 with directories with files we caly the name so if we got user
00:17:17.199 dob we caly user lower case to get camel case that gives us user with capital
00:17:24.799 u that's the C name variable okay constant name that inflector thing is
00:17:31.799 it's a inflector that that uh has the loader and it's independent of any other
00:17:38.160 inflector that can be uh you know affecting other loaders each loader has its own independent deterministic
00:17:46.400 inflector now look at the second line that's the autoload call that's what we saw in the example that is what barran
00:17:53.280 does but zyber is doing doing it for us so we say name is space object in this
00:17:59.640 case Okay object autoload user with this absolute path to
00:18:07.840 userb why an absolute path J gyrid has performance always in mind if you pass
00:18:14.679 an AOL path to require there's no look up in the low path in the low paths okay
00:18:20.760 so if you pass an absolute path require goes straight to the file right and then we do some hoste keeping
00:18:28.600 we remember the AOL loads that we have set that's the third line we we remember
00:18:33.640 this these ones and then there's an internal registry that says this loader
00:18:39.559 self is responsible for this path we are going to see later why do we need
00:18:46.400 this so simple we caliz set the autoload and store some state in the loader for
00:18:54.039 future use Simple now with directories just a little bit more things to do but
00:18:59.919 not a lot same thing we caliz and a name space as you know can be
00:19:07.520 spread in multiple directories in our example admin is in two places right is
00:19:13.480 below AB controllers and Below AP models but they represent the same name space
00:19:19.480 the top level admin name space so we might find admin multiple times okay so
00:19:26.320 if this is this is the first time we find find it same thing we set an autoload object autoload admin capital A
00:19:35.919 with this absolute path okay the absolute path is going to be the absolute path to a directory which is
00:19:41.760 something a little bit weird but we are going to understand later why we do this same housekeeping we keep the AOL
00:19:48.960 loads uh that we are setting uh We call we register that we are managing this
00:19:56.480 directory and then we keep track of all these several admin you know uh
00:20:03.480 directories in the last collection name is space de all right so in the loop in
00:20:09.240 the initial Loop that we have we first visit AP controllers and we say okay there's an admin here first time okay
00:20:15.679 set an autoload and remember that we have an admin directory here now in AP models there's also admin the the unless
00:20:23.280 is going to be skipped and but we keep track of that second admin so that name space this indication of admin is going
00:20:28.840 to have two entries right that is all that is set
00:20:35.039 up so let's recap the loader has scann the root directories only one
00:20:42.520 level only one level zy is as lazy as possible
00:20:48.120 always now at this point the AOL loads have been Define it but they have not
00:20:54.039 been triggered there's nothing loaded only the autoo only that slide that we saw with the three loads that is what
00:20:59.080 has happened that and some internal State now the setup call at this point
00:21:06.200 returns and the loader stops and waits does nothing else now with this we are able to load
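Putting the file branch and the directory branch together, the setup step can be approximated as follows. `MiniLoader` is a toy written for this sketch under stated assumptions: the method names, the naive `camelize` and the shape of the two collections are made up, and the real loader keeps more state, including the registry of managed paths mentioned above.

```ruby
require "tmpdir"
require "fileutils"

# Toy loader: scans root directories one level deep and sets autoloads.
class MiniLoader
  attr_reader :autoloads, :namespace_dirs

  def initialize
    @roots = []
    @autoloads = {}                                 # abspath => [namespace, cname]
    @namespace_dirs = Hash.new { |h, k| h[k] = [] } # [namespace, cname] => directories
  end

  def push_dir(dir)
    @roots << File.expand_path(dir)
  end

  def setup
    @roots.each { |dir| define_autoloads(dir, Object) }
  end

  private

  # Naive inflection, good enough for users_controller => UsersController.
  def camelize(basename)
    basename.split("_").map(&:capitalize).join
  end

  def define_autoloads(dir, namespace)
    Dir.children(dir).sort.each do |entry|
      next if entry.start_with?(".")

      abspath = File.join(dir, entry)
      if entry.end_with?(".rb")
        cname = camelize(entry.delete_suffix(".rb")).to_sym
        namespace.autoload(cname, abspath)
        @autoloads[abspath] = [namespace, cname]
      elsif File.directory?(abspath)
        cref = [namespace, camelize(entry).to_sym]
        unless @namespace_dirs.key?(cref) # first time this namespace is seen
          namespace.autoload(cref.last, abspath)
          @autoloads[abspath] = cref
        end
        @namespace_dirs[cref] << abspath  # every directory is remembered
      end
    end
  end
end

# The example project from the talk, in a temporary directory.
PROJECT = Dir.mktmpdir
%w[
  app/controllers/users_controller.rb app/controllers/admin/roles_controller.rb
  app/models/user.rb app/models/admin/role.rb
].each do |relative|
  path = File.join(PROJECT, relative)
  FileUtils.mkdir_p(File.dirname(path))
  File.write(path, "")
end

LOADER = MiniLoader.new
LOADER.push_dir(File.join(PROJECT, "app/controllers"))
LOADER.push_dir(File.join(PROJECT, "app/models"))
LOADER.setup
```

After `setup` returns, the toy holds three pending autoloads and two directories recorded for `Admin`, and no project file has been read.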
00:21:14.559 the entire project in the next line how is that possible well the autol loads
00:21:20.600 are triggered on demand when you refer to one of those constants that you have an autol for then Ruby is going to
00:21:29.480 trigger the outo Ruby so the the actual outo loading is performed by Ruby which
00:21:35.679 is something made on propose because uh that's built in in The Interpreter it
00:21:41.880 respects the out the the lookup algorithms and everything it's built in in Ruby so we are using this key feature
00:21:49.120 in Ruby so they are trigged by Ruby when the constant is reference as we saw
00:21:55.000 before so that's the opportunity that we have to keep track of things being outo
00:22:01.080 loaded because there's a thin wrapper around require when an autolot is triggered Ru
00:22:09.440 is going to require the second argument right so we intercept that that's what
00:22:15.799 dver does and here you can see the registry used okay remember we have a
00:22:21.679 registry that says this loader is managing this path so the first thing we ask is is
00:22:30.039 there any loader uh um responsible for this let's imagine we are loading noiri
00:22:36.279 for instance noiri is not managed all right then we go to the else Clause so
00:22:42.600 this is not managed by dver no problem call the original required done right
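The dispatch described here, a registry lookup followed by a fallback to the original `require`, can be sketched like this. `SketchRegistry`, `sketch_original_require` and `Invoice` are hypothetical names for this example, and the managed branch only records what happened instead of doing the real housekeeping.

```ruby
require "tmpdir"

# Hypothetical registry: absolute path => whoever is responsible for it.
module SketchRegistry
  @owners = {}
  @seen = []

  class << self
    attr_reader :seen

    def register(path, owner)
      @owners[path] = owner
    end

    def owner_of(path)
      @owners[path]
    end
  end
end

module Kernel
  # Keep a handle on whatever `require` was before wrapping it.
  alias_method :sketch_original_require, :require

  def require(path)
    owner = SketchRegistry.owner_of(path)
    if owner
      # Managed path: let Ruby do its job, then do some bookkeeping and
      # return the same flag to honour the contract of require.
      loaded = sketch_original_require(path)
      SketchRegistry.seen << [owner, path] if loaded
      loaded
    else
      # Not managed (a gem, the standard library): just delegate.
      sketch_original_require(path)
    end
  end
  private :require
end

# A managed file whose autoload goes through the wrapper.
INVOICE_PATH = File.join(Dir.mktmpdir, "invoice.rb")
File.write(INVOICE_PATH, "class Invoice; end\n")
SketchRegistry.register(INVOICE_PATH, :demo_loader)
Object.autoload(:Invoice, INVOICE_PATH)

Invoice.name            # first reference: Ruby calls require, the wrapper sees it
require "securerandom"  # unmanaged: falls through to the original require
```

The wrapper is only reached because Ruby issues the autoload through `require`, which is what makes the interception point possible in the first place.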
00:22:48.600 now the interesting part is when we are managing so we get the loader that is loading this and we need to do this
00:22:55.240 because maybe there are seven of them okay we need to know which one of them that's what the registry is about again we
00:23:01.360 need to branch if this is a file we're going to do something if it's a directory we are going to do something
00:23:08.679 else the file part is quite simple first of all we call the
00:23:16.000 original require so anything that Ruby has to do with this file please do it okay we are delegating the work to Ruby
00:23:23.919 then if the file was actually loaded we are going to do some housekeeping and finally we are going to
00:23:31.240 return the same flag to comply with the contract of require now what is that
00:23:38.360 housekeeping super simple as well autoloads is that collection where we
00:23:43.919 store the autoloads that have been set the first thing that we do is delete
00:23:50.120 that entry from that collection so that collection grows as we set autoloads and then shrinks as autoloads are being
00:23:57.640 used okay it's dynamic it grows and shrinks so it tries to keep memory low right
00:24:03.600 only the necessary information is stored now we check was the constant
00:24:09.559 that we expected actually loaded if it wasn't we raise an error fine
00:24:15.559 but if it was loaded we continue now reloading by default is disabled in
00:24:22.600 Zeitwerk okay you have to enable reloading if you want to reload
00:24:28.880 so in the case that reloading is enabled then we keep track of the things that
00:24:34.520 have to be unloaded and that's the if
00:24:40.120 condition there all right that to_unload Zeitwerk is keeping track of this and is
00:24:46.240 storing the information that is going to be necessary to
00:24:52.159 unload now directories we saw what
00:24:57.360 happens when you refer to User for instance okay there's a require and
00:25:02.399 what we saw happens now if you refer to the Admin namespace
00:25:09.240 the wrapper is even thinner all right so only this housekeeping is going
00:25:15.159 to happen and then we return true because we control this we intercept
00:25:20.399 this call there's no call to the original require because require has nothing to do with
00:25:26.679 directories what happens here similar we delete the
00:25:31.760 entry so grow shrink we delete the entry and now you see a const_set call we are
00:25:39.159 going to make a call to define the Admin module because this Admin module there's
00:25:45.120 no admin.rb file defining the module however when we use
00:25:50.880 the application the Admin module somehow comes to life how is that here is that
00:25:56.840 so that const_set is looking like the examples that we saw in the
00:26:02.919 first half of the presentation so we are creating a module object and storing the
00:26:08.559 module object in the well the cref constant reference is that pair that we saw before it's a pair that has the namespace
00:26:16.120 and the constant name so the first element cref zero is the namespace
00:26:21.640 namespace const_set the cname and assign this module object that I just created
00:26:27.279 to that thing so it's Object const_set Admin Module.new so in the next line
00:26:35.000 Admin is reachable Admin is created now if reloading is enabled the
00:26:40.559 same thing we keep track of things and then we need to go to the list of
00:26:45.679 directories that conform the namespace and here is the recursion here is the recursion in those directories we now
00:26:53.760 set autoloads with the same call that we saw at the beginning only that now the directory is going to be admin
00:27:01.720 some of the admins and mod is going to be the module that we just created so
00:27:06.919 in admin the namespace is Admin that's the point but this is the same call that
00:27:12.000 we saw before and here's the recursion so Zeitwerk does one level and then it
00:27:18.600 only descends into the branches of the project that are used and only one level
00:27:24.360 at a time as lazy as possible so after that code this is the
00:27:31.360 situation now we got the module Admin defined and then an
00:27:36.880 autoload for roles controller an autoload for role all
00:27:42.960 right uh there's an H case here because the name space could be defined in a file all right so in this case Hotel be
00:27:50.919 and hotel pricing do V we cannot load the first one because we need the pricing constant but we cannot load the
00:27:57.320 second second one because we need the hotel constant how do we solve this okay if we
00:28:03.039 wanted to do this by hand we could do this set an autoload there this is
00:28:08.279 artificial but we could do this okay this this would work now zyber uh is not going to edit
00:28:17.720 your files because of transparency so Z is going to do this
00:28:24.200 without editing the file and in order to do this it has a trace point that is
00:28:29.279 enabled if it detects name spaces this way otherwise is
00:28:34.519 disabled is a trace point on the class event that's cheap okay it's the class
00:28:39.760 event so this is called it when new classes and modules are created with the
00:28:45.480 keywords basically this is doing okay Hotel I got hotel but I have also the
00:28:51.559 list of directories where hotel is defined so when the hotel class has been
00:28:57.440 loaded the trace Point gives you the class object and now we can go and do
00:29:02.840 what we saw with directories we go we iterate through the directories and say okay this directory the name space is
00:29:09.760 this please set the outo loads one level more so this is the summary all right we
00:29:17.240 scan root directories only one level Define aoo for the entries that we
00:29:24.519 find then we wait for the aolo to be trigger it if you don't use the application nothing is going to happen
00:29:31.640 if you use it it's going to be loaded as less as possible now when the autoo if the autoo
00:29:39.760 are triggered we intercept the requires and there is where we can do
00:29:46.279 housekeeping for instance Define module um uh module objects on the fly if
00:29:51.679 needed as we saw with the admin example and at that point descend one level
00:29:56.760 descend in that particular branch of the project all right so this is the main
00:30:05.799 thing in the presentation how autoloading works and which is the state that we keep because as you are going to
00:30:12.640 see reloading and eager loading are just a corollary of all this so this is the
00:30:18.279 call to reload simple reload what does it
00:30:23.960 do Ruby does not have an API to remove things from memory all right so we have
00:30:30.559 to kind of emulate this and this is where the conventions enter if you follow the conventions this is going to
00:30:36.919 happen cleanly all right so the first thing that we do is we unset autoloads if
00:30:42.080 there's any autoload that has not been triggered we remove the autoload because
00:30:47.480 if you deleted the corresponding file and we do not delete that autoload
00:30:53.000 after reloading it is not going to make sense the state of the autoloads would not correspond
00:30:58.320 to the state of the file tree so if you remove the file we need the corresponding autoload to be deleted
00:31:05.399 so we delete all of the pending ones now if reloading is
00:31:11.440 enabled remember we keep track of the constants that have been loaded so we remove those ones to do this we use the
00:31:18.840 API remove_const that API that we saw in Module then there's a technicality here
00:31:25.639 because you know that require is idempotent okay and after reloading we are going to set autoloads and they are
00:31:32.840 going to trigger requires okay if user.rb was required in the first place
00:31:39.080 require is going to say I already required this thing there's nothing to do and the file wouldn't be loaded so we
00:31:46.080 need to trick this system a little bit by editing $LOADED_FEATURES which is the collection where require looks to
00:31:54.600 know if something has been already loaded we need to remove those things by hand okay and when we have unloaded
00:32:02.279 these things in principle if everything is okay none of those class
00:32:08.200 and module objects are reachable anymore because the constants are gone so they are going to be eventually garbage
00:32:15.960 collected now we have unloaded the code and we are at square one square one is
00:32:22.600 run setup start again at the root directories and finally eager loading
00:32:29.679 also simple call eager load you may think that eager loading is a recursive
00:32:35.440 require but it is not for a number of reasons and this is a part of the
00:32:41.600 implementation of Zeitwerk that I like very much which is that here you could
00:32:47.240 say that Zeitwerk is using itself because eager loading is not a recursive require
00:32:53.200 eager loading is a recursive autoloading so we do a breadth-first project
00:33:00.360 traversal and since we know which autoloads have not been
00:33:06.120 triggered when we traverse the tree we say okay this file has this been
00:33:13.279 triggered no I have it so const_get const_get is going to
00:33:19.639 autoload now when you autoload perhaps that file is referring at the top level
00:33:25.080 to other I don't know five or six that in turn are going to be autoloaded
00:33:30.720 but remember then we remove them from the state of the
00:33:36.120 loader because we grow and we shrink what happens is that when we find those files
00:33:42.760 this thing is going to skip them directly so it is going to do this const_get only
00:33:50.360 once as efficient as possible so that was it I hope that this
00:33:56.679 you know clarifies at least a little bit how things are working behind the scenes and that's all I have all right
00:34:02.799 thank [Applause]
00:34:08.190 [Music]
00:34:17.839 you