Devly, a multi-service development environment

by Eric Hodel and Ezekiel Templin

In this video from RailsConf 2018, Eric Hodel and Ezekiel Templin from Fastly present "Devly," a multi-service development environment designed to simplify and enhance collaborative development across various teams. The session outlines the challenges faced by their growing engineering organization, which required a change in approach to managing development environments as the team expanded and began handling complex systems across different programming languages.

Key points discussed include:
- Challenges of Growth: As Fastly grew, their original development environment became less reliable due to increased complexity and a failure to meet diverse team needs. Each engineer initially ran a local copy of each service, leading to inconsistencies and frustration as the engineering department doubled in size every six months for a number of years.
- Themes in Developer Productivity: Several critical themes emerged from their experiences, such as the need for reliable, accessible, maintainable, and reproducible development environments. These themes shaped the design of Devly, ensuring that developers could focus on their work without excessive overhead.
- Components of Devly: Devly allows developers to build images from their repositories, which enable communication both within and across teams. It integrates Docker to manage containers and streamline the setup of services, ensuring that images are shared across teams consistently.
- Development Workflows: The presentation showcases various workflows for setting up and managing development services, including pulling the latest images, running migrations, and displaying logs. Friendly commands facilitate tasks that developers frequently perform, helping reduce friction and improving usability.
- Feedback and Community Building: Hodel and Templin emphasize the importance of early adopters and community feedback in refining Devly. Their experiences led to the establishment of a supportive culture within Fastly and a significant uptick in tool adoption among teams.

In conclusion, the session provides insight into how organized communication and thoughtful engineering practices in tool development can lead to improved developer experience. Devly was created to address the overarching challenges of software development teams and facilitate better collaboration despite the complexities of varying technology stacks.

RailsConf 2018: Devly, a multi-service development environment by Eric Hodel & Ezekiel Templin

Writing a system alone is hard. Building many systems with many people is harder.

As our company has grown, we tried many approaches to user-friendly, shared development environments and learned what works and what doesn't. We incorporated what we learned into a tool called Devly. Devly is used to develop products at Fastly that span many services written in different languages.

We learned that the design of our tools must be guided by how teams work and communicate. To respond to these needs, Devly allows self-service, control, and safety so that developers can focus on their work.

00:00:11.000 Hello, welcome to Devly, a multi-service development environment. I'm Eric Hodel. I work at Fastly in the

00:00:18.720 developer engineering department on our development environment, which I'll be describing for you today. I've written a

00:00:24.119 lot of Ruby code, some of which you use every day. I've been writing software of one

00:00:34.980 kind or another for more than 20 years now. We currently work within Fastly's site

00:00:40.559 reliability engineering organization, focusing on improving the internal engineering experience. Fastly, for those

00:00:47.190 who don't know, is a content delivery network and edge cloud provider. We serve traffic for GitHub, New Relic, Spotify, and

00:00:53.670 many other popular websites and services. We also provide service for all Ruby and RubyGems downloads and do the same for

00:01:00.570 many other open source projects free of charge. Ask us after the talk if you're interested in using Fastly for your open

00:01:05.939 source project. We have servers all over the world, which serve more than 14 trillion (with a T) requests each month.

00:01:13.290 This constitutes more than 10% of all Internet traffic, which still kind of blows my mind because it was not always

00:01:19.710 that large. We also employ the owners of a hundred percent of the world's best

00:01:25.170 dogs. Very dog-friendly company; we've got dogs all over the world. We are

00:01:32.909 currently hiring Ruby application engineers. If you're interested, please find Eric or me after the talk; we can

00:01:38.430 provide you with details. A quick note before I continue: I currently live in Portland, Oregon, but

00:01:44.520 I was born and raised in a small town called Meadville, about two hours north of here. This is my first public talk, so I'm

00:01:51.450 excited to be giving it so close to home. Yeah, thank you for the opportunity.

00:02:04.659 problem that we believe impacts organizations of all sizes. Can I raise this up? Is that cool? Okay, so today I'd like

00:02:23.650 to discuss a problem we believe impacts organizations of all sizes. To help us illustrate this problem, I'd like to tell

00:02:29.440 you a story about the evolution of Fastly's API. This is a rough approximation of the service

00:02:34.600 architecture that backed the Fastly API circa 2012. The original Fastly

00:02:40.180 development environment consisted of a copy of each component of the Fastly API running on each engineer's laptop.

00:02:45.760 Soon after, the early team decided that virtual machines should be employed to provide a degree of operational

00:02:50.890 uniformity and parity between development and production. Another attribute of Fastly in the early days was that all the

00:02:57.100 engineering work was being done by a very small group of people. Changes to the systems were easily introduced and

00:03:02.440 distributed through source control, which allowed the teams to rapidly develop and deploy changes. Another side effect of the

00:03:09.820 small size of the company was that focused discussions were possible, and this made decisions easy to communicate. Let's zoom

00:03:17.620 back in and step forward in time a few years. Fortunately for us, the company was

00:03:25.120 successful during this period of time, and that success opened doors for new opportunities to expand the business by

00:03:31.510 adding additional functionality to our API. In some cases when we added new functionality we added new supporting

00:03:37.750 systems, and when we as software engineers, everyone in this room, add new functionality and dependencies to our

00:03:43.870 systems we introduce complexity. I don't mean to imply that complexity is necessarily a bad thing; to the contrary,

00:03:50.709 I would argue that complexity is an unavoidable side effect of growth. There is something else I haven't mentioned

00:03:56.170 yet that complicates matters even more, which is that we like to use the right tool for the job, and many of our

00:04:01.209 services were written in entirely different languages with very different workflows. So despite the increase in the

00:04:07.419 number of languages and services, our development environment stayed much the same. Moreover, the gaps between each group's

00:04:15.400 development workflows grew considerably. This became increasingly problematic as our engineering department doubled in

00:04:21.250 size every six months for a number of years. So as a result our original

00:04:27.220 development environment became increasingly unreliable, and established processes to communicate changes broke

00:04:33.370 down. Maintaining any single engineer's development environment was problematic.

00:04:39.150 So we have engineers working on everything from code that runs in the Linux kernel to code that runs in the

00:04:44.500 browser, and the needs of each team in those different areas are dramatically different. Our original development

00:04:50.080 environment was unable to meet the needs of one team without compromising the needs of another. This growth continued

00:04:56.560 regardless of our development woes. Companies may continue to grow; that's just what they're going to do. So writing and

00:05:04.060 scaling software is complicated, and there are many moving pieces and things to keep in mind while you're doing it. As

00:05:10.750 an industry we've established and continue to improve upon strategies that help us direct our time and energy. I

00:05:16.300 believe this is due in large part to our ability to observe software systems in isolation. Organizations, on the other

00:05:22.780 hand, are far more complex and much harder to observe in systematic ways, but

00:05:28.150 by introspecting on our own experiences and listening to our co-workers we were able to find the themes and common frustrations, of which these are some

00:05:35.260 examples. "Here's your laptop; we'll see you in two weeks when your development environment is running." "Does anyone

00:05:42.729 know why the API gateway crashes in a loop?" "I updated the rest of my development

00:05:48.789 environment; now nothing works. What happened?" "I can't do my work today because I need to rebuild my development

00:05:54.729 environment." That's clearly untenable and becoming worse over time, increasingly

00:06:00.430 problematic. How many people out there have actually... well, I'm sorry, but

00:06:08.140 yes, I'm glad it wasn't just us. So you

00:06:14.200 but ruber than the development alarm and our friends and co-workers becoming increasingly frustrated by the situation

00:06:21.120 So what to do? During the same period of time a lot of new tools arrived on the scene, none that met all of our needs. So

00:06:28.830 through observation, research, and a lot of discussion with our friends, co-workers, and peers in other companies,

00:06:35.200 we arrived at a few important themes, and we believe these themes embody the traits that are desirable in developer-focused

00:06:41.050 productivity tools. The development environment must be reliable. I should be

00:06:47.680 able to run a small number of commands to get what I need running. I should not have to know how every system works to

00:06:53.620 do my job, and I should be able to easily see the local health of systems I rely upon. I should never, ever have to spend a

00:07:01.960 day rebuilding it. A development environment must be accessible. Maintainers

00:07:08.200 of systems must be allowed and encouraged to maintain their development environments collectively. I should be

00:07:13.450 able to build and test new changes across systems owned by different teams easily. A development environment that

00:07:19.599 spans multiple teams and workflows must be maintainable by the community of folks that are using it. Managing changes

00:07:25.090 in source control illuminates past and present ownership, even with many components. Structure and form should

00:07:30.580 be encouraged through convention, documentation, tooling, and feedback loops rather than enforced by gatekeepers. We

00:07:37.840 want a development environment to be able to run services together as composable units, so it should be

00:07:43.479 really easy to try new supporting systems and swap things in and out without having to worry about writing a bunch of Chef code or doing a

00:07:49.539 bunch of other things like that. A development environment must be reproducible. We need the ability to

00:07:56.409 determine and apply the last known good state of all systems. Source control or similar mechanisms should allow us to

00:08:02.140 determine how we arrived at this known good state, and we should be able to leverage existing tools like Git,

00:08:07.419 RubyGems, Perl's CPAN, Python's pip, and Golang's dep to arrive here. So through the rest of

00:08:16.780 this talk we hope to show you how we started to meet the needs of our co-workers at Fastly by applying these

00:08:22.610 themes to a tool we've been building together for the last year. We call the tool Devly. To tell you more about

00:08:27.860 Devly, we'd like to hand things off to my friend, close product collaborator, and the lead engineer on the Devly project,

00:08:33.050 Eric Hodel. Thank you, Zeke. I will talk

00:08:38.659 about Devly and some of its components and features. As Zeke covered, Devly is

00:08:44.390 designed for developers. Devly builds images from your repositories, it uses those images to

00:08:51.290 manage containers, and it enables communication both within and across teams of developers. Everything

00:08:57.980 Devly is... hmm, my slides. Now, okay.

00:09:05.029 Devly is distributed for macOS and Linux. We provide a standalone executable built by Ruby Packer and provide

00:09:11.630 packages for macOS and Debian. Devly lets you configure all of your

00:09:16.850 services. It helps you build images from your repositories using Dockerfiles,

00:09:22.270 allows you to configure those images to run as services, and runs groups of

00:09:28.130 services together as part of a rack. An image contains the files necessary to

00:09:33.380 run a service. The audit log image uses Ruby, so it has a copy of our application code. This code requires some libraries

00:09:40.190 like Rails, Sidekiq, and a JSON parser, so the image also contains those installed gems, and the JSON parser

00:09:47.120 requires a C library, so we install that along with the OS package system. In our repository there is a

00:09:53.570 Dockerfile that contains the instructions for building this image. Images can contain applications for any

00:09:59.420 language. Our stats service is written in Go; its image has a Go binary compiled

00:10:04.670 from the stats application code. The web app our customers use is written in Ember; this image contains a copy of the

00:10:11.750 application code ready to run. We share all these images across all the teams by uploading and downloading them from the

00:10:17.600 Google Container Registry, and this allows us to be sure we're always using the latest images and the latest source

00:10:23.720 code. A Devly service is a runtime configuration for an image. Here we've

00:10:29.660 created the audit log service using the audit log image. The service runs a command. Our

00:10:35.890 audit log service provides an API for managing event data; it runs a Rails server to provide the HTTP interface

00:10:41.980 for events. Our audit log service needs to be accessible to

00:10:47.830 other services so they can read and write events. To allow other services to communicate with us, we expose port 8888.

00:10:54.460 If you use a development framework like Rails that supports live development, you can mount your

00:11:00.610 repository on top of the files in the image. This allows you to work in your favorite editor from your favorite

00:11:06.550 OS: you can change a file on your host OS and see the changes in your browser. This

00:11:13.030 service runs the audit log API, but we also have some Sidekiq background jobs to run. To make it easier to read our

00:11:19.240 logs, let's use a separate service to run those background jobs. Since the background jobs use all the same models

00:11:25.690 and databases as our application, we can use the same image. We create the audit

00:11:31.270 worker service, but we run the Sidekiq command instead of the Rails server command in the audit worker

00:11:37.570 service. Then when we start up the audit log API service, it only

00:11:43.570 runs the Rails server, and when we start the audit worker service it only runs the background jobs. The separation helps

00:11:51.400 make development a little more accessible because the logs are separated. We can also test our audit

00:11:57.310 workers in complete isolation from the API. We'll create a few more services for

00:12:02.860 our applications, including the authentication API, the configuration API, and some databases they use. If we're going

00:12:09.850 to work on the configuration API, we don't want to start up the services that we don't need, and the same goes for working on the authentication API. We

00:12:18.160 create a rack for developing the configuration API; it only contains the services it needs. We need a MySQL

00:12:23.320 database, the audit log service, and the config API service to do our work. A rack can customize a service.
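The relationship between images, services, and racks described so far can be sketched as a small data model. This is only an illustration in Python, not Devly's actual configuration format (which the talk does not show). The class names, field names, service names, and the commands for the database and config API are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Image:
    """Files needed to run a service (hypothetical model, not Devly's format)."""
    name: str


@dataclass(frozen=True)
class Service:
    """Runtime configuration for an image: a command plus exposed ports."""
    name: str
    image: Image
    command: str
    ports: tuple = ()


@dataclass(frozen=True)
class Rack:
    """A group of services that are started together."""
    name: str
    services: tuple

    def service_names(self):
        return [service.name for service in self.services]


# One image can back several services that differ only in their command.
audit_image = Image("audit-log")
audit_log = Service("audit-log", audit_image, "rails server", ports=(8888,))
audit_worker = Service("audit-worker", audit_image, "sidekiq")

# Assumed names and commands for the other services in the example rack.
mysql = Service("mysql", Image("mysql"), "mysqld")
config_api = Service("config-api", Image("config-api"), "rails server")

# The configuration team's rack contains only the services it needs.
config_rack = Rack("config-dev", (mysql, audit_log, config_api))
print(config_rack.service_names())
```

The audit worker is deliberately left out of the rack to mirror the point that a rack starts only what a team needs.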

00:12:30.820 Since we want to access the services running in the rack for development, we expose ports for a few services to the

00:12:37.210 host OS. This allows us to connect to those ports from our browser. You can also set environment

00:12:43.810 variables or mount different files to tweak the behavior of the service. Devly

00:12:49.920 allows you to configure multiple racks. The authentication team needs to work on its services, which include the Postgres

00:12:55.500 database and the authentication API. The authentication development rack also uses the audit log service, just like the

00:13:01.740 configuration team. When we start these racks they use independent containers to

00:13:07.170 run their services. This allows the teams to have different configurations and software versions for the audit log

00:13:13.050 service that won't collide with each other. For example, you can start up both racks at the same time and isolate bugs

00:13:19.590 that span multiple services. Using common configuration to replicate services

00:13:25.080 across teams makes sharing your work easier. The configuration for the images,

00:13:30.420 the services, and the racks is in the shared Devly library repository. At Fastly we allow any developer to make

00:13:37.110 changes to the Devly library and have them discuss the proposed changes with the people that develop that service. The

00:13:43.440 authentication, configuration, and audit API teams all have racks using the audit service. When the audit dev team

00:13:50.460 proposes changes to the audit log service, all of those teams need to be able to discuss them. By tracking the

00:13:55.920 connections between teams and services through the Devly library repository, they become more visible, which improves

00:14:02.310 the maintainability of your services and the communication across your teams. Now

00:14:07.560 that we've had an overview of the components of Devly and how they combine, I'll show demos of some common development tasks using Devly from

00:14:13.380 the perspective of developers on the various teams we've just seen. We'll go through some workflows like getting

00:14:18.660 started with development, sharing changes within and across teams, and setting up some

00:14:23.820 convenience tools that will make development easier for ourselves and our co-workers. Let's start at the beginning

00:14:31.440 by setting up Devly. As a first-time user we run devly setup and give

00:14:36.810 Devly a Git repository to pull the Devly library from. This downloads the library repository and the other

00:14:43.470 repositories for our services. Along with checking out the repositories, setup performs some additional checks, including

00:14:50.670 the Docker version and your Google SDK version. The setup command will try to

00:14:55.710 fix the things it can, or give you a message to help you fix it if it can't. By itself this step takes no more

00:15:01.619 than a few minutes to fetch your repositories and perform the necessary checks. Once setup completes we can run

00:15:08.429 devly info to see what racks and services are available to us. Devly will give us a list of the racks and services in our Devly library. We

00:15:18.420 can retrieve information for a rack, which includes the services it starts, and we

00:15:24.660 can retrieve information for a service, which includes the image, the repository, and metadata for ensuring the image is compatible with the files in the

00:15:30.779 repository that we've mounted and is up to date with the image in the registry. Now that we have completed

00:15:38.399 setting up Devly, let's start a rack and perform some basic development tasks like viewing logs, using our service, and

00:15:44.550 making a small change. The devly up command starts a rack. Since we don't

00:15:50.100 have all the necessary images downloaded from the registry, first we see Devly pulling one of those images. Once all those images are downloaded,

00:15:56.990 Devly creates a network to isolate this rack and starts all the containers. When containers aren't dependent upon each

00:16:03.240 other, Devly can start them in parallel to speed up startup. Now let's

00:16:08.879 check to see if everything is running OK. We run devly status to see which racks and services are currently running. We

00:16:16.709 can see that the two API services and the database are running, and we can see

00:16:22.019 that the two API services are accessible to the host OS on ports 8888 and

00:16:27.779 9999. We can view the logs for the rack by running devly logs. This command will

00:16:34.350 continue to follow any new logs until we exit with Control-C. Since everything seems to have started for real, let's try

00:16:40.799 out the configuration service by switching to the browser. The configuration service was running on

00:16:46.589 port 8888, so when we load it we see the main page of the configuration API. Now that we're triple sure that the rack is

00:16:53.490 working, let's switch back to the terminal and view the logs from this request. The

00:16:59.999 logs show our HTTP requests from viewing the config API main page. Everything is definitely working, so

00:17:06.390 let's do some work by opening up the main page in our favorite editor. Then we

00:17:12.120 open the source for the page from the host OS and add some text: "This is an example service." Because no

00:17:19.470 one has ever figured out how to exit Vim, we only save the file. So let's switch

00:17:24.839 back to our browser and see if our change worked. Reloading the configuration API shows the text "This is

00:17:31.559 an example service" has appeared. Our change was successful. Let's check the log to see if it wasn't a fake. So of

00:17:39.390 course the request from the refreshed page appears in the logs, and since this

00:17:44.640 has a 200 response with a different page size, we definitely loaded new content.

00:17:49.730 Now that we are done with our work, let's shut down the rack by running devly down. This stops all the containers we

00:17:56.250 had running. This might save me a little bit of CPU power, but normally our services at Fastly are lightweight, even

00:18:02.880 when a rack has a dozen services running. When we work within teams we'll be

00:18:08.940 pushing and pulling changes to our repositories, and when we work across teams with Devly, the other teams

00:18:14.160 will push images for their services when they have a new set of features ready.

00:18:19.850 For this workflow, the audit log team has updated the audit log image to add a source field for events, and we need our

00:18:26.820 services to use this new feature. First let's see if we already have the source

00:18:32.370 field by loading the audit log service in the browser. I see the user ID, the timestamp, and the action fields; no

00:18:38.549 source field, so we're using an old audit log image. Let's switch to the terminal and update our image. So we verified that

00:18:46.620 we don't have the source field. We'll pull the latest image, and sorry about the lack of output, it's a bug. Our running

00:18:52.830 audit log service is still using the old image, so we need to shut it down and start it with the new image. We can do this with

00:18:58.620 devly restart, which will replace our audit log service with a new one running the updated image. Let's

00:19:04.980 switch back to our browser to see the updates. We refresh the audit log service in the browser and see that the

00:19:12.030 source field has appeared as we expected. Now that our audit log service is

00:19:17.100 running the latest image, we can continue updating our service to use the source field in the audit log.

00:19:23.240 So far we've worked outside the container. Sometimes we need to run commands from inside the container, where all our

00:19:29.400 dependencies are loaded. So let's pretend now that we're on the audit log team, and we'll go back in time a bit. We're now

00:19:36.300 working on adding that source column to our database, and to do this we need to run the migration we've just finished

00:19:41.430 writing. We can't do this from the host OS because none of our application's gems are available; they're only installed

00:19:47.610 inside the container, so we need to run the migration from inside the audit log service. Let's start with the browser again:

00:19:54.660 we view the audit log homepage and of course we don't have the source field because we haven't run the migration yet.

00:20:02.480 devly exec lets us run commands inside the container. We don't exactly remember the image layout, so we start a bash shell so

00:20:09.900 we can explore. After the shell is open, we remember to change to the audit log source directory, but to check to make

00:20:16.620 sure we're in the right place we run rake -T. Then we can run the migrations. We see the migration said

00:20:23.250 that it added the source column, so let's go back to the browser and check it. Reloading the page in the browser shows

00:20:30.210 the source column; the migration is complete. Now that we've remembered where the rake tasks live, let's run

00:20:37.470 the command directly, so we can use our shell history in case we need to roll back and retry the migration if there was an error. So we switch back to the

00:20:43.980 terminal and run our migration using devly exec with a complete rake command

00:20:49.020 line. Now this is a little better, but it's really only okay for this one task, this one time. When we share

00:20:55.920 this work with our other teams or team members, how will they remember how to run the migrations? What we've done is not

00:21:02.130 very usable, and it would be nicer if the migrations ran automatically when we started a rack so we can get to work

00:21:07.410 right away. We can automate running the migrations at rack startup using a post-

00:21:13.860 build task. The post-build tasks run for a service after the rack starts to perform any extra

00:21:20.610 setup tasks you might need, such as migrations like we saw, or seeding data. This lets users who are unfamiliar with

00:21:27.840 a service get to work right away. Post-build tasks live in the Devly library

00:21:34.170 and are built as rake tasks that devly up runs. The tasks live in the post build

00:21:40.450 namespace. The task is named the same as the service it will run for. The rack argument allows Devly to run migrations on

00:21:47.470 the correct service if you have multiple copies running in multiple racks at the same time. We run devly exec on

00:21:55.120 the audit log service, just like we saw when running migrations earlier, and we use

00:22:01.030 the same rake command line to run the migrations. After shutting down the rack

00:22:07.960 we can start it up again with devly up. We go through all the steps we saw before in the start-a-rack section.
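The post-build convention described above (one task per service, named after the service, receiving the rack as an argument) can be sketched as a simple lookup. Devly's real post-build tasks are Rake tasks; the Python below only mimics the naming convention. The registry, the decorator, the service and rack names, and the `rake db:migrate` command line are assumptions for illustration.

```python
# Hypothetical registry standing in for a "post build" task namespace.
POST_BUILD_TASKS = {}


def post_build(service_name):
    """Register a task under the name of the service it runs for."""
    def register(func):
        POST_BUILD_TASKS[service_name] = func
        return func
    return register


@post_build("audit-log")
def migrate_audit_log(rack):
    # The rack argument selects which copy of the service to target when
    # several racks run the same service at once. A real task would run
    # the command inside that rack's container; here we only describe it.
    return (rack, "audit-log", "rake db:migrate")


def run_post_build(rack, service_names):
    """Run the matching task, if any, for each service started in a rack."""
    results = []
    for name in service_names:
        task = POST_BUILD_TASKS.get(name)
        if task is not None:
            results.append(task(rack))
    return results


print(run_post_build("config-dev", ["mysql", "audit-log", "config-api"]))
```

Services without a registered task are skipped, so adding a task for one service does not require touching the others.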

00:22:13.500 Then at the bottom we see devly exec run our migrations, including the

00:22:19.090 migration output. Now whenever someone starts our service the migrations will run automatically, so they won't have to

00:22:25.390 look up or ask what to do. Of course, we still need to run migrations during

00:22:31.570 development the next time we want to change the database schema. Shutting down

00:22:38.140 and starting the rack takes several seconds, and we don't want to take all this time. To make this easier

00:22:44.080 we can save the long devly exec migration command as an

00:22:49.210 easy-to-remember command. We don't want to have to remember or look up or type

00:22:55.000 this long command to run migrations. Let's give this command a friendly name that's easy to type and remember. The

00:23:09.669 Devly YAML file for the repository we are working from, here the audit log repository. Each repository can have its

00:23:16.510 own YAML with custom commands. Let's zoom in and look closer. The run commands

00:23:24.370 are a collection of the friendly command names that we want to run. I chose migrate as the name of the

00:23:32.080 command which will run the migrations. The command runs on the audit log service

00:23:37.350 and the command line is the one we've seen earlier that runs the database migration tasks. We can also

00:23:46.000 define a test command that runs the tests inside the service. This way anyone can run the tests where all the

00:23:51.520 dependencies are up to date. So now we can run devly run migrate from the audit log directory and we

00:23:58.830 see the migrations run, or we can run the tests. Since these tests are inside a

00:24:04.590 container which is running as part of a rack, they may communicate with other services in their rack. You can have

00:24:09.600 separate racks where one is configured to run unit tests that don't talk to other services and a larger rack with

00:24:15.480 more services that runs integration tests. Either test suite could be started from the saved command. Sometimes we have

00:24:23.279 to work on a service together with another team and definitely as a workflow for cross team development Autolog team is working on some new high

00:24:30.570 security features their work isn't complete yet but they want our feedback before they continue and make something

00:24:35.639 that's too difficult to use or integrate to give them feedback we need to work with their work-in-progress branch we

00:24:42.029 were told that if we went to the autolog page we would be running the correct code if a high security logo appeared we

00:24:48.330 go to the hog page and see the same one as usual no high security logos anywhere so we'll need to switch to their branch

00:24:54.110 the high security branch may have new dependencies that our image doesn't have so we can't mount the copy of the

00:25:00.690 updated branch on top of our existing image because the updated gems and code won't be there we'll need to build a new

00:25:06.899 image to be sure everything will work to build the new image from the high security branch first we need to tell

00:25:13.919 dev Lee to use our repository we control B is definitely linked to tell dev Lee about our copy of the break repository

00:25:20.460 this will let us build an image from the correct branch we see the audit log of repository is now linked inside of the

00:25:27.570 devlin library next we change to our repository copy and we check out the

00:25:35.309 high security branch then we use deadly build to create a new image for the audit log service now that our new image

00:25:47.100 is built we can restart the audit log service we use deadly restart again like

00:25:53.130 we did when we pulled the auto log image that had the source field now when we

00:25:59.429 reload the browser we can see we're using the high security branch because the high security logo is present we can

00:26:04.620 now do some test integration with the new code to give feedback to the Autolog team on the high security features as adoption increases will want

00:26:13.149 to centralized image building through continuous integration so you always have an up-to-date images in your

00:26:18.279 registry by running your tests through dev Lea you have a more consistent environment because the image service

00:26:24.460 and rack are all built and configured the same way in both continuous integration and local development environments regular

00:26:32.289 setup may take too long in a CI environment as it performs more checks and retrieves all the Devly library history the CI mode for devly setup

00:26:39.669 reduces the history and repositories it fetches to save time the CI

00:26:46.210 environment has a repository checked out to the correct commit already so we can use devly link to use the

00:26:51.609 correct source files the overwrite flag makes sure we replace any existing files

00:26:58.049 the new code we're testing may have new dependencies so we need to build a new image just like we did with the high

00:27:03.309 security branch finally we start the correct rack for testing this service
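A minimal sketch of the CI preparation steps described here (CI-mode setup, link with overwrite, build, start the rack), laid out as an ordered command plan. The subcommands `setup`, `link`, and `build` are named in the talk; the `start` subcommand, the flag spellings `--ci` and `--overwrite`, the argument order, and the rack name are assumptions for illustration, since the transcript does not show Devly's actual CLI syntax. The plan is only rendered as text, never executed.

```python
from typing import List


def ci_plan(service: str, rack: str, repo_path: str = ".") -> List[List[str]]:
    """Return the ordered commands a CI job might run before its tests.

    All flag spellings and the `start` subcommand are hypothetical.
    """
    return [
        # CI mode fetches less library history to save time (assumed flag).
        ["devly", "setup", "--ci"],
        # Point Devly at the checkout CI already made, replacing existing files
        # (assumed flag).
        ["devly", "link", "--overwrite", repo_path],
        # New code may bring new dependencies, so rebuild the image.
        ["devly", "build", service],
        # Start the rack used for testing this service (assumed subcommand).
        ["devly", "start", rack],
    ]


def render(plan: List[List[str]]) -> str:
    """Render the plan as one shell-style line per step."""
    return "\n".join(" ".join(step) for step in plan)


if __name__ == "__main__":
    # "config-api-rack" is a made-up rack name for the example.
    print(render(ci_plan("config-api", "config-api-rack")))
```

Keeping the plan as data rather than running it directly makes the order of the steps easy to review and to compare with the local workflow shown earlier in the talk.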

00:27:08.470 then everything will be ready to run tests same as our local environment the

00:27:14.200 next thing to do is run our tests this uses the saved command we saw earlier if

00:27:20.289 the tests successfully pass and we're on the master branch we can then push our new config API image to the registry to

00:27:26.139 share this image with all the teams adopting Devly has given us a common

00:27:31.239 way to start more and more of our services once you have a sufficient set of teams using Devly you can build on

00:27:36.669 top of this capability beyond the workflows I've demonstrated all the workflow demos I showed were for Ruby

00:27:43.330 applications but they are no different for developing a Go application which runs a compiled binary inside of an

00:27:48.340 image with a Go app you edit the source code create a new image with devly build and test it with a saved test

00:27:54.879 command this makes the development process more accessible because you don't need to learn as many new things when working on different languages I've

00:28:03.609 already shown the basics of CI in Devly but with a common way to start services and run their tests you can go beyond

00:28:09.369 running tests for a single service the services you build are composable into larger racks the more services in a rack

00:28:16.090 you have the closer to a real deployment you come so the easier it is to run integration tests or end-to-end tests across your

00:28:21.919 services by ensuring your images services and racks are reliable at every level you can more easily move your

00:28:28.760 containers toward development deployment the image as the base of all your

00:28:34.279 services makes the contents of any application accessible to various security scanners this allows you to run

00:28:39.409 internal compliance processes run vulnerability scans of the libraries you're using in your images or find

00:28:45.500 issues through static analysis you can perform enhanced testing such as a rack

00:28:51.110 for fuzzing where bizarre inputs are sent to your service that try to break it you can get started with chaos

00:28:57.620 engineering from an isolated stable environment you can build separate staging environments for groups of

00:29:03.110 services or to run integration tests and Zeke will share with us a few things

00:29:08.960 we've learned while building and collaborating on Devly with our co-workers finding the early adopter

00:29:22.070 group thank you thanks Bonnie early adopters is key we were fortunate enough to have a really diverse group of early

00:29:28.580 adopters with varying degrees of experience who were willing to provide us constructive criticism early on we

00:29:34.250 had a few early adopters with previous container experience who were able to provide us with feedback on containerization and orchestration

00:29:39.620 strategies which is really really helpful we also had a few early adopters who were relatively new to the company and who had little or no previous

00:29:46.010 experience all of our early adopters made Devly a better product

00:29:51.649 early on and you know

00:29:58.940 especially the folks who were new to the company and had a little bit of container experience

00:30:07.070 desirable traits so on top of the feedback of our early adopters that

00:30:13.550 their advocacy helped us increase our internal adoption from approximately 5% of product engineering groups to over 50% in a

00:30:19.850 little over six months which is pretty fast with that kind of adoption rate we

00:30:24.890 learned the importance of building and sustaining a supportive community within the company very very

00:30:30.620 early on in the process so we talked very openly about our plans successes and especially our failures acknowledging

00:30:37.130 the fact that we're making mistakes and we're learning with everybody as we're going along as of today the Devly

00:30:44.180 library repository has 30 contributors and includes people from most every team in Fastly engineering which is kind of

00:30:49.600 rad so all this fosters a sense of sort

00:30:54.770 of shared ownership and togetherness that has been really important in the development and adoption of Devly establishing

00:31:04.640 feedback loops with the people in the community you're developing is very key the more heard people feel the more

00:31:12.290 likely they will be to talk and ask questions and provide feedback and that's ultimately what we're trying to do is to get people to talk more that's

00:31:18.410 kind of interesting that as software engineers we're building tools and we're kind of thinking about everything as

00:31:24.260 being code and requests going between things but really the communication gets

00:31:30.170 introduced one of the ways that we're really working to establish these feedback loops is to sort of acknowledge

00:31:36.560 discuss and ticket bugs that our users are finding very very quickly and when we fix them we let people know that

00:31:43.130 reported them and we try to get them involved in sort of the review process to make sure that it's actually solving the problem that they're seeing

00:31:53.680 documentation is really good so good getting started docs reduce friction

00:31:59.540 and they absolutely need to be maintained one of the things that we did early on that I think was really

00:32:04.640 impactful was to have sort of like separate per operating system

00:32:10.910 getting started directions so that when people go through them they're actually accurate that in

00:32:17.570 combination with the packaging that we're doing lets people say like here's your laptop install this package

00:32:22.600 go through these directions and you actually have something that's functional within a relatively short period of time our

00:32:29.060 desired target was sort of to go from multiple days for the time to start up a

00:32:36.440 new development environment to 15 minutes and we've been achieving that for three or four months now I think

00:32:42.230 which is pretty it's kind of phenomenal so we also believe because we're building

00:32:48.440 this community we want our users to know that they are free to update the documentation if they want to we don't

00:32:54.110 tell people like you update the docs yourself we will do it as well but we want it to sort of become a shared

00:32:59.810 resource that everyone is using to learn from and teach their peers another

00:33:05.750 thing to document and this is something I think is a major learning for me that he helped enforce that I was not doing a

00:33:12.500 very good job at was documenting the administrative tasks and release processes that are needed to get these packaged things out you'll be very glad

00:33:19.400 you did that because the last learning was to automate everything that you can

00:33:25.130 as part of the internal tooling it turns out that QA automation for cross-platform CLIs is really

00:33:31.400 really hard I think QA in general is really hard I find it really hard but doing it when you're dealing with

00:33:37.640 multiple operating systems and a lot of moving pieces makes it incredibly hard so despite the Devly CLI having 97

00:33:46.010 percent test coverage we find bugs and weird state-specific edge cases all the time or I guess our users do and so the

00:33:55.670 more automation that we have especially around our release process proved to be a real time saver if you're going to be

00:34:01.250 releasing a lot of new versions pretty rapidly to users we've also started to extend both

00:34:08.060 the tooling and test harnesses so we're able to check for and report common issues around like things like the rack

00:34:13.250 schemas things that make it really really easy for people to write a rack and

00:34:19.040 then we do some sort of automated testing against it to see if all of its services are

00:34:25.190 exposing the right ports or if they're doing anything that might be untoward for the environment so that we have a way to sort of inject new conventions into

00:34:32.450 our CSS we regret to say that Devly is

00:34:38.000 not yet open source supporting our users at fastly and preparing this conference did not allow us the time to prepare

00:34:44.419 Devly for open source release but we're very close watch us on Twitter probably at Fastly as well we'll publicize it on

00:34:52.460 the blog once again we are hiring if

00:34:58.340 you're interested in talk with us we're both very nice

00:35:03.680 we'd like to thank our reviewers for helping us preview this talk but also I

00:35:10.370 would like to thank the Devly users it took a

00:35:15.800 village and we have

00:35:23.620 logos and things that we used and thank you yes thank you for your time how are we

00:35:32.390 managing configuration for services so the Devly library contains how the

00:35:38.660 application code gets shared within Devly and is a reflection of how it will run in production inside of the

00:35:46.940 service if there's any whatever ports it needs to connect to for other services

00:35:52.640 those are handled internal to the image so those live in the repository anything

00:35:58.490 that connects two services together like exposing the ports or setting

00:36:03.560 environment variables set up a certain way that lives in the Devly library so how do we manage two services

00:36:10.700 sharing a configuration or two projects how do we manage two projects

00:36:18.360 having the same service so many of our racks reuse services I think the most used service

00:36:25.410 is used in 15 racks or so and that one

00:36:31.850 the configuration inside of Devly is very small because there's been a

00:36:37.950 conversation among all the teams that use it about we're going to run on this port we provide this API and

00:36:46.160 there's the regular communication that Devly is facilitating between the teams that use it so Devly doesn't

00:36:52.350 necessarily need to manage the service it's more focused on managing the communication that needs to happen to

00:36:58.620 share that information so the question

00:37:15.090 was this looks a lot like docker compose is it what's the relationship so we started by using docker compose we found

00:37:23.760 that docker compose's features seemed to be more production focused and

00:37:31.020 we wanted to have more control over what commands our user had to type what error

00:37:36.600 messages we would give because docker compose is another command line tool so we'd have to parse

00:37:42.840 its errors convert those into something that we could tell our users this is what you do to fix it so because of that

00:37:49.770 we are borrowing large parts of the docker compose schemas but we provide

00:37:55.800 all of our own docker interactions we have our own client for that
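A minimal sketch of the error-translation problem described in this answer: when a tool wraps another command line program, it has to match that program's error text and turn it into guidance the user can act on. The pattern table and hint wording below are illustrative assumptions, not Devly's actual messages, and the sample strings only resemble common Docker errors.

```python
import re
from typing import List, Tuple

# Each entry pairs a pattern found in raw tool output with a hint for the user.
# Both the patterns and the hints are hypothetical examples.
HINTS: List[Tuple[re.Pattern, str]] = [
    (
        re.compile(r"port is already allocated"),
        "Another process is using that port; stop it or change the port mapping.",
    ),
    (
        re.compile(r"Cannot connect to the Docker daemon"),
        "Docker does not appear to be running; start it and retry.",
    ),
]


def friendly_error(raw: str) -> str:
    """Map raw error text from a wrapped tool to an actionable hint."""
    for pattern, hint in HINTS:
        if pattern.search(raw):
            return hint
    # Fall back to the raw text so nothing is hidden from the user.
    return f"Unrecognized error: {raw.strip()}"


if __name__ == "__main__":
    print(friendly_error("Bind for 0.0.0.0:8080 failed: port is already allocated"))
```

A table like this breaks whenever the wrapped tool rewords its output, which is one reason a tool might talk to the Docker API through its own client instead, as the speakers say they chose to do.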

00:38:12.400 i sat in a room for three days and wired up docker compose docker and a bunch of rake

00:38:20.230 files and made a mono repo and then sort of we did it that way and pretty quickly

00:38:25.600 it was like yes this is gonna become unmanageable it was also a really good opportunity I think for both of us to really learn a

00:38:31.210 lot more about the docker APIs and sort of what's sitting under the covers and that's really been good as we're

00:38:37.630 looking at as well thank you for your

00:38:45.100 time and you can speak with us afterwards outside