
Keynote: On the Care and Feeding of Feedback Cycles

by Elisabeth Hendrickson

In this RubyConf 2021 keynote, Elisabeth Hendrickson examines the significance of feedback cycles in software development. She frames feedback as empirical evidence about what works and what doesn't, surveying feedback cycles such as Deming's Plan-Do-Check-Act, the OODA loop, and Lean Startup's build-measure-learn. She emphasizes the risks of delayed feedback, which allows speculation to accumulate and breeds inefficiency in development processes.

Key points discussed include:

- Understanding Feedback: Feedback is the empirical assessment of actions taken, distinguishing valid data from mere opinions.

- Feedback Cycles in Software: The life cycle of software projects is laden with speculation, requiring systematic feedback through testing and iterations to minimize risks.

- The Risk of Waiting for Release Feedback: In-the-wild user feedback is the ultimate empirical evidence, but waiting for it as your only signal means waiting far too long; the accumulated risk can lead to disastrous outcomes.

- Types of Feedback: Different types of feedback are highlighted, including unit tests, system tests, and user feedback, each with varying cycle times and importance in the development process.

- Cautionary Tales: Hendrickson shares various cautionary tales from her experience, including the drawbacks of long pull request processes, the ramifications of not addressing test pollution, and the importance of developer ownership in maintaining the quality of tests.

- Improving Feedback Loops: Strategies to improve feedback loops are proposed, emphasizing shorter cycles, keeping pollution out of feedback, and fostering a culture of continuous improvement.

- The Learning Cycle: The talk concludes with the notion that every feedback cycle is a learning cycle, turning failures into growth experiences.

Hendrickson underlines that by nurturing feedback mechanisms and reducing the latency of responses, teams can enhance their agility and deliver higher-quality software with more confidence. The session wraps up by asserting that there is no failure in the software process, only learning, encouraging an iterative and reflective approach to software development.

RubyConf 2021

00:00:10.719 ah i'm gonna confess this is a little bit surreal for me um even without covid i
00:00:18.560 wasn't getting out a whole lot uh so in 2019 i don't believe that i did any
00:00:25.599 in-person live conference events like this so it's been probably at least three years
00:00:31.199 i'm a little nervous so i hope this all works for all of us but it is so good to see you all in
00:00:37.680 person and i'm so grateful to be here and to see you
00:00:43.040 so let's let's talk about feedback i this um
00:00:49.200 where to start well better just get started this is a talk in four parts first we're going to talk about the
00:00:54.559 nature of feedback in sort of an abstract way we'll go through that fairly quickly then we're going to talk about how it applies to
00:01:01.280 software then i'm going to tell some cautionary tales and then i'm going to bring it home so a talk in four parts
00:01:09.040 let's start with just what is feedback it is the simplest thing in the world you do a thing you see what happens
00:01:15.040 you get empirical evidence that tells you if the thing that you did had the
00:01:20.960 effect that you intended and the empirical evidence part is the super important part of that
00:01:26.640 because opinions are not actually feedback if they are not giving you concrete information about the effect of
00:01:34.320 the thing that you did now we have a lot of very fancy ways of talking about feedback loops there's the
00:01:41.520 deming cycle plan do check act it's just a feedback cycle we plan what
00:01:46.799 we're going to do then we do it then we check to see how it went and then we act on the information that we got because
00:01:52.799 if you get a lot of feedback and then you do not act on it well that's a problem
00:01:59.040 um there's the ooda loop john boyd gave us the ooda loop of observe orient yourself
00:02:05.759 decide act that's another feedback cycle there's lean startup build measure learn
00:02:12.080 build measure learn and of course the idea here is build the smallest amount possible so that you can test your
00:02:18.800 hypothesis it's basically the scientific method we're forming a hypothesis about what the market wants or what would be
00:02:25.680 successful for the outcomes that we're attempting to achieve and then we're going to
00:02:31.599 design that experiment the cheapest experiment we possibly can and then we're going to observe what happened
00:02:36.640 when we ran that experiment and then that's going to lead us to form our next hypothesis so it's just all feedback
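
All of these loops share the same basic shape: do a thing, gather empirical evidence, act on what you learned. As a purely illustrative aside, here is a minimal Ruby sketch of that shared shape; the toy "experiment" stands in for real work:

```ruby
# A minimal sketch of the loop shape shared by plan-do-check-act, OODA,
# and build-measure-learn. Everything here is hypothetical and purely
# illustrative.
hypothesis = 1
3.times do
  plan   = hypothesis        # plan: form the next hypothesis
  result = plan * 2          # do: run the cheapest experiment possible
  puts "observed: #{result}" # check: empirical evidence, not opinion
  hypothesis = result + 1    # act: the next hypothesis builds on the evidence
end
```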
00:02:43.280 so far so good sweet let's talk about software
00:02:49.200 this is where things get a little bit complicated um once upon a time when i gave variations of this talk by the way i've
00:02:55.599 given this talk a lot of times it's slightly different every time this time for reasons i still don't understand
00:03:01.200 myself i decided to rebuild all of the slides so you're looking at an entirely new
00:03:06.319 deck even if it looks a little bit familiar uh i will also say this is the first time
00:03:12.159 i'm giving this talk presenting from google slides this is not a google ad but i will say that the world has gotten
00:03:18.959 sufficiently different that i no longer feel the need to have everything local on my computer the internet is actually
00:03:24.159 ubiquitous plus also the whole download work offline mode actually works anywho
00:03:29.760 um so let's talk about every software project ever whether you work in an agile way and you're shipping frequently
00:03:37.440 or you work on big honking things that ship every five years
00:03:42.640 still yes some of you are laughing hi i'm an old that was the way things worked
00:03:51.360 back in the day i wrote my first line of code in 1980 so
00:03:56.480 every software project ever at the start you're analyzing some stuff now that
00:04:02.000 might be an analysis phase or that might simply be doing a little
00:04:07.439 bit of user research to understand what we want to build the the problem really this is about analyzing the problem that
00:04:13.680 we want to solve this is the stage at which we're forming some hypotheses and then we're doing some design and
00:04:20.400 that might be designing a ui it might be capital a architecture but we're doing some design
00:04:26.560 and we're implementing now notice there's this curve that i haven't talked about yet that's the speculation curve because at
00:04:33.520 each stage we're speculating we are building assumptions on top of
00:04:40.080 assumptions we're speculating that we understood the problem to be solved that when the business analysts went to
00:04:46.240 gather requirements from the corporate stakeholders that they actually understood the corporate stakeholders
00:04:51.680 that the corporate stakeholders were actually saying um what their needs were as opposed to offering solutions
00:04:58.400 packaged as needs that weren't going to solve the problem that they actually had that never happens
00:05:04.639 um so we're speculating at each stage we're speculating that our design will in fact
00:05:10.000 solve the problem that we identified we're speculating that our implementation
00:05:15.120 is going to work so at each step we're building assumptions
00:05:22.000 uh and then we end up iterating whether you were doing waterfall or agile you still end up iterating you'll notice the
00:05:27.759 slope of that curve is starting to come down because now we do get some empirical evidence
00:05:34.320 then we do whatever is final testing if you worked in a very traditional kind of way you might have an entire qa
00:05:40.000 department doing the final test cycle if you work
00:05:46.160 in a very agile way you might be putting things onto the staging server whatever it is you're doing some form of final
00:05:53.039 testing and then there's a release and that's really when the
00:05:58.240 the chickens come home to roost and you find out so this is the entire life cycle of
00:06:04.240 what's happening inside that feedback cycle from do a thing to see what happens and
00:06:10.400 that area under the curve that's all risk so clearly the longer this cycle goes
00:06:18.720 the more risk we're incurring by the way this talk has a bunch of
00:06:24.639 digressions here is one of them let's talk about schrodinger's cat why on earth am i talking about schroedinger's cat well first
00:06:30.800 you may have heard of schrodinger's cat let me do a quick very quick summary
00:06:36.240 this is a thought experiment that was proposed by schroedinger i believe in a letter in 1935 to einstein i might have
00:06:43.360 the details a little bit wrong but there was the copenhagen school of thought in in quantum
00:06:49.759 mechanics that said that well it's all probability waves and so there is the potential to have superimposed
00:06:55.840 probability waves in which two states coexist at the same time and until there's an observation that is made
00:07:01.840 those probability waves do not collapse and schrodinger was trying to say
00:07:07.440 let's let's bring determinism back to physics let's take a completely absurd absurd
00:07:13.520 thought experiment that starts with one cat we have a box it is a sealed chamber
00:07:18.880 said sealed chamber has some some poison gas or a
00:07:24.000 cyanide gas that cyanide gas will be released based on completely random
00:07:29.280 decay of a radioactive isotope
00:07:35.280 that has a half-life of time t so at the end of time t there's a 50 50 shot
00:07:40.720 that said radioactive isotope has decayed if it has decayed the cat is dead because the hammer hit the flask
00:07:46.639 and the flask has released the poison gas but if it has not decayed again 50 50 shot
00:07:52.080 uh the cat's a little alarmed um 50 50 shot that the cat uh you know could be alive if it did not
00:07:58.479 decay so uh this was a thought experiment and the
00:08:04.240 what he was trying to say is this is clearly absurd i'm not a physicist but it's my understanding that yeah
00:08:09.280 basically the probability waves collapse at the moment of observation and the cat is either alive or dead at that point
00:08:14.560 but until then probability waves still exist why why would i be telling you a story
00:08:20.800 about physics so let's talk about schrodinger's release until the moment that we release we do
00:08:27.840 not know if it's alive or dead we don't have the empirical evidence we haven't made the observation and the
00:08:33.120 release exists in two states simultaneously which you uh may have enjoyed that that
00:08:39.599 possibility when you go into a status meeting and there's wild optimism on one
00:08:45.519 side and tremendous pessimism on another that is an example of these probability waves being superimposed and until we
00:08:52.320 actually get empirical evidence we don't know
00:08:58.320 so until you observe in the wild you're speculating
00:09:03.760 now in theory agile solves all this for us agile made the world perfect right 20 years old
00:09:10.000 uh-huh yeah thank you penelope for those of you in the back in case you couldn't hear uproarious laughter
00:09:17.920 so let's talk about agile there's a lot we could talk about about agile and i'm not going to i just want to focus on one
00:09:24.080 aspect of it in theory you're shipping very frequently
00:09:29.440 and in fact in my experience that when we do all of the disciplines of agile and do them well we're able to ship with
00:09:36.640 confidence much more frequently and in those circumstances what happens to that risk curve it gets so much smaller so we
00:09:43.440 get empirical evidence and then we can steer when we discover that we made invalid assumptions we can steer towards
00:09:50.320 value we can steer away from risk so iteration after iteration shipping after
00:09:56.399 shipping we're able to control the risk that is the theory of agile
00:10:02.320 however let's talk about the reality
00:10:07.839 even if you work on software as a service where you have the ability to deliver multiple times a week and it
00:10:15.440 goes in front of real users there still can be
00:10:21.200 a a difference a thing that we are not yet getting
00:10:26.320 empirical evidence about because you're putting it behind say a feature flag right so even if you're actually
00:10:32.399 delivering multiple times a week not everybody works on software as a service
00:10:37.600 i've shipped enterprise products where we could have a notion of releasable
00:10:42.640 but we could not actually release as frequently as we could create a releasable artifact
00:10:48.959 and in those cases you definitely have these longer periods where you're not getting the empirical evidence
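
On the feature-flag point above: code that ships dark generates no in-the-wild evidence until the flag flips. A hypothetical Ruby sketch of that gap; the flag store and flag name are invented for illustration:

```ruby
# A hypothetical sketch of why a feature flag delays in-the-wild feedback:
# the code is deployed, but no user exercises the new path until the flag
# flips. The FeatureFlags store and the flag name are made up.
class FeatureFlags
  FLAGS = { new_checkout_flow: false } # shipped dark: deployed, not released

  def self.enabled?(name)
    FLAGS.fetch(name, false)
  end
end

def checkout(cart)
  if FeatureFlags.enabled?(:new_checkout_flow)
    "new flow for #{cart}" # no empirical evidence accumulates until this runs
  else
    "old flow for #{cart}"
  end
end

puts checkout("cart-42") # => "old flow for cart-42" until the flag flips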
00:10:54.240 and particularly in those environments here's the thing that i've seen happen
00:10:59.360 over and over and over again here's a theoretical line this is how much we would like to test
00:11:07.040 that's all of the the system tests the unit tests the the performance tests
00:11:12.399 everything that we need to gather information about because a test is a thing that gets you information about
00:11:18.560 the behavior of the thing that you're producing that we theoretically want to do all of this
00:11:26.320 and um unfortunately we don't quite get there and sometimes it's a very conscious
00:11:32.240 decision sometimes it's a conscious decision that says it is so expensive for us to do our
00:11:37.440 performance tests it requires such a large environment that is realistic and
00:11:42.560 there is contention for that environment across multiple different projects in the same organization
00:11:48.160 therefore we're going to schedule that for the end sometimes it's a
00:11:53.600 less good reason for scheduling it for the end like the performance testing is a pain in the butt and i don't want to do it so we'll
00:12:00.079 just wait until later hence the whole xp mantra if it hurts do more of it but in any case there is a
00:12:07.040 gap a gap between what we actually the the information we actually get the
00:12:13.360 feedback we actually get and the ideal state and then what happens
00:12:18.880 iteration after iteration we're delivering and there's that gap
00:12:26.399 and what does that look like well that gap is speculation we're
00:12:31.839 speculating that it'll be fine there wasn't going to be that much information to discover in that gap
00:12:41.519 penelope i told you i wasn't going to make it better right okay
00:12:47.120 so so the speculation buildup happens until we get to the final ta-dah when we
00:12:53.279 actually release stuff and guess what the area under that curve is risk which is why it is much easier to do fragile
00:12:59.839 than agile
00:13:07.040 okay so i and yeah if you wait for that in the wild feedback so i've been making this
00:13:13.040 whole case that the in the wild feedback is the only valid empirical evidence that tells you
00:13:18.079 whether or not what you released is any good but if you wait for that oh you've
00:13:23.600 waited way too long it's way too risky there is a very high probability there's a wonderful wonderful talk in here
00:13:30.320 yesterday on the 737 max story you end up with a 737 max class disaster if you
00:13:37.760 wait for in the wild feedback so part of what i'm hoping that you will
00:13:43.760 be thinking about is the different levels and types of feedback that you could be getting
00:13:50.399 a unit test answers a very specific type of question as a programmer did the code that i write
00:13:56.720 do what i intended it to do without violating the
00:14:03.279 expectations that any of the other code already had in the system and even a comprehensive suite of unit tests tells you nothing at all and i mean
00:14:11.360 literally nothing about the overall behavior of the system from the user's perspective
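
To make that narrow question concrete, here is a minimal RSpec sketch; the method under test is invented for illustration:

```ruby
# A minimal RSpec sketch of the narrow question a unit test answers:
# "did the code I wrote do what I intended?" The method under test is
# hypothetical. Runnable with `ruby` via rspec/autorun (rspec gem).
require "rspec/autorun"

def apply_discount(price, percent)
  (price * (1 - percent / 100.0)).round(2)
end

RSpec.describe "apply_discount" do
  it "reduces the price by the given percentage" do
    expect(apply_discount(100.0, 15)).to eq(85.0)
  end
end

# A green run says only that this unit behaves as intended; it says
# nothing about whether checkout as a whole works from a user's perspective.
```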
00:14:18.160 a long time ago back when i used to consult
00:14:24.240 one of my consulting clients brought me in for a project that was three years into
00:14:29.440 a three-year schedule and they had not yet made it to formal qa and they felt that they needed help
00:14:34.880 with testing and quality and at the time that was what i was most often brought in to do and they said well
00:14:41.440 but we've got all these com objects this was way back in the days when that was a thing that was the old microservices
00:14:47.680 okay so we've got all of these com objects and we've tested them all so we should be fine right
00:14:55.600 oh yeah that that project by the way i was only involved for a for a short period of time but i later heard it was
00:15:01.760 five years the three year schedule ended up being five years because of late breaking surprises so
00:15:07.440 there's all these different types of feedback um the ci system is giving you
00:15:12.639 information about it running presumably in the different environments or configurations that it needs to run in
00:15:17.920 you probably don't run every permutation of that locally so you probably have a ci pipeline there is
00:15:24.240 probably somebody who is doing what i'm calling acceptance testing here and what i mean by that is somebody who is
00:15:30.320 accepting that the work represents the value that it was intended to represent so i i don't mean
00:15:36.880 acceptance tests like cucumber tests i mean like there is a say product manager who asked
00:15:43.040 for a thing and they're saying yes i got what i asked for so that's a different type of feedback
00:15:49.839 stakeholder feedback are we going in the right direction user feedback is the ultimate did did we deliver the value
00:15:56.240 that we intended to and for each of these different levels of feedback there is a different cycle
00:16:02.079 time a different natural cycle time your unit tests seconds to minutes
00:16:07.839 technically if you're running in minutes they probably aren't unit tests but that's
00:16:13.600 seconds to minutes is is not that bad uh integration systems yet yes somebody
00:16:18.720 groaned i'm with you but let's recognize the amount of legacy
00:16:24.560 software out there and give people a pass okay um integration systems uh ci tests
00:16:30.959 minutes to hours probably those acceptance tests it probably takes hours to potentially even days before
00:16:37.680 that person who is accepting the work whether they're a product manager or they are a qa person before they
00:16:44.399 actually take a look at the thing um stakeholder feedback can take days to
00:16:49.920 weeks and the user feedback that can potentially take years even if you're releasing very frequently because
00:16:55.680 if you work on enterprise software it could take years before your customer deigns to
00:17:02.839 upgrade a reality of life in that context all right so let's talk about some
00:17:09.120 cautionary tales
00:17:15.120 huh but wait first a digression that is a
00:17:20.640 fruit fly anybody know why i've got a fruit fly on my slide short generational lifespan
00:17:29.919 they are awesome for science experiments because the lifespan of a fruit fly is like 50 days or
00:17:36.640 something but you get new generations every 10 to 12 days give or take so you can do longitudinal studies with
00:17:43.760 multiple generations in the span of weeks to months sweet
00:17:49.520 hold that thought let's talk about code reviews
00:17:57.039 so this is a cautionary tale of a team that i was involved with uh
00:18:03.120 at the time that i got involved with this this team uh this this
00:18:08.559 project uh this was their process it was fairly traditional it's one that you see in a lot of places it was a pull request
00:18:16.160 based process uh a lone developer ta-da developer
00:18:22.640 writes a whole bunch of code this was in an environment where individuals were incentivized to push
00:18:28.880 their features through that's how you got a promotion individual ownership of things so
00:18:35.039 the lone developer would do all of the brilliant work and then
00:18:40.960 when they felt that they were ready to have their thing reviewed it was at that
00:18:46.480 point a whole thing they would run the tests locally that took about an hour because it was all of the tests and
00:18:53.039 the tests were really slow and then they would check in on a branch and this was a gerrit-based flow
00:19:00.240 i am not here to bash gerrit it is a tool if you are in an environment where you need something
00:19:06.240 that automates the workflow of pull requests code reviews
00:19:12.640 i actually know nothing about how it is now this was years ago it is a tool but
00:19:18.000 i will also tell you this is a this is a preview of what's coming i took tremendous pleasure in
00:19:24.320 ripping gerrit out of this process uh in any case you check in on a branch
00:19:29.600 ci would then run a set of unit tests taking about 10 minutes and then your pull request is
00:19:36.559 now sitting in a queue waiting for somebody to approve it and in this particular organization there
00:19:42.640 was a hierarchy there were people who were so junior they only had the ability to comment on
00:19:48.960 prs there were people who had earned a plus one they could give you a point
00:19:55.760 and you needed to get two points so you could get two of them or you could get one of the very few people
00:20:03.280 in the organization who had been ordained with a plus two
00:20:08.559 now as you might imagine in an environment like this uh who you knew
00:20:14.559 uh was kind of the whole the whole thing so if if you were one of those plus twoers and you needed your
00:20:21.120 code reviewed you just kind of uh nudged your your buddy and they would do the
00:20:26.559 code review and your code would get in and if you were low on the totem pole so
00:20:32.799 to speak i'm really sorry i used that phrase in any case if you were
00:20:38.480 um if you were of lower status
00:20:44.480 your pr could wait for a very long time especially if it wasn't considered
00:20:50.320 critical which is why one of the saddest stories that i took away from that
00:20:55.360 particular experience was a junior developer who expressed to me her tremendous
00:21:02.000 frustration because she had a very small like couple lines of code change
00:21:07.280 that she couldn't get in for an entire week and during that week the there were other commits that were getting merged
00:21:15.120 and so she was constantly rebase rerun all the tests resubmit the pr
00:21:20.159 for an entire week for a few lines worth of change she didn't get anything else done that
00:21:25.280 week how incredibly demotivating worse that meant that she had fewer
00:21:32.400 opportunities to get code merged because through no fault of her own she's waiting for somebody to get around to
00:21:38.880 deign to review her pr okay so then it would merge to main ci
00:21:44.640 runs the full set of tests this was the entire set of things that had to happen to get a change all the way through
00:21:50.799 and that process could take a day if you were one of those plus twoers who got to
00:21:55.840 kind of jump the queue but it could take potentially weeks and in fact at the point where we turned off
00:22:02.000 gerrit there were still prs that were sitting there that were essentially abandoned they were stale nobody was going to go
00:22:08.320 through and update them whatever changes they represented either got lost in the sands of time or got
00:22:13.760 subsumed had been made in something some other change that did get pulled in
00:22:19.039 so we made a change we made a process change now i will note that uh the only reason this process
00:22:26.240 change was possible was that we had support all the way from the top this was not a universally
00:22:31.440 popular change there was support at a grassroots level for this change there were people who were so
00:22:37.760 happy and grateful that they were going to be able to move faster but not everybody felt that way
00:22:43.520 and i feel i need to be honest about the fact that this was not universally popular i still in hindsight this this
00:22:50.400 all was happening many years ago and in hindsight knowing how the rest of the story then evolved this was the right
00:22:55.919 decision because we were able to go so much faster after that we were able to
00:23:01.600 introduce so much more innovation the process went to okay we don't do prs we
00:23:06.799 pair on code that's how you get another set of eyes we all agree that it's a good and healthy thing for this project
00:23:12.320 to have multiple sets of eyes on things furthermore what's not represented here is we we practice collective code
00:23:18.080 ownership a team owns the code base there is no my feature your feature and consequently since we're pairing and
00:23:24.720 we're rotating pairs frequently everybody ends up touching that code so you're getting more than two sets of eyes on any given thing over a long
00:23:31.520 period of time but two is enough to get it merged into the code base for local tests we only run the fast
00:23:38.320 tests and yeah they took 10 minutes so yeah they technically weren't unit tests
00:23:43.520 but that was so much better than sitting there for an hour waiting for your tests to be done so the
00:23:48.720 fast tests we run locally then we check in we lived on main we merged to main
00:23:55.039 and then ci would run the full set of tests and yeah sometimes stuff broke and then we would fix it but because we were
00:24:00.080 working in tiny tiny increments fixing was pretty quick too and so
00:24:05.279 now we could get changes in minutes to hours before they got merged in so when you think about your process and
00:24:12.320 the latency that it introduces the wait states that it introduces remember the fruit flies
00:24:19.919 we want to be able to get so many more cycles that that's what this kind of process change can give you
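
As a rough sketch of the workflow just described, here is what the fast/full split might look like in a hypothetical Rakefile; the directory layout and task names are assumptions:

```ruby
# A hypothetical Rakefile sketch of the split described above: fast tests
# locally before check-in, the full suite in CI after merging to main.
require "rake/testtask"

Rake::TestTask.new(:fast) do |t|
  t.pattern = "test/fast/**/*_test.rb" # the suite you wait on locally
end

Rake::TestTask.new(:full) do |t|
  t.pattern = "test/**/*_test.rb" # everything, including the slow tests
end

task default: :fast # `rake` before check-in; CI runs `rake full` on main
```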
00:24:27.840 okay now let's talk about branching strategies batch sizes and latency
00:24:33.200 my personal preference is to live on main i recognize that's not possible for everyone but if you are
00:24:39.520 able to do this in your context i it means that yeah if you're using git
00:24:45.919 that local copy that i have is essentially a branch but i'm making a few changes get to green
00:24:52.000 make it clean and then check it in everybody else is doing that as well and so at any given moment in time the
00:24:58.720 amount of inventory work in progress that's sitting out not yet checked in is a fairly small amount
00:25:06.080 so we don't get a whole lot of churn we certainly don't have the experience that that poor junior developer who
00:25:12.880 incidentally wasn't actually that junior that was just the status they had been given
00:25:18.400 but that poor junior developer had of waiting an entire week to get something merged because of the amount of churn
00:25:23.440 that was happening and they're anyway moving on
00:25:28.799 feature branching also very common i am not here to argue with you about whether you should live on main or should do
00:25:33.840 feature branching there are trade-offs there are good reasons to do both it does introduce the challenge that now
00:25:40.480 you've got uh larger batch sizes and so you can merge less frequently and so if
00:25:46.000 you notice that although this looks like a nice neat little diagram and it feels all warm and fuzzy that it's not quite
00:25:53.440 that simple and the way that that shows up in this diagram is if we flip back and forth between the two we can see
00:25:59.039 that we're getting fewer merges to main so that means that the generation cycle that that that uh life cycle is that
00:26:05.919 much longer and that introduces what do we know about that area under the curve risk more risk
00:26:12.320 so far so good okay that was all the background to the cautionary tale
00:26:18.720 let's talk about a process that i do not recommend anybody anybody anywhere ever do under any
00:26:26.600 circumstances but in this organization and i i have to confess i did not see it at the moment
00:26:32.559 that it was happening my involvement happened some number of months later i came in and i was still hearing the
00:26:38.640 echoes of the screams of pain from this particular situation
00:26:43.760 uh with a long-running team branch it started off simply enough we have a main and the organization had decided for
00:26:50.880 reasons that i am sure made sense at the time that every one of the teams would have
00:26:57.279 their own team branch i understand that in hindsight this does
00:27:04.400 not seem like a good idea i'm hearing some of you laugh but i am willing to believe that they were doing the best
00:27:10.080 they knew how given everything that they knew at the time it still didn't turn out so well um but
00:27:17.360 i'm getting ahead of myself let me go through the rest of the story so and you know it now the team is
00:27:23.200 treating the team branch kind of like you would treat main in a feature branch scenario developer a developer b
00:27:30.559 developer c they're all working on stuff developer a is is working on their feature they now merge back to the team
00:27:36.720 branch develop and then you know start another feature uh and then of course in the meantime
00:27:42.320 main is changing and they did have a plan for rebasing off of main onto the team branch
00:27:49.039 uh but it was way too easy for a developer to just ignore everything that was uh
00:27:54.880 i'm really sorry you're completely having a reaction to this um it was
00:28:01.600 i am so sorry um developer c
00:28:08.159 tries to merge back discovers that of course they have ignored the fact that the world was changing around them
00:28:14.000 uh and so they now have to okay in the meantime developer b has
00:28:20.720 been cranking away on their feature and the world has been moving on
00:28:32.080 this is in fact what happened um the really sad part is an entire team
00:28:40.000 threw away six months of work can you imagine
00:28:48.080 oh i'm really sorry i'd better move on some of you are having very strong reactions
00:28:54.240 let's talk about test pollution oh it's not getting better is it
00:29:00.559 i warned you did i not warn you up front by the way rspec is lovely thank you so much
00:29:13.120 the state of javascript testing is a different problem which incidentally i just as an aside um
00:29:19.679 my new thing is it's not stealth it's just i'm not quite sure what it's going to grow up to be
00:29:25.520 it's curious duck digital laboratory i am building a simulation game thing uh
00:29:31.600 it is uh the the back end is a pure ruby gem yay gems um and massive shout out to davis frank
00:29:38.880 who convinced me to make it a gem and that has had massive uh payoff
00:29:46.240 um i but the front end is a rails front end with stimulus for javascript and so i
00:29:52.000 spent like six weeks getting my setup for javascript testing working
00:29:57.440 so incidentally if anybody wants to talk about that stuff i'm happy to show you what i've done get your feedback that is
00:30:02.799 actually not the point of the test pollution although there is a reason why i am mentioning this because we'll talk about
00:30:08.799 it with pollution first let me explain what i mean by that now your your feedback cycles the information you get
00:30:15.919 back can be polluted in a variety of ways what does pollution mean it just means that we don't trust the information that
00:30:21.840 we're getting back if you mix opinion in with empirical
00:30:27.039 evidence you now have a polluted stream that's one example of pollution
00:30:35.520 um another big source of pollution uh let's see if this feels familiar to y'all
00:30:41.840 you're a new person on a project you start doing some work uh you run the tests or the tests are
00:30:48.320 running in ci whatever and you see failures and you go wait those failures do not appear to have
00:30:53.919 anything to do whatsoever with the stuff that i changed what happened and
00:30:59.120 your your new buddy the person who's helping to onboard you says don't worry that's fine they do that just kick it
00:31:05.200 again and you kick it again
00:31:10.240 i hope i still have i hope i still have friends after this talk cause
00:31:15.840 i'm watching some of y'all's faces and i'm really worried
00:31:20.880 okay so you kick it again and sure enough it's a different set of tests that are failing and you kick it again
00:31:27.360 and you kick it again okay y'all have seen this
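
What usually sits underneath that kind of intermittent failure is state leaking between tests. A minimal, hypothetical RSpec sketch of order-dependent pollution:

```ruby
# A hypothetical sketch of test pollution: shared mutable state makes one
# test's result depend on what ran before it, so under `--order random`
# the failures move around and "kick it again" sometimes goes green.
# Runnable with `ruby` via rspec/autorun (rspec gem).
require "rspec/autorun"

CACHE = {} # global state shared by every example below

RSpec.describe "a polluting test" do
  it "warms the cache and forgets to clean up" do
    CACHE[:user] = "alice"
    expect(CACHE[:user]).to eq("alice")
  end
end

RSpec.describe "a polluted test" do
  it "silently assumes a cold cache" do
    expect(CACHE).to be_empty # passes alone, fails after the test above
  end
end
```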
00:31:37.120 so um one group that i got involved with was working with a legacy system
00:31:43.919 that was massively distributed and had a whole lot of
00:31:50.240 threading and parallelization and very difficult to find race conditions and i
00:31:56.000 they had a very long long history so long that there had historically been a very talented and skilled qa group that
00:32:02.880 has built an incredibly sophisticated test harness but because the test harness had a tendency to expose things
00:32:09.200 that were sometimes real and sometimes not the uh
00:32:15.200 developer team that did not feel any sense of ownership whatsoever over those tests or that test harness had a
00:32:20.720 tendency to just discount those results until somebody went through and
00:32:25.799 painstakingly did the analysis to discover whether or not that was real information
00:32:31.760 and in this environment we attempted to reduce our cycle time which meant
00:32:37.200 developers had to own the tests but developers had no intention of owning that set of tests but that set of tests
00:32:42.559 was the only set of tests that was giving us real information about the system so as we reduced our cycle time
00:32:48.080 we ended up increasing risk and at that point i was responsible for the the group i've held all sorts of
00:32:54.960 roles and in this case i was a vp of r d and i did a really terrifying thing i i
00:33:01.039 pulled the big red cord i said we're not shipping any more features until we clean this up and i had expected that it
00:33:08.320 would take a few weeks of a concerted effort with everybody all hands on deck everybody cleaning this up
00:33:15.200 i was wrong it took months and i held my ground and i'll just note
00:33:20.480 that even in in the level of authority that i had within that organization it was a scary thing it takes an enormous
00:33:26.480 amount of intestinal fortitude to say no we're not going to write new features
00:33:31.840 until we can trust that our tests are giving us information that that we that we can believe it was not a popular
00:33:39.440 decision with sales what a shock um the product managers um who incidentally
00:33:45.519 reported to me they were not happy with me the developers who reported to me were not happy with me some of them were
00:33:51.039 but some of them were just of the opinion that um this was unfair that
00:33:56.080 they had to clean up this mess that they hadn't made and that they would much rather be developing new features and so this
00:34:02.960 was not a popular decision and yet our customers and support organization
00:34:08.320 needed me to make that decision because we were shipping objectively worse software with every
00:34:15.040 increment that we delivered so this is a very difficult situation that is the cautionary tale by the way
00:34:21.919 if you don't clean it up while it's a small mess if you wait until it's a superfund site
00:34:28.320 you're gonna end up in a situation where you have to make the excruciating decision between stop the line and don't
00:34:34.639 do anything but clean up the superfund site or be at a very serious risk of shipping a
00:34:39.919 product that frankly doesn't doesn't meet the value proposition that it's supposed to
00:34:46.839 okay so then and i noticed that i'm running a little short on time so i will tell the abbreviated version of this
00:34:52.800 cautionary tale what happens when you have both pollution and you have delayed feedback cycles and
00:34:59.680 that was this project i was again a vp and i had a peer who
00:35:05.280 was another vp who came to me and said what are you doing
00:35:10.960 what do you mean well your group theoretically ships software and has been unable to ship software so your group isn't doing what
00:35:17.280 your group is supposed to do what are you doing now i knew we had challenges but
00:35:23.200 let's just say that that was a galvanizing conversation in which i decided it was time for me to understand
00:35:29.440 in depth so i started asking and we ended up i interviewed a whole lot of the
00:35:35.599 individual contributors who were on that project we would stand in my office and i would
00:35:40.880 ask them help me understand again from the moment that a developer has something ready that theoretically is
00:35:46.640 ready to ship to the moment that we actually ship it what do we do again tell me again no tell me like i'm five
00:35:52.720 and we we mapped out collectively through the series of conversations some of which happened one-on-one and some of
00:35:58.400 which happened in groups we ended up with a diagram on my whiteboard that that lived there for months that showed
00:36:05.839 the pipeline and it showed the following information about the pipeline so the first stage was there was a build step
00:36:11.760 this was shipping enterprise software first stage is there's a build step and then some set of fast tests run and then
00:36:18.160 if that's green it goes on to the system tests that were in a fan out across multiple
00:36:23.680 configurations and environments as you do um and then there was a final final
00:36:29.760 final packaging step of some kind so basically four stages and okay well how long does each stage
00:36:35.200 take that first build step a few minutes some of you are laughing in anticipation
00:36:40.880 um i that first step takes a few minutes the second step takes a few minutes uh
00:36:47.440 that third step that could take anywhere from four hours
00:36:53.520 to over 24 hours that's a lot of variation why what what
00:36:59.680 is different between the four hour runs and the 24 hour runs well it turns out what i i learned it took a long time
00:37:06.720 because nobody had ever really looked at this in in this particular framing in this way until we all got together
00:37:13.760 because everybody had one piece of the puzzle and we were assembling a jigsaw but the spoiler is the reason ultimately
00:37:20.800 that the things sometimes took a lot longer was because we were waiting for a lock
00:37:26.240 on an environment in a pool of very restricted environments
00:37:32.400 and there was a fair amount of at the peak of activity right before a theoretical release there would be a lot
00:37:38.160 of contention vying for one of those coveted locks on
00:37:43.280 an uh on those environments so there could be a very long wait state
00:37:48.480 so that's that is the delayed feedback piece right that thing was causing a wait
00:37:55.200 state that could be variable length but end up being very very long
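
As an illustration of that whiteboard exercise, a hypothetical Ruby sketch that instruments each pipeline stage and makes the lock wait visible; the stage names and durations are invented and scaled down:

```ruby
# A hypothetical sketch of mapping a pipeline: time each stage and see
# where the time actually goes. Here the "system tests" stage blocks
# waiting for a lock on a scarce test environment, which is where the
# 4-hours-versus-24-hours variation came from in the story above.
require "benchmark"

ENV_POOL = Queue.new
ENV_POOL << "perf-env-1" # the one coveted environment

def run_stage(name)
  elapsed = Benchmark.realtime { yield }
  puts format("%-14s %.2fs", name, elapsed)
end

run_stage("build")        { sleep 0.01 }
run_stage("fast tests")   { sleep 0.01 }
run_stage("system tests") do
  env = ENV_POOL.pop # the wait state: blocks until an environment frees up
  sleep 0.05         # the tests themselves
  ENV_POOL << env
end
run_stage("packaging")    { sleep 0.01 }
```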
00:38:01.599 okay so the next question i had to ask was well how often are things failing and let's talk about pollution why
00:38:07.280 are they failing when when tests fail have we learned anything new or were they failing for spurious reasons
00:38:14.560 and um well the build never failed we always got something out whether or not
00:38:19.760 it worked was a different question but the build never failed the fast tests almost never failed the failures were mostly occurring in the
00:38:26.880 system tests and they were flakes we had a burgeoning superfund site
00:38:34.800 uh so the solution to this incidentally we we had gotten to the point where we were just wedged we
00:38:41.200 kind of like a car spinning in sand um we were struggling to to ship
00:38:47.040 um and so the solution ended up being to focus on doing two things
00:38:52.560 one was to reduce the amount of time that the tests took and to reduce the wait states by trying to increase the
00:38:59.280 number of those very coveted environments that were available but also reducing the contention
00:39:05.280 because it turns out that when the tests fail that thing ends up going through that
00:39:10.800 whole cycle again so that increases the amount of contention for those locks
00:39:16.480 so we needed to reduce the flakiness in the test suite until the point where when it failed it was telling us
00:39:22.400 something real so if you are facing a potential
00:39:28.079 superfund site here are some strategies that you can try one is to just separate the streams you've got blocking and
00:39:33.760 non-blocking caveat this only works if
00:39:39.839 you trust that you have sufficient coverage in the blocking tests to tell you whether or not
00:39:46.000 to tell you about the risk in the software that was not the case in the story that i told of the massively
00:39:51.359 parallel uh we didn't have enough test coverage we did not know what we were shipping we
00:39:57.359 were definitely shipping schroedinger's releases all over the place um if part of the reason why flakiness
00:40:03.760 is persisting is that sense of tragedy of the commons that nobody feels a sense of responsibility
00:40:09.920 or ownership or maybe even agency to clean it up getting cross-team
00:40:14.960 partnerships going can help tremendously and then just carving out time on a regular basis which i recommend
00:40:20.960 even for projects that don't have these problems yet i recommend
00:40:26.720 carving out time in some way shape or form you may not need to do entire tidy tuesdays or whatever you choose to do
00:40:33.280 uh maybe just every time somebody finishes a feature there's an expectation that if there are
00:40:38.400 some flakes or mysteries in the code base that that's the next thing they tackle before they tackle something else
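
On the first of those strategies, separating blocking from non-blocking streams, one hypothetical way to express the split is with RSpec metadata tags; the specs and the tag name here are invented:

```ruby
# A hypothetical sketch of blocking versus non-blocking streams using
# RSpec tags: known flakes are quarantined so the blocking run stays
# trustworthy, while a separate run keeps the flakes visible until fixed.
RSpec.describe "payment API" do
  it "charges the card" do
    # trusted and blocking: a failure here stops the merge
  end

  it "retries on gateway timeout", quarantine: true do
    # known flake: still runs, but in the non-blocking stream
  end
end

# Blocking stream (gates merges):         rspec --tag ~quarantine
# Non-blocking stream (tracked, triaged): rspec --tag quarantine
```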
00:40:46.400 and then the other thing i strongly recommend is reducing test execution time here's where i get to tell a story that
00:40:52.240 um incidentally this is this is for aaron tenderlove are you here
00:40:57.839 okay well this is being recorded so just so you know this entire story is for him
00:41:04.880 uh the reason why will become apparent momentarily
00:41:10.400 so once upon a time we were struggling with long build times or long long test times and i was uh sort of encouraging
00:41:17.760 teams to think about shortening their test times it kind of wasn't working i so i i am not above bribery
00:41:25.839 um at one point i basically said hey tell you what you lop
00:41:31.280 an hour off of that test cycle time and i'll bake you a pie
00:41:37.040 and the person i was talking to said i like pie and i said well it turns out i'm really
00:41:43.040 good i'm really good at making pie these are real pies that i made
00:41:51.760 i'm pretty good at pie so he lopped an hour off the next day i
00:41:58.560 brought him a pie the other team members said i like pie
00:42:04.880 and thus we formed a tradition you lop a sufficiently large amount of
00:42:10.240 time off of that very very very long cycle time you get a pie and it can be it can
00:42:16.000 be a collective pie it doesn't have to be a heroic individual effort
00:42:21.359 but you get a pie because we want to reduce the time in our pipelines
00:42:35.359 you now see why this was dedicated anyway let's bring it home
00:42:42.240 uh healthy feedback loops the things we have been talking about
00:42:47.280 have to do with seeing to the care and feeding of your feedback loops you want to make sure you keep
00:42:53.119 them tight keep them short watch those wait states make them as short as you possibly can
00:43:00.000 given the context and the constraints that you live within but attend to the time in the feedback
00:43:06.720 cycle think about the fact that you have these multiple levels too often i have had developers argue
00:43:12.560 with me that unit testing is a waste because it's just going to get tested at the system level anyway so why bother
00:43:20.240 i'm so glad i'm speaking to a community that doesn't buy into that like modulo javascript which we can talk
00:43:26.560 about separately um uh and then keep them clean keep the
00:43:32.000 pollution out of your feedback cycles and i want to introduce you to one more feedback cycle
00:43:38.000 this is yet another one should look very familiar it's kind of like the others that we talked about but this is the kolb learning cycle
00:43:44.400 it turns out that you know experiment experience observe
00:43:50.319 and reflect and then abstract the lessons learned that is also a feedback cycle so in short every feedback cycle
00:43:57.359 is a learning cycle and the more of those feedback cycles you get the more you get to learn
00:44:04.160 which is why i say there is no failure there's only learning but i'll also tell you that there are some weeks when i do
00:44:10.079 a lot of learning all right i am down to 55 seconds on the
00:44:15.839 clock which means that i don't think we've got time for q a i am here all day though love to talk about this stuff
00:44:22.000 i'd love to talk with y'all thank you so much for having me and thank you for laughing at my jokes