
Keynote: On the Care and Feeding of Feedback Cycles

by Elisabeth Hendrickson

In this RubyConf 2021 keynote, Elisabeth Hendrickson examines the significance of feedback cycles in software development. She frames feedback as empirical evidence about what works and what doesn't, surveying feedback cycles such as Deming's Plan-Do-Check-Act, the OODA loop, and Lean Startup's build-measure-learn. She emphasizes the risks of delayed feedback, which allows speculation to accumulate and breeds inefficiency in development processes.

Key points discussed include:

- Understanding Feedback: Feedback is the empirical assessment of actions taken, distinguishing valid data from mere opinions.

- Feedback Cycles in Software: The life cycle of software projects is laden with speculation, requiring systematic feedback through testing and iterations to minimize risks.

- The Risk of Waiting for Release Feedback: In-the-wild user feedback is the ultimate empirical evidence, but waiting for it as your only signal means waiting far too long; the accumulated risk can lead to disastrous outcomes.

- Types of Feedback: Different types of feedback are highlighted, including unit tests, system tests, and user feedback, each with varying cycle times and importance in the development process.

- Cautionary Tales: Hendrickson shares various cautionary tales from her experience, including the drawbacks of long pull request processes, the ramifications of not addressing test pollution, and the importance of developer ownership in maintaining the quality of tests.

- Improving Feedback Loops: Strategies to improve feedback loops are proposed, emphasizing shorter cycles, keeping pollution out of feedback, and fostering a culture of continuous improvement.

- The Learning Cycle: The talk concludes with the notion that every feedback cycle is a learning cycle, turning failures into growth experiences.

Hendrickson underlines that by nurturing feedback mechanisms and reducing the latency of responses, teams can enhance their agility and deliver higher-quality software with more confidence. The session wraps up by asserting that there is no failure in the software process, only learning, encouraging an iterative and reflective approach to software development.

RubyConf 2021

00:00:10.719 ah i'm gonna confess this is a little bit surreal for me um even without covid i
00:00:18.560 wasn't getting out a whole lot uh so in 2019 i don't believe that i did any
00:00:25.599 in-person live conference events like this so it's been probably at least three years
00:00:31.199 i'm a little nervous so i hope this all works for all of us but it is so good to see you all in
00:00:37.680 person and i'm so grateful to be here and to see you
00:00:43.040 so let's let's talk about feedback i this um
00:00:49.200 where to start well better just get started this is a talk in four parts first we're going to talk about the
00:00:54.559 nature of feedback in sort of an abstract way we'll go through that fairly quickly then we're going to talk about how it applies to
00:01:01.280 software then i'm going to tell some cautionary tales and then i'm going to bring it home so a talk in four parts
00:01:09.040 let's start with just what is feedback it is the simplest thing in the world you do a thing you see what happens
00:01:15.040 you get empirical evidence that tells you if the thing that you did had the
00:01:20.960 effect that you intended and the empirical evidence part is the super important part of that
00:01:26.640 because opinions are not actually feedback if they are not giving you concrete information about the effect of
00:01:34.320 the thing that you did now we have a lot of very fancy ways of talking about feedback loops there's the
00:01:41.520 deming cycle plan do check act it's just a feedback cycle we plan what
00:01:46.799 we're going to do then we do it then we check to see how it went and then we act on the information that we got because
00:01:52.799 if you get a lot of feedback and then you do not act on it well that's a problem
00:01:59.040 um there's the ooda loop john boyd gave us the ooda loop of observe orient yourself
00:02:05.759 decide act that's another feedback cycle there's lean startup build measure learn
00:02:12.080 build measure learn and of course the idea here is build the smallest amount possible so that you can test your
00:02:18.800 hypothesis it's basically the scientific method we're forming a hypothesis about what the market wants or what would be
00:02:25.680 successful for the outcomes that we're attempting to achieve and then we're going to
00:02:31.599 design that experiment the cheapest experiment we possibly can and then we're going to observe what happened
00:02:36.640 when we ran that experiment and then that's going to lead us to form our next hypothesis so it's just all feedback
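
All of these loops share the same basic shape: do a thing, gather empirical evidence, act on what you learned. As a purely illustrative aside, here is a minimal Ruby sketch of that shared shape; the toy "experiment" stands in for real work:

```ruby
# A minimal sketch of the loop shape shared by plan-do-check-act, OODA,
# and build-measure-learn. Everything here is hypothetical and purely
# illustrative.
hypothesis = 1
3.times do
  plan   = hypothesis        # plan: form the next hypothesis
  result = plan * 2          # do: run the cheapest experiment possible
  puts "observed: #{result}" # check: empirical evidence, not opinion
  hypothesis = result + 1    # act: the next hypothesis builds on the evidence
end
```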
00:02:43.280 so far so good sweet let's talk about software
00:02:49.200 this is where things get a little bit complicated um once upon a time when i gave variations of this talk by the way i've
00:02:55.599 given this talk a lot of times it's slightly different every time this time for reasons i still don't understand
00:03:01.200 myself i decided to rebuild all of the slides so you're looking at an entirely new
00:03:06.319 deck even if it looks a little bit familiar uh i will also say this is the first time
00:03:12.159 i'm giving this talk presenting from google slides this is not a google ad but i will say that the world has gotten
00:03:18.959 sufficiently different that i no longer feel the need to have everything local on my computer the internet is actually
00:03:24.159 ubiquitous plus also the whole download work offline mode actually works anywho
00:03:29.760 um so let's talk about every software project ever whether you work in an agile way and you're shipping frequently
00:03:37.440 or you work on big honking things that ship every five years
00:03:42.640 still yes some of you are laughing hi i'm an old that was the way things worked
00:03:51.360 back in the day i wrote my first line of code in 1980 so
00:03:56.480 every software project ever at the start you're analyzing some stuff now that
00:04:02.000 might be an analysis phase or that might simply be doing a little
00:04:07.439 bit of user research to understand what we want to build the the problem really this is about analyzing the problem that
00:04:13.680 we want to solve this is the stage at which we're forming some hypotheses and then we're doing some design and
00:04:20.400 that might be designing a ui it might be capital a architecture but we're doing some design
00:04:26.560 and we're implementing now notice there's this curve that i haven't talked about yet that's the speculation curve because at
00:04:33.520 each stage we're speculating we are building assumptions on top of
00:04:40.080 assumptions we're speculating that we understood the problem to be solved that when the business analysts went to
00:04:46.240 gather requirements from the corporate stakeholders that they actually understood the corporate stakeholders
00:04:51.680 that the corporate stakeholders were actually saying um what their needs were as opposed to offering solutions
00:04:58.400 packaged as needs that weren't going to solve the problem that they actually had that never happens
00:05:04.639 um so we're speculating at each stage we're speculating that our design will in fact
00:05:10.000 solve the problem that we identified we're speculating that our implementation
00:05:15.120 is going to work so at each step we're building assumptions
00:05:22.000 uh and then we end up iterating whether you were doing waterfall or agile you still end up iterating you'll notice the
00:05:27.759 slope of that curve is starting to come down because now we do get some empirical evidence
00:05:34.320 then we do whatever is final testing if you worked in a very traditional kind of way you might have an entire qa
00:05:40.000 department doing the final test cycle if you work
00:05:46.160 in a very agile way you might be putting things onto the staging server whatever it is you're doing some form of final
00:05:53.039 testing and then there's a release and that's really when the
00:05:58.240 the chickens come home to roost and you find out so this is the entire life cycle of
00:06:04.240 what's happening inside that feedback cycle from do a thing to see what happens and
00:06:10.400 that area under the curve that's all risk so clearly the longer this cycle goes
00:06:18.720 the more risk we're incurring by the way this talk has a bunch of
00:06:24.639 digressions here is one of them let's talk about schrodinger's cat why on earth am i talking about schroedinger's cat well first
00:06:30.800 you may have heard of schrodinger's cat let me do a quick very quick summary
00:06:36.240 this is a thought experiment that was proposed by schroedinger i believe in a letter in 1935 to einstein i might have
00:06:43.360 the details a little bit wrong but there was the copenhagen school of thought in in quantum
00:06:49.759 mechanics that said that well it's all probability waves and so there is the potential to have superimposed
00:06:55.840 probability waves in which two states coexist at the same time and until there's an observation that is made
00:07:01.840 those probability waves do not collapse and schrodinger was trying to say
00:07:07.440 let's let's bring determinism back to physics let's take a completely absurd absurd
00:07:13.520 thought experiment that starts with one cat we have a box it is a sealed chamber
00:07:18.880 said sealed chamber has some some poison gas or a
00:07:24.000 cyanide gas that cyanide gas will be released based on completely random
00:07:29.280 decay of a radioactive isotope
00:07:35.280 that has a half-life of time t so at the end of time t there's a 50 50 shot
00:07:40.720 that said radioactive isotope has decayed if it has decayed the cat is dead because the hammer hit the flask
00:07:46.639 and the flask has released the poison gas but if it has not decayed again 50 50 shot
00:07:52.080 uh the cat's a little alarmed um 50 50 shot that the cat uh you know could be alive if it did not
00:07:58.479 decay so uh this was a thought experiment and the
00:08:04.240 what he was trying to say is this is clearly absurd i'm not a physicist but it's my understanding that yeah
00:08:09.280 basically the probability waves collapse at the moment of observation and the cat is either alive or dead at that point
00:08:14.560 but until then probability waves still exist why why would i be telling you a story
00:08:20.800 about physics so let's talk about schrodinger's release until the moment that we release we do
00:08:27.840 not know if it's alive or dead we don't have the empirical evidence we haven't made the observation and the
00:08:33.120 release exists in two states simultaneously which you uh may have enjoyed that that
00:08:39.599 possibility when you go into a status meeting and there's wild optimism on one
00:08:45.519 side and tremendous pessimism on another that is an example of these probability waves being superimposed and until we
00:08:52.320 actually get empirical evidence we don't know
00:08:58.320 so until you observe in the wild you're speculating
00:09:03.760 now in theory agile solves all this for us agile made the world perfect right 20 years old
00:09:10.000 uh-huh yeah thank you penelope for those of you in the back in case you couldn't hear uproarious laughter
00:09:17.920 so let's talk about agile there's a lot we could talk about about agile and i'm not going to i just want to focus on one
00:09:24.080 aspect of it in theory you're shipping very frequently
00:09:29.440 and in fact in my experience that when we do all of the disciplines of agile and do them well we're able to ship with
00:09:36.640 confidence much more frequently and in those circumstances what happens to that risk curve it gets so much smaller so we
00:09:43.440 get empirical evidence and then we can steer when we discover that we made invalid assumptions we can steer towards
00:09:50.320 value we can steer away from risk so iteration after iteration shipping after
00:09:56.399 shipping we're able to control the risk that is the theory of agile
00:10:02.320 however let's talk about the reality
00:10:07.839 even if you work on software as a service where you have the ability to deliver multiple times a week and it
00:10:15.440 goes in front of real users there still can be
00:10:21.200 a a difference a thing that we are not yet getting
00:10:26.320 empirical evidence about because you're putting it behind say a feature flag right so even if you're actually
00:10:32.399 delivering multiple times a week not everybody works on software as a service
00:10:37.600 i've shipped enterprise products where we could have a notion of releasable
00:10:42.640 but we could not actually release as frequently as we could create a releasable artifact
00:10:48.959 and in those cases you definitely have these longer periods where you're not getting the empirical evidence
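
On the feature-flag point above: code that ships dark generates no in-the-wild evidence until the flag flips. A hypothetical Ruby sketch of that gap; the flag store and flag name are invented for illustration:

```ruby
# A hypothetical sketch of why a feature flag delays in-the-wild feedback:
# the code is deployed, but no user exercises the new path until the flag
# flips. The FeatureFlags store and the flag name are made up.
class FeatureFlags
  FLAGS = { new_checkout_flow: false } # shipped dark: deployed, not released

  def self.enabled?(name)
    FLAGS.fetch(name, false)
  end
end

def checkout(cart)
  if FeatureFlags.enabled?(:new_checkout_flow)
    "new flow for #{cart}" # no empirical evidence accumulates until this runs
  else
    "old flow for #{cart}"
  end
end

puts checkout("cart-42") # => "old flow for cart-42" until the flag flips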
00:10:54.240 and particularly in those environments here's the thing that i've seen happen
00:10:59.360 over and over and over again here's a theoretical line this is how much we would like to test
00:11:07.040 that's all of the the system tests the unit tests the the performance tests
00:11:12.399 everything that we need to gather information about because a test is a thing that gets you information about
00:11:18.560 the behavior of the thing that you're producing that we theoretically want to do all of this
00:11:26.320 and um unfortunately we don't quite get there and sometimes it's a very conscious
00:11:32.240 decision sometimes it's a conscious decision that says it is so expensive for us to do our
00:11:37.440 performance tests it requires such a large environment that is realistic and
00:11:42.560 there is contention for that environment across multiple different projects in the same organization
00:11:48.160 therefore we're going to schedule that for the end sometimes it's a
00:11:53.600 less good reason for scheduling it for the end like the performance testing is a pain in the butt and i don't want to do it so we'll
00:12:00.079 just wait until later hence the whole xp mantra if it hurts do more of it but in any case there is a
00:12:07.040 gap a gap between what we actually the the information we actually get the
00:12:13.360 feedback we actually get and the ideal state and then what happens
00:12:18.880 iteration after iteration we're delivering and there's that gap
00:12:26.399 and what does that look like well that gap is speculation we're
00:12:31.839 speculating that it'll be fine there wasn't going to be that much information to discover in that gap
00:12:41.519 penelope i told you i wasn't going to make it better right okay
00:12:47.120 so so the speculation buildup happens until we get to the final ta-dah when we
00:12:53.279 actually release stuff and guess what the area under that curve is risk which is why it is much easier to do fragile
00:12:59.839 than agile
00:13:07.040 okay so i and yeah if you wait for that in the wild feedback so i've been making this
00:13:13.040 whole case that the in the wild feedback is the only valid empirical evidence that tells you
00:13:18.079 whether or not what you released is any good but if you wait for that oh you've
00:13:23.600 waited way too long it's way too risky there is a very high probability there's a wonderful wonderful talk in here
00:13:30.320 yesterday on the 737 max story you end up with a 737 max class disaster if you
00:13:37.760 wait for in the wild feedback so part of what i'm hoping that you will
00:13:43.760 be thinking about is the different levels and types of feedback that you could be getting
00:13:50.399 a unit test answers a very specific type of question as a programmer did the code that i write
00:13:56.720 do what i intended it to do without violating the
00:14:03.279 expectations that any of the other code already had in the system and even a comprehensive suite of unit tests tells you nothing at all and i mean
00:14:11.360 literally nothing about the overall behavior of the system from the user's perspective
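
To make that narrow question concrete, here is a minimal RSpec sketch; the method under test is invented for illustration:

```ruby
# A minimal RSpec sketch of the narrow question a unit test answers:
# "did the code I wrote do what I intended?" The method under test is
# hypothetical. Runnable with `ruby` via rspec/autorun (rspec gem).
require "rspec/autorun"

def apply_discount(price, percent)
  (price * (1 - percent / 100.0)).round(2)
end

RSpec.describe "apply_discount" do
  it "reduces the price by the given percentage" do
    expect(apply_discount(100.0, 15)).to eq(85.0)
  end
end

# A green run says only that this unit behaves as intended; it says
# nothing about whether checkout as a whole works from a user's perspective.
```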
00:14:18.160 a long time ago back when i used to consult
00:14:24.240 one of my consulting clients brought me in for a project that was three years into
00:14:29.440 a three-year schedule and they had not yet made it to formal qa and they felt that they needed help
00:14:34.880 with testing and quality and at the time that was what i was most often brought in to do and they said well
00:14:41.440 but we've got all these com objects this was way back in the days when that was a thing that was the old microservices
00:14:47.680 okay so we've got all of these com objects and we've tested them all so we should be fine right
00:14:55.600 oh yeah that that project by the way i was only involved for a for a short period of time but i later heard it was
00:15:01.760 five years the three year schedule ended up being five years because of late breaking surprises so
00:15:07.440 there's all these different types of feedback um the ci system is giving you
00:15:12.639 information about it running presumably in the different environments or configurations that it needs to run in
00:15:17.920 you probably don't run every permutation of that locally so you probably have a ci pipeline there is
00:15:24.240 probably somebody who is doing what i'm calling acceptance testing here and what i mean by that is somebody who is
00:15:30.320 accepting that the work represents the value that it was intended to represent so i i don't mean
00:15:36.880 acceptance tests like cucumber tests i mean like there is a say product manager who asked
00:15:43.040 for a thing and they're saying yes i got what i asked for so that's a different type of feedback
00:15:49.839 stakeholder feedback are we going in the right direction user feedback is the ultimate did did we deliver the value
00:15:56.240 that we intended to and for each of these different levels of feedback there is a different cycle
00:16:02.079 time a different natural cycle time your unit tests seconds to minutes
00:16:07.839 technically if you're running in minutes they probably aren't unit tests but that's
00:16:13.600 seconds to minutes is is not that bad uh integration systems yet yes somebody
00:16:18.720 groaned i'm with you but let's recognize the amount of legacy
00:16:24.560 software out there and give people a pass okay um integration systems uh ci tests
00:16:30.959 minutes to hours probably those acceptance tests it probably takes hours to potentially even days before
00:16:37.680 that person who is accepting the work whether they're a product manager or they are a qa person before they
00:16:44.399 actually take a look at the thing um stakeholder feedback can take days to
00:16:49.920 weeks and the user feedback that can potentially take years even if you're releasing very frequently because
00:16:55.680 if you work on enterprise software it could take years before your customer deigns to
00:17:02.839 upgrade a reality of life in that context all right so let's talk about some
00:17:09.120 cautionary tales
00:17:15.120 huh but wait first a digression that is a
00:17:20.640 fruit fly anybody know why i've got a fruit fly on my slide short generational lifespan
00:17:29.919 they are awesome for science experiments because the lifespan of a fruit fly is like 50 days or
00:17:36.640 something but you get new generations every 10 to 12 days give or take so you can do longitudinal studies with
00:17:43.760 multiple generations in the span of weeks to months sweet
00:17:49.520 hold that thought let's talk about code reviews
00:17:57.039 so this is a cautionary tale of a team that i was involved with uh
00:18:03.120 at the time that i got involved with this this team uh this this
00:18:08.559 project uh this was their process it was fairly traditional it's one that you see in a lot of places it was a pull request
00:18:16.160 based process uh a lone developer ta-da developer
00:18:22.640 writes a whole bunch of code this was in an environment where individuals were incentivized to push
00:18:28.880 their features through that's how you got a promotion individual ownership of things so
00:18:35.039 the lone developer would do all of the brilliant work and then
00:18:40.960 when they felt that they were ready to have their thing reviewed it was at that
00:18:46.480 point a whole thing they would run the tests locally that took about an hour because it was all of the tests and
00:18:53.039 the tests were really slow and then they would check in on a branch and this was a gerrit-based flow
00:19:00.240 i am not here to bash gerrit it is a tool if you are in an environment where you need something
00:19:06.240 that automates the workflow of pull requests code reviews
00:19:12.640 i actually know nothing about how it is now this was years ago it is a tool but
00:19:18.000 i will also tell you this is a this is a preview of what's coming i took tremendous pleasure in
00:19:24.320 ripping gerrit out of this process uh in any case you check in on a branch
00:19:29.600 ci would then run a set of unit tests taking about 10 minutes and then your pull request is
00:19:36.559 now sitting in a queue waiting for somebody to approve it and in this particular organization there
00:19:42.640 was a hierarchy there were people who were so junior they only had the ability to comment on
00:19:48.960 prs there were people who had earned a plus one they could give you a point
00:19:55.760 and you needed to get two points so you could get two of them or you could get one of the very few people
00:20:03.280 in the organization who had been ordained with a plus two
00:20:08.559 now as you might imagine in an environment like this uh who you knew
00:20:14.559 uh was kind of the whole the whole thing so if if you were one of those plus twoers and you needed your
00:20:21.120 code reviewed you just kind of uh nudged your your buddy and they would do the
00:20:26.559 code review and your code would get in and if you were low on the totem pole so
00:20:32.799 to speak i'm really sorry i used that phrase in any case if you were
00:20:38.480 um if you were of lower status
00:20:44.480 your pr could wait for a very long time especially if it wasn't considered
00:20:50.320 critical which is why one of the saddest stories that i took away from that
00:20:55.360 particular experience was a junior developer who expressed to me her tremendous
00:21:02.000 frustration because she had a very small like couple lines of code change
00:21:07.280 that she couldn't get in for an entire week and during that week the there were other commits that were getting merged
00:21:15.120 and so she was constantly rebase rerun all the tests resubmit the pr
00:21:20.159 for an entire week for a few lines worth of change she didn't get anything else done that
00:21:25.280 week how incredibly demotivating worse that meant that she had fewer
00:21:32.400 opportunities to get code merged because through no fault of her own she's waiting for somebody to get around to
00:21:38.880 deign to review her pr okay so then it would merge to main ci
00:21:44.640 runs the full set of tests this was the entire set of things that had to happen to get a change all the way through
00:21:50.799 and that process could take a day if you were one of those plus twoers who got to
00:21:55.840 kind of jump the queue but it could take potentially weeks and in fact at the point where we turned off
00:22:02.000 gerrit there were still prs that were sitting there that were essentially abandoned they were stale nobody was going to go
00:22:08.320 through and update them whatever changes they represented either got lost in the sands of time or got
00:22:13.760 subsumed had been made in something some other change that did get pulled in
00:22:19.039 so we made a change we made a process change now i will note that uh the only reason this process
00:22:26.240 change was possible was that we had support all the way from the top this was not a universally
00:22:31.440 popular change there was support at a grassroots level for this change there were people who were so
00:22:37.760 happy and grateful that they were going to be able to move faster but not everybody felt that way
00:22:43.520 and i feel i need to be honest about the fact that this was not universally popular i still in hindsight this this
00:22:50.400 all was happening many years ago and in hindsight knowing how the rest of the story then evolved this was the right
00:22:55.919 decision because we were able to go so much faster after that we were able to
00:23:01.600 introduce so much more innovation the process went to okay we don't do prs we
00:23:06.799 pair on code that's how you get another set of eyes we all agree that it's a good and healthy thing for this project
00:23:12.320 to have multiple sets of eyes on things furthermore what's not represented here is we we practice collective code
00:23:18.080 ownership a team owns the code base there is no my feature your feature and consequently since we're pairing and
00:23:24.720 we're rotating pairs frequently everybody ends up touching that code so you're getting more than two sets of eyes on any given thing over a long
00:23:31.520 period of time but two is enough to get it merged into the code base for local tests we only run the fast
00:23:38.320 tests and yeah they took 10 minutes so yeah they technically weren't unit tests
00:23:43.520 but that was so much better than sitting there for an hour waiting for your tests to be done so the
00:23:48.720 fast tests we run locally then we check in we lived on main we merged to main
00:23:55.039 and then ci would run the full set of tests and yeah sometimes stuff broke and then we would fix it but because we were
00:24:00.080 working in tiny tiny increments fixing was pretty quick too and so
00:24:05.279 now we could get changes in minutes to hours before they got merged in so when you think about your process and
00:24:12.320 the latency that it introduces the wait states that it introduces remember the fruit flies
00:24:19.919 we want to be able to get so many more cycles that that's what this kind of process change can give you
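
As a rough sketch of the workflow just described, here is what the fast/full split might look like in a hypothetical Rakefile; the directory layout and task names are assumptions:

```ruby
# A hypothetical Rakefile sketch of the split described above: fast tests
# locally before check-in, the full suite in CI after merging to main.
require "rake/testtask"

Rake::TestTask.new(:fast) do |t|
  t.pattern = "test/fast/**/*_test.rb" # the suite you wait on locally
end

Rake::TestTask.new(:full) do |t|
  t.pattern = "test/**/*_test.rb" # everything, including the slow tests
end

task default: :fast # `rake` before check-in; CI runs `rake full` on main
```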
00:24:27.840 okay now let's talk about branching strategies batch sizes and latency
00:24:33.200 my personal preference is to live on main i recognize that's not possible for everyone but if you are
00:24:39.520 able to do this in your context i it means that yeah if you're using git
00:24:45.919 that local copy that i have is essentially a branch but i'm making a few changes get to green
00:24:52.000 make it clean and then check it in everybody else is doing that as well and so at any given moment in time the
00:24:58.720 amount of inventory work in progress that's sitting out not yet checked in is a fairly small amount
00:25:06.080 so we don't get a whole lot of churn we certainly don't have the experience that that poor junior developer who
00:25:12.880 incidentally wasn't actually that junior that was just the status they had been given
00:25:18.400 but that poor junior developer had of waiting an entire week to get something merged because of the amount of churn
00:25:23.440 that was happening and they're anyway moving on
00:25:28.799 feature branching also very common i am not here to argue with you about whether you should live on main or should do
00:25:33.840 feature branching there are trade-offs there are good reasons to do both it does introduce the challenge that now
00:25:40.480 you've got uh larger batch sizes and so you can merge less frequently and so if
00:25:46.000 you notice that although this looks like a nice neat little diagram and it feels all warm and fuzzy that it's not quite
00:25:53.440 that simple and the way that that shows up in this diagram is if we flip back and forth between the two we can see
00:25:59.039 that we're getting fewer merges to main so that means that the generation cycle that that that uh life cycle is that
00:26:05.919 much longer and that introduces what do we know about that area under the curve risk more risk
00:26:12.320 so far so good okay that was all the background to the cautionary tale
00:26:18.720 let's talk about a process that i do not recommend anybody anybody anywhere ever do under any
00:26:26.600 circumstances but in this organization and i i have to confess i did not see it at the moment
00:26:32.559 that it was happening my involvement happened some number of months later i came in and i was still hearing the
00:26:38.640 echoes of the screams of pain from this particular situation
00:26:43.760 uh with a long-running team branch it started off simply enough we have a main and the organization had decided for
00:26:50.880 reasons that i am sure made sense at the time that every one of the teams would have
00:26:57.279 their own team branch i understand that in hindsight this does
00:27:04.400 not seem like a good idea i'm hearing some of you laugh but i am willing to believe that they were doing the best
00:27:10.080 they knew how given everything that they knew at the time it still didn't turn out so well um but
00:27:17.360 i'm getting ahead of myself let me go through the rest of the story so and you know it now the team is
00:27:23.200 treating the team branch kind of like you would treat main in a feature branch scenario developer a developer b
00:27:30.559 developer c they're all working on stuff developer a is is working on their feature they now merge back to the team
00:27:36.720 branch develop and then you know start another feature uh and then of course in the meantime
00:27:42.320 main is changing and they did have a plan for rebasing off of main onto the team branch
00:27:49.039 uh but it was way too easy for a developer to just ignore everything that was uh
00:27:54.880 i'm really sorry you're completely having a reaction to this um it was
00:28:01.600 i am so sorry um developer c
00:28:08.159 tries to merge back discovers that of course they have ignored the fact that the world was changing around them
00:28:14.000 uh and so they now have to okay in the meantime developer b has
00:28:20.720 been cranking away on their feature and the world has been moving on
00:28:32.080 this is in fact what happened um the really sad part is an entire team
00:28:40.000 threw away six months of work can you imagine
00:28:48.080 oh i'm really sorry i'd better move on some of you are having very strong reactions
00:28:54.240 let's talk about test pollution oh it's not getting better is it
00:29:00.559 i warned you did i not warn you up front by the way rspec is lovely thank you so much
00:29:13.120 the state of javascript testing is a different problem which incidentally i just as an aside um
00:29:19.679 my new thing is it's not stealth it's just i'm not quite sure what it's going to grow up to be
00:29:25.520 it's curious duck digital laboratory i am building a simulation game thing uh
00:29:31.600 it is uh the the back end is a pure ruby gem yay gems um and massive shout out to davis frank
00:29:38.880 who convinced me to make it a gem and that has had massive uh payoff
00:29:46.240 um i but the front end is a rails front end with stimulus for javascript and so i
00:29:52.000 spent like six weeks getting my setup for javascript testing working
00:29:57.440 so incidentally if anybody wants to talk about that stuff i'm happy to show you what i've done get your feedback that is
00:30:02.799 actually not the point of the test pollution although there is a reason why i am mentioning this because we'll talk about
00:30:08.799 it with pollution first let me explain what i mean by that now your your feedback cycles the information you get
00:30:15.919 back can be polluted in a variety of ways what does pollution mean it just means that we don't trust the information that
00:30:21.840 we're getting back if you mix opinion in with empirical
00:30:27.039 evidence you now have a polluted stream that's one example of pollution
00:30:35.520 um another big source of pollution uh let's see if this feels familiar to y'all
00:30:41.840 you're a new person on a project you start doing some work uh you run the tests or the tests are
00:30:48.320 running in ci whatever and you see failures and you go wait those failures do not appear to have
00:30:53.919 anything to do whatsoever with the stuff that i changed what happened and
00:30:59.120 your your new buddy the person who's helping to onboard you says don't worry that's fine they do that just kick it
00:31:05.200 again and you kick it again
00:31:10.240 i hope i still have i hope i still have friends after this talk cause
00:31:15.840 i'm watching some of y'all's faces and i'm really worried
00:31:20.880 okay so you kick it again and sure enough it's a different set of tests that are failing and you kick it again
00:31:27.360 and you kick it again okay y'all have seen this
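
What usually sits underneath that kind of intermittent failure is state leaking between tests. A minimal, hypothetical RSpec sketch of order-dependent pollution:

```ruby
# A hypothetical sketch of test pollution: shared mutable state makes one
# test's result depend on what ran before it, so under `--order random`
# the failures move around and "kick it again" sometimes goes green.
# Runnable with `ruby` via rspec/autorun (rspec gem).
require "rspec/autorun"

CACHE = {} # global state shared by every example below

RSpec.describe "a polluting test" do
  it "warms the cache and forgets to clean up" do
    CACHE[:user] = "alice"
    expect(CACHE[:user]).to eq("alice")
  end
end

RSpec.describe "a polluted test" do
  it "silently assumes a cold cache" do
    expect(CACHE).to be_empty # passes alone, fails after the test above
  end
end
```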
00:31:37.120 so um one group that i got involved with was working with a legacy system
00:31:43.919 that was massively distributed and had a whole lot of
00:31:50.240 threading and parallelization and very difficult to find race conditions and i
00:31:56.000 they had a very long long history so long that there had historically been a very talented and skilled qa group that
00:32:02.880 has built an incredibly sophisticated test harness but because the test harness had a tendency to expose things
00:32:09.200 that were sometimes real and sometimes not the uh
00:32:15.200 developer team that did not feel any sense of ownership whatsoever over those tests or that test harness had a
00:32:20.720 tendency to just discount those results until somebody went through and
00:32:25.799 painstakingly did the analysis to discover whether or not that was real information
00:32:31.760 and in this environment we attempted to reduce our cycle time which meant
00:32:37.200 developers had to own the tests but developers had no intention of owning that set of tests but that set of tests
00:32:42.559 was the only set of tests that was giving us real information about the system so as we reduced our cycle time
00:32:48.080 we ended up increasing risk and at that point i was responsible for the the group i've held all sorts of
00:32:54.960 roles and in this case i was a vp of r d and i did a really terrifying thing i i
00:33:01.039 pulled the big red cord i said we're not shipping any more features until we clean this up and i had expected that it
00:33:08.320 would take a few weeks of a concerted effort with everybody all hands on deck everybody cleaning this up
00:33:15.200 i was wrong it took months and i held my ground and i'll just note
00:33:20.480 that even in in the level of authority that i had within that organization it was a scary thing it takes an enormous
00:33:26.480 amount of intestinal fortitude to say no we're not going to write new features
00:33:31.840 until we can trust that our tests are giving us information that that we that we can believe it was not a popular
00:33:39.440 decision with sales what a shock um the product managers um who incidentally
00:33:45.519 reported to me they were not happy with me the developers who reported to me were not happy with me some of them were
00:33:51.039 but some of them were just of the opinion that um this was unfair that
00:33:56.080 they had to clean up this mess that they hadn't made and that they would much rather be developing new features and so this
00:34:02.960 was not a popular decision and yet our customers and support organization
00:34:08.320 needed me to make that decision because we were shipping objectively worse software with every
00:34:15.040 increment that we delivered so this is a very difficult situation that is the cautionary tale by the way
00:34:21.919 if you don't clean it up while it's a small mess if you wait until it's a superfund site
00:34:28.320 you're gonna end up in a situation where you have to make the excruciating decision between stop the line and don't
00:34:34.639 do anything but clean up the superfund site or be at a very serious risk of shipping a
00:34:39.919 product that frankly doesn't doesn't meet the value proposition that it's supposed to
00:34:46.839 okay so then and i noticed that i'm running a little short on time so i will tell the abbreviated version of this
00:34:52.800 cautionary tale what happens when you have both pollution and you have delayed feedback cycles and
00:34:59.680 that was this project i was again a vp and i had a peer who
00:35:05.280 was another vp who came to me and said what are you doing
00:35:10.960 what do you mean well your group theoretically ships software and has been unable to ship software so your group isn't doing what
00:35:17.280 your group is supposed to do what are you doing now i knew we had challenges but
00:35:23.200 let's just say that that was a galvanizing conversation in which i decided it was time for me to understand
00:35:29.440 in depth so i started asking and we ended up i interviewed a whole lot of the
00:35:35.599 individual contributors who were on that project we would stand in my office and i would
00:35:40.880 ask them help me understand again from the moment that a developer has something ready that theoretically is
00:35:46.640 ready to ship to the moment that we actually ship it what do we do again tell me again no tell me like i'm five
00:35:52.720 and we we mapped out collectively through the series of conversations some of which happened one-on-one and some of
00:35:58.400 which happened in groups we ended up with a diagram on my whiteboard that that lived there for months that showed
00:36:05.839 the pipeline and it showed the following information about the pipeline so the first stage was there was a build step
00:36:11.760 this was shipping enterprise software first stage is there's a build step and then some set of fast tests run and then
00:36:18.160 if that's green it goes on to the system tests that were in a fan out across multiple
00:36:23.680 configurations and environments as you do um and then there was a final final
00:36:29.760 final packaging step of some kind so basically four stages and okay well how long does each stage
00:36:35.200 take that first build step a few minutes some of you are laughing in anticipation
00:36:40.880 um i that first step takes a few minutes the second step takes a few minutes uh
00:36:47.440 that third step that could take anywhere from four hours
00:36:53.520 to over 24 hours that's a lot of variation why what what
00:36:59.680 is different between the four hour runs and the 24 hour runs well it turns out what i i learned it took a long time
00:37:06.720 because nobody had ever really looked at this in in this particular framing in this way until we all got together
00:37:13.760 because everybody had one piece of the puzzle and we were assembling a jigsaw but the spoiler is the reason ultimately
00:37:20.800 that the things sometimes took a lot longer was because we were waiting for a lock
00:37:26.240 on an environment in a pool of very restricted environments
00:37:32.400 and there was a fair amount of at the peak of activity right before a theoretical release there would be a lot
00:37:38.160 of contention vying for one of those coveted locks on
00:37:43.280 an uh on those environments so there could be a very long wait state
00:37:48.480 so that's that is the delayed feedback piece right that thing was causing a wait
00:37:55.200 state that could be variable length but end up being very very long
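
As an illustration of that whiteboard exercise, a hypothetical Ruby sketch that instruments each pipeline stage and makes the lock wait visible; the stage names and durations are invented and scaled down:

```ruby
# A hypothetical sketch of mapping a pipeline: time each stage and see
# where the time actually goes. Here the "system tests" stage blocks
# waiting for a lock on a scarce test environment, which is where the
# 4-hours-versus-24-hours variation came from in the story above.
require "benchmark"

ENV_POOL = Queue.new
ENV_POOL << "perf-env-1" # the one coveted environment

def run_stage(name)
  elapsed = Benchmark.realtime { yield }
  puts format("%-14s %.2fs", name, elapsed)
end

run_stage("build")        { sleep 0.01 }
run_stage("fast tests")   { sleep 0.01 }
run_stage("system tests") do
  env = ENV_POOL.pop # the wait state: blocks until an environment frees up
  sleep 0.05         # the tests themselves
  ENV_POOL << env
end
run_stage("packaging")    { sleep 0.01 }
```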
00:38:01.599 okay so the next question i had to ask was well how often are things failing and let's talk about pollution why
00:38:07.280 are they failing when when tests fail have we learned anything new or were they failing for spurious reasons
00:38:14.560 and um well the build never failed we always got something out whether or not
00:38:19.760 it worked was a different question but the build never failed the fast tests almost never failed the failures were mostly occurring in the
00:38:26.880 system tests and they were flakes we had a burgeoning superfund site
00:38:34.800 uh so the solution to this incidentally we we had gotten to the point where we were just wedged we
00:38:41.200 kind of like a car spinning in sand um we were struggling to to ship
00:38:47.040 um and so the solution ended up being to focus on doing two things
00:38:52.560 one was to reduce the amount of time that the tests took and to reduce the wait states by trying to increase the
00:38:59.280 number of those very coveted environments that were available but also reducing the contention
00:39:05.280 because it turns out that when the tests fail that thing ends up going through that
00:39:10.800 whole cycle again so that increases the amount of contention for those locks
00:39:16.480 so we needed to reduce the flakiness in the test suite until the point where when it failed it was telling us
00:39:22.400 something real so if you are facing a potential
00:39:28.079 superfund site here are some strategies that you can try one is to just separate the streams you've got blocking and
00:39:33.760 non-blocking caveat this only works if
00:39:39.839 you trust that you have sufficient coverage in the blocking tests to tell you whether or not
00:39:46.000 to tell you about the risk in the software that was not the case in the story that i told of the massively
00:39:51.359 parallel uh we didn't have enough test coverage we did not know what we were shipping we
00:39:57.359 were definitely shipping schroedinger's releases all over the place um if part of the reason why flakiness
00:40:03.760 is persisting is that sense of tragedy of the commons that nobody feels a sense of responsibility
00:40:09.920 or ownership or maybe even agency to clean it up getting cross-team
00:40:14.960 partnerships going can help tremendously and then just carving out time on a regular basis which i recommend
00:40:20.960 even for projects that don't have these problems yet i recommend
00:40:26.720 carving out time in some way shape or form you may not need to do entire tidy tuesdays or whatever you choose to do
00:40:33.280 uh maybe just every time somebody finishes a feature there's an expectation that if there are
00:40:38.400 some flakes or mysteries in the code base that that's the next thing they tackle before they tackle something else
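
On the first of those strategies, separating blocking from non-blocking streams, one hypothetical way to express the split is with RSpec metadata tags; the specs and the tag name here are invented:

```ruby
# A hypothetical sketch of blocking versus non-blocking streams using
# RSpec tags: known flakes are quarantined so the blocking run stays
# trustworthy, while a separate run keeps the flakes visible until fixed.
RSpec.describe "payment API" do
  it "charges the card" do
    # trusted and blocking: a failure here stops the merge
  end

  it "retries on gateway timeout", quarantine: true do
    # known flake: still runs, but in the non-blocking stream
  end
end

# Blocking stream (gates merges):         rspec --tag ~quarantine
# Non-blocking stream (tracked, triaged): rspec --tag quarantine
```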
00:40:46.400 and then the other thing i strongly recommend is reducing test execution time here's where i get to tell a story that
00:40:52.240 um incidentally this is this is for aaron tenderlove are you here
00:40:57.839 okay well this is being recorded so just so you know this entire story is for him
00:41:04.880 uh the reason why will become apparent momentarily
00:41:10.400 so once upon a time we were struggling with long build times or long long test times and i was uh sort of encouraging
00:41:17.760 teams to think about shortening their test times it kind of wasn't working i so i i am not above bribery
00:41:25.839 um at one point i basically said hey tell you what you lop
00:41:31.280 an hour off of that test cycle time and i'll bake you a pie
00:41:37.040 and the person i was talking to said i like pie and i said well it turns out i'm really
00:41:43.040 good i'm really good at making pie these are real pies that i made
00:41:51.760 i'm pretty good at pie so he lopped an hour off the next day i
00:41:58.560 brought him a pie the other team members said i like pie
00:42:04.880 and thus we formed a tradition you lop a sufficiently large amount of
00:42:10.240 time off of that very very very long cycle time you get a pie and it can be it can
00:42:16.000 be a collective pie it doesn't have to be a heroic individual effort
00:42:21.359 but you get a pie because we want to reduce the time in our pipelines
00:42:35.359 you now see why this was dedicated anyway let's bring it home
00:42:42.240 uh healthy feedback loops the things we have been talking about
00:42:47.280 have to do with seeing to the care and feeding of your feedback loops you want to make sure you keep
00:42:53.119 them tight keep them short watch those wait states make them as short as you possibly can
00:43:00.000 given the context and the constraints that you live within but attend to the time in the feedback
00:43:06.720 cycle think about the fact that you have these multiple levels too often i have had developers argue
00:43:12.560 with me that unit testing is a waste because it's just going to get tested at the system level anyway so why bother
00:43:20.240 i'm so glad i'm speaking to a community that doesn't buy into that like modulo javascript which we can talk
00:43:26.560 about separately um uh and then keep them clean keep the
00:43:32.000 pollution out of your feedback cycles and i want to introduce you to one more feedback cycle
00:43:38.000 this is yet another one should look very familiar it's kind of like the others that we talked about but this is the kolb learning cycle
00:43:44.400 it turns out that you know experiment experience observe
00:43:50.319 and reflect and then abstract the lessons learned that is also a feedback cycle so in short every feedback cycle
00:43:57.359 is a learning cycle and the more of those feedback cycles you get the more you get to learn
00:44:04.160 which is why i say there is no failure there's only learning but i'll also tell you that there are some weeks when i do
00:44:10.079 a lot of learning all right i am down to 55 seconds on the
00:44:15.839 clock which means that i don't think we've got time for q a i am here all day though love to talk about this stuff
00:44:22.000 i'd love to talk with y'all thank you so much for having me and thank you for laughing at my jokes