00:00:10.719
ah i'm gonna confess this is a little bit surreal for me um even without covid i
00:00:18.560
wasn't getting out a whole lot uh so in 2019 i don't believe that i did any
00:00:25.599
in-person live conference events like this so it's been probably at least three years
00:00:31.199
i'm a little nervous so i hope this all works for all of us but it is so good to see you all in
00:00:37.680
person and i'm so grateful to be here and to see you
00:00:43.040
so let's let's talk about feedback i this um
00:00:49.200
where to start well better just get started this is a talk in four parts first we're going to talk about the
00:00:54.559
nature of feedback in sort of an abstract way we'll go through that fairly quickly then we're going to talk about software and how it applies to
00:01:01.280
software then i'm going to tell some cautionary tales and then i'm going to bring it home so a talk in four parts
00:01:09.040
let's start with just what is feedback it is the simplest thing in the world you do a thing you see what happens
00:01:15.040
you get empirical evidence that tells you if the thing that you did had the
00:01:20.960
effect that you intended and the empirical evidence part is the super important part of that
00:01:26.640
because opinions are not actually feedback if they are not giving you concrete information about the effect of
00:01:34.320
the thing that you did now we have a lot of very fancy ways of talking about feedback loops there's the
00:01:41.520
deming cycle plan do check act it's just a feedback cycle we plan what
00:01:46.799
we're going to do then we do it then we check to see how it went and then we act on the information that we got because
00:01:52.799
if you get a lot of feedback and then you do not act on it well that's a problem
00:01:59.040
um there's the ooda loop john boyd gave us the ooda loop of observe orient yourself
00:02:05.759
decide act that's another feedback cycle there's lean startup build measure learn
00:02:12.080
build measure learn and of course the idea here is build the smallest amount possible so that you can test your
00:02:18.800
hypothesis it's basically the scientific method we're forming a hypothesis about what the market wants or what would be
00:02:25.680
successful for the outcomes that we're attempting to achieve and then we're going to
00:02:31.599
design that experiment the cheapest experiment we possibly can and then we're going to observe what happened
00:02:36.640
when we ran that experiment and then that's going to lead us to form our next hypothesis so it's just all feedback
00:02:43.280
so far so good sweet let's talk about software
00:02:49.200
this is where things get a little bit complicated um once upon a time when i gave variations of this talk by the way i've
00:02:55.599
given this talk a lot of times it's slightly different every time this time for reasons i still don't understand
00:03:01.200
myself i decided to rebuild all of the slides so you're looking at an entirely new
00:03:06.319
deck even if it looks a little bit familiar uh i will also say this is the first time
00:03:12.159
i'm giving this talk presenting from google slides this is not a google ad but i will say that the world has gotten
00:03:18.959
sufficiently different that i no longer feel the need to have everything local on my computer the internet is actually
00:03:24.159
ubiquitous plus also the whole download work offline mode actually works anywho
00:03:29.760
um so let's talk about every software project ever whether you work in an agile way and you're shipping frequently
00:03:37.440
or you work on big honking things that ship every five years
00:03:42.640
still yes some of you are laughing hi i'm an old that was the way things worked
00:03:51.360
been in the end wrote my first line of code in 1980 so
00:03:56.480
every software project ever at the start you're analyzing some stuff now that
00:04:02.000
might be an analysis phase or that might simply be doing a little
00:04:07.439
bit of user research to understand what we want to build the the problem really this is about analyzing the problem that
00:04:13.680
we want to solve this is the stage at which we're forming some hypotheses and then we're doing some design and
00:04:20.400
that might be designing a ui it might be capital a architecture but we're doing some design
00:04:26.560
and we're implementing now notice there's this curve that i haven't talked about yet that's the speculation curve because at
00:04:33.520
each stage we're speculating we are building assumptions on top of
00:04:40.080
assumptions we're speculating that we understood the problem to be solved that when the business analysts went to
00:04:46.240
gather requirements from the corporate stakeholders that they actually understood the corporate stakeholders
00:04:51.680
that the corporate stakeholders were actually saying um what their needs were as opposed to offering solutions
00:04:58.400
packaged as needs that weren't going to solve the problem that they actually had that never happens
00:05:04.639
um so we're speculating at each stage we're speculating that our design will in fact
00:05:10.000
solve the problem that we identified we're speculating that our implementation
00:05:15.120
is going to work that the so at each step we're building assumptions
00:05:22.000
uh and then we end up iterating whether you were doing waterfall or agile you still end up iterating you'll notice the
00:05:27.759
slope of that curve is starting to come down because now we do get some empirical evidence
00:05:34.320
then we do whatever is final testing if you worked in a very traditional kind of way you might have an entire qa
00:05:40.000
department doing the final test cycle if you work
00:05:46.160
in a very agile way you might be putting things onto the staging server whatever it is you're doing some form of final
00:05:53.039
testing and then there's a release and that's really when the
00:05:58.240
the chickens come home to roost and you find out so this is the entire life cycle of
00:06:04.240
what's happening inside that feedback cycle from do a thing to see what happens and
00:06:10.400
that area under the curve that's all risk so clearly the longer this cycle goes
00:06:18.720
the more risk we're incurring by the way this talk has a bunch of
00:06:24.639
digressions here is one of them let's talk about schrodinger's cat why on earth am i talking about schrodinger's cat well first
00:06:30.800
you may have heard of schrodinger's cat let me do a quick very quick summary
00:06:36.240
this is a thought experiment that was proposed by schrodinger i believe in a letter in 1935 to einstein i might have
00:06:43.360
the details a little bit wrong but there was the copenhagen school of thought in in quantum
00:06:49.759
mechanics that said that well it's all probability waves and so there is the potential to have superimposed
00:06:55.840
probability waves in which two states coexist at the same time and until there's an observation that is made
00:07:01.840
those probability waves do not collapse and schrodinger was trying to say
00:07:07.440
let's let's bring determinism back to physics let's take a completely absurd absurd
00:07:13.520
thought experiment that starts with one cat we have a box it is a sealed chamber
00:07:18.880
said sealed chamber has some some poison gas or a
00:07:24.000
cyanide gas that cyanide gas will be released based on completely random
00:07:29.280
decomposition decomposing of a radioactive isotope
00:07:35.280
that it has a half-life of time t so at the end of time t there's a 50 50 shot
00:07:40.720
that said radioactive isotope has decayed if it has decayed the cat is dead because the hammer hit the flask
00:07:46.639
and the flask has released the poison gas but if it has not decayed again 50 50 shot
00:07:52.080
uh the cat's a little alarmed um 50 50 shot that the cat uh you know could be alive if it did not
00:07:58.479
decay so uh this was a thought experiment and the
00:08:04.240
what he was trying to say is this is clearly absurd i'm not a physicist but it's my understanding that yeah
00:08:09.280
basically the probability waves collapse at the moment of observation and the cat is either alive or dead at that point
00:08:14.560
but until then probability waves still exist why why would i be telling you a story
00:08:20.800
about physics so let's talk about schrodinger's release until the moment that we release we do
00:08:27.840
not know if it's alive or dead we don't have the empirical evidence we haven't made the observation and the
00:08:33.120
release exists in two states simultaneously which you uh may have enjoyed that that
00:08:39.599
possibility when you go into a status meeting and there's wild optimism on one
00:08:45.519
side and tremendous pessimism on another that is an example of these probability waves being superimposed and until we
00:08:52.320
actually get empirical evidence we don't know
00:08:58.320
so until you observe in the wild you're speculating
00:09:03.760
now in theory agile solves all this for us agile made the world perfect right 20 years old
00:09:10.000
uh-huh yeah thank you penelope for those of you in the back in case you couldn't hear uproarious laughter
00:09:17.920
so let's talk about agile there's a lot we could talk about about agile and i'm not going to i just want to focus on one
00:09:24.080
aspect of it in theory you're shipping very frequently
00:09:29.440
and in fact in my experience that when we do all of the disciplines of agile and do them well we're able to ship with
00:09:36.640
confidence much more frequently and in those circumstances what happens to that risk curve it gets so much smaller so we
00:09:43.440
get empirical evidence and then we can steer when we discover that we made invalid assumptions we can steer towards
00:09:50.320
value we can steer away from risk so iteration after iteration shipping after
00:09:56.399
shipping we're able to control the risk that is the theory of agile
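The claim above, that shipping more often shrinks the area under the speculation curve, can be illustrated with a little arithmetic. This is a minimal sketch under one assumption that is not from the talk: speculation grows linearly between releases and drops back to zero each time real-world evidence arrives. The function name `cumulative_risk` and the parameter `growth_per_week` are made up for illustration.

```python
def cumulative_risk(total_weeks: float, releases: int, growth_per_week: float = 1.0) -> float:
    """Area under a sawtooth speculation curve.

    Assumption (not from the talk): speculation grows linearly between
    releases and resets to zero whenever a release gives empirical evidence.
    """
    if releases < 1:
        raise ValueError("need at least one release")
    cycle = total_weeks / releases
    # Each cycle is a triangle: base = cycle length, height = growth * cycle length.
    area_per_cycle = 0.5 * cycle * (growth_per_week * cycle)
    return releases * area_per_cycle


if __name__ == "__main__":
    # Same 52 weeks of work, released once, quarterly, or weekly.
    for n in (1, 4, 52):
        print(n, "release(s):", cumulative_risk(52, n))
```

Under that linear assumption the total area is inversely proportional to the number of releases, which is one way to read "it gets so much smaller". A real project's curve will not be this tidy.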
00:10:02.320
however let's talk about the reality
00:10:07.839
even if you work on software as a service where you have the ability to deliver multiple times a week and it
00:10:15.440
goes in front of real users there still can be
00:10:21.200
a a difference a thing that we are not yet getting
00:10:26.320
empirical evidence about because you're putting it behind say a feature flag right so even if you're actually
00:10:32.399
delivering multiple times a week not everybody works on software as a service
00:10:37.600
i've shipped enterprise products where we could have a notion of releasable
00:10:42.640
but we could not actually release as frequently as we could create a releasable artifact
00:10:48.959
and in those cases you definitely have these longer periods where you're not getting the empirical evidence
00:10:54.240
and particularly in those environments here's the thing that i've seen happen
00:10:59.360
over and over and over again here's a theoretical line this is how much we would like to test
00:11:07.040
that's all of the the system tests the unit tests the the performance tests
00:11:12.399
everything that we need to gather information about because a test is a thing that gets you information about
00:11:18.560
the behavior of the thing that you're producing that we theoretically want to do all of this
00:11:26.320
and um unfortunately we don't quite get there and sometimes it's a very conscious
00:11:32.240
decision sometimes it's a conscious decision that says it is so expensive for us to do our
00:11:37.440
performance tests it requires such a large environment that is realistic and
00:11:42.560
there is contention for that environment across multiple different projects in the same organization
00:11:48.160
therefore we're going to schedule that for the end sometimes it's a
00:11:53.600
less good reason for scheduling it for the end like the performance testing is a pain in the butt and i don't want to do it so we'll
00:12:00.079
just wait until later hence the whole xp mantra if it hurts do more of it but in any case there is a
00:12:07.040
gap a gap between what we actually the the information we actually get the
00:12:13.360
feedback we actually get and the ideal state and then what happens
00:12:18.880
iteration after iteration we're delivering and there's that gap
00:12:26.399
and what does that look like well that gap is speculation we're
00:12:31.839
speculating that it'll be fine there wasn't going to be that much information to discover in that gap
00:12:41.519
penelope i told you i wasn't going to make it better right okay
00:12:47.120
so so the speculation buildup happens until we get to the final ta-dah when we
00:12:53.279
actually release stuff and guess what the area under that curve is risk which is why it is much easier to do fragile
00:12:59.839
than agile
00:13:07.040
okay so i and yeah if you wait for that in the wild feedback so i've been making this
00:13:13.040
whole case that the in the wild feedback is the only valid empirical evidence that tells you
00:13:18.079
whether or not what you released is any good but if you wait for that oh you've
00:13:23.600
waited way too long it's way too risky there is a very high probability there's a wonderful wonderful talk in here
00:13:30.320
yesterday on the 737 max story you end up with a 737 max class disaster if you
00:13:37.760
wait for in the wild feedback so part of what i'm hoping that you will
00:13:43.760
be thinking about is the different levels and types of feedback that you could be getting
00:13:50.399
a unit test answers a very specific type of question as a programmer did the code that i write
00:13:56.720
do what i intended it to do and if you have a comprehensive suite of unit tests without violating the
00:14:03.279
expectations that any of the other code already had in the system that tells you nothing at all and i mean
00:14:11.360
literally nothing about the overall behavior of the system from the user's perspective
00:14:18.160
i one a long time ago one of my consulting clients i used to consult
00:14:24.240
one of my consulting clients brought me in for a project that was three years into
00:14:29.440
a three-year schedule and they had not yet made it to formal qa and they felt that they needed help
00:14:34.880
with testing and quality and at the time that was what i was most often brought in to do and they said well
00:14:41.440
but we've got all these com objects this was way back in the days when that was a thing that was the old microservices
00:14:47.680
okay so we've got all of these com objects and we've tested them all so we should be fine right
00:14:55.600
oh yeah that that project by the way i was only involved for a for a short period of time but i later heard it was
00:15:01.760
five years the three year schedule ended up being five years because of late breaking surprises so
00:15:07.440
there's all these different types of feedback um the ci system is giving you
00:15:12.639
information about it running presumably in the different environments or configurations that it needs to run in
00:15:17.920
you probably don't run every permutation of that locally so you probably have a ci pipeline there is
00:15:24.240
probably somebody who is doing what i'm calling acceptance testing here and what i mean by that is somebody who is
00:15:30.320
accepting that the work represents the value that it was intended to represent so i i don't mean
00:15:36.880
acceptance tests like cucumber tests i mean like there is a say product manager who asked
00:15:43.040
for a thing and they're saying yes i got what i asked for so that's a different type of feedback
00:15:49.839
stakeholder feedback are we going in the right direction user feedback is the ultimate did did we deliver the value
00:15:56.240
that we intended to and for each of these different levels of feedback there is a different cycle
00:16:02.079
time a different natural cycle time your unit tests seconds to minutes
00:16:07.839
technically if you're running in minutes they probably aren't unit tests but that's
00:16:13.600
seconds to minutes is is not that bad uh integration systems yet yes somebody
00:16:18.720
groaned i'm with you but let's recognize the amount of legacy
00:16:24.560
software out there and give people a pass okay um integration systems uh ci tests
00:16:30.959
minutes to hours probably those acceptance tests it probably takes hours to potentially even days before
00:16:37.680
that person who is accepting the work whether they're a product manager or they are a qa person before they
00:16:44.399
actually take a look at the thing um stakeholder feedback can take days to
00:16:49.920
weeks and the user feedback that can potentially take years even if you're releasing very frequently because
00:16:55.680
if you work on enterprise software it could take years before your customer deigns to
00:17:02.839
upgrade a reality of life in that context all right so let's talk about some
00:17:09.120
cautionary tales
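The levels of feedback and their natural cycle times described just before this point can be summarized as a small table. This is only a sketch: the questions and cycle times are taken from the talk, while the `FeedbackLevel` class, its field names, and the `describe` helper are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeedbackLevel:
    name: str
    question: str    # what this kind of feedback tells you
    cycle_time: str  # rough natural cycle time, as stated in the talk


# Ordered from fastest to slowest feedback.
FEEDBACK_LEVELS = [
    FeedbackLevel("unit tests", "did the code i wrote do what i intended", "seconds to minutes"),
    FeedbackLevel("ci tests", "does it run in the environments and configurations it needs to", "minutes to hours"),
    FeedbackLevel("acceptance", "did the person who asked for it get what they asked for", "hours to days"),
    FeedbackLevel("stakeholder feedback", "are we going in the right direction", "days to weeks"),
    FeedbackLevel("user feedback", "did we deliver the value we intended", "potentially years"),
]


def describe(levels):
    """Return one summary line per feedback level."""
    return [f"{lvl.name}: {lvl.cycle_time} ({lvl.question})" for lvl in levels]


if __name__ == "__main__":
    for line in describe(FEEDBACK_LEVELS):
        print(line)
```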
00:17:15.120
huh but wait first a digression that is a
00:17:20.640
fruit fly anybody know why i've got a fruit fly on my slide short generational lifespan
00:17:29.919
they are awesome for science experiments because the generational the lifespan of a fruit fly is like 50 days or
00:17:36.640
something but you get new generations every 10 to 12 days give or take so you can do longitudinal studies with
00:17:43.760
multiple generations in the span of weeks to months sweet
00:17:49.520
hold that thought let's talk about code reviews
00:17:57.039
so this is a cautionary tale of a team that i was involved with uh
00:18:03.120
at the time that i got involved with this this team uh this this
00:18:08.559
project uh this was their process it was fairly traditional it's one that you see in a lot of places it was a pull request
00:18:16.160
based process uh a lone developer ta-da developer
00:18:22.640
writes a whole bunch of code this was in an environment where individuals were incentivized to push
00:18:28.880
their features through that's how you got a promotion individual ownership of things so
00:18:35.039
the lone developer would do all of the brilliant work and then
00:18:40.960
when they felt that they were ready to have their thing reviewed it was at that
00:18:46.480
point a whole thing they would run the tests locally that took about an hour because it was all of the tests and
00:18:53.039
the tests were really slow and then they would check in on a branch and this was a gerrit-based flow
00:19:00.240
i am not here to bash gerrit it is a tool if you are in an environment where you need something
00:19:06.240
that automates the workflow of pull requests code reviews
00:19:12.640
i actually know nothing about how it is now this was years ago it is a tool but
00:19:18.000
i will also tell you this is a this is a preview of what's coming i took tremendous pleasure in
00:19:24.320
ripping gerrit out of this process uh in any case you check in on a branch
00:19:29.600
ci would then run a set of unit tests taking about 10 minutes and then your pull request is
00:19:36.559
now sitting in a queue waiting for somebody to approve it and in this particular organization there
00:19:42.640
was a hierarchy there were people who were so junior they only had the ability to comment on
00:19:48.960
prs there were people who had earned a plus one they could give you a point
00:19:55.760
and you needed to get two points so you could get two of them or you could get one of the very few people
00:20:03.280
in the organization who had been ordained with a plus two
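The approval rule described above (commenters grant nothing, plus-one reviewers grant one point, plus-two reviewers grant two, and a change needs two points) can be sketched as a tiny function. This models the rule as the speaker describes it for that organization, not how any particular review tool computes votes. The tier names and `can_merge` are hypothetical.

```python
REQUIRED_POINTS = 2

# Points each reviewer tier could grant, per the description in the talk.
# The tier names themselves are made up for this sketch.
TIER_POINTS = {"commenter": 0, "plus_one": 1, "plus_two": 2}


def can_merge(reviewer_tiers):
    """Return True when the approvals add up to the required two points."""
    return sum(TIER_POINTS[tier] for tier in reviewer_tiers) >= REQUIRED_POINTS


if __name__ == "__main__":
    print(can_merge(["plus_one", "plus_one"]))   # two plus-one reviewers
    print(can_merge(["plus_two"]))               # one plus-two reviewer
    print(can_merge(["commenter", "plus_one"]))  # still waiting in the queue
```

Written out this way, the bottleneck is easy to see: whether a change merges depends on reaching a small set of people, not on anything about the change itself.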
00:20:08.559
now as you might imagine in an environment like this uh who you knew
00:20:14.559
uh was kind of the whole the whole thing so if if you were one of those plus twoers and you needed your
00:20:21.120
code reviewed you just kind of uh nudged your your buddy and they would do the
00:20:26.559
code review and your code would get in and if you were low on the totem pole so
00:20:32.799
to speak i'm really sorry i used that phrase in any case if you were
00:20:38.480
um if you were of lower status
00:20:44.480
your pr could wait for a very long time especially if it wasn't considered
00:20:50.320
critical which is why one of the saddest stories that i took away from that
00:20:55.360
particular experience was a junior developer who expressed to me her tremendous
00:21:02.000
frustration because she had a very small like couple lines of code change
00:21:07.280
that she couldn't get in for an entire week and during that week the there were other commits that were getting merged
00:21:15.120
and so she was constantly rebase rerun all the tests resubmit the pr
00:21:20.159
for an entire week for a few lines worth of change she didn't get anything else done that
00:21:25.280
week how incredibly demotivating worse that meant that she had fewer
00:21:32.400
opportunities to get code merged because through no fault of her own she's waiting for somebody to get around to
00:21:38.880
deign to review her pr okay so then it would merge to main ci
00:21:44.640
runs the full set of tests this was the entire set of things that had to happen to get a change all the way through
00:21:50.799
and that process could take a day if you were one of those plus twoers who got to
00:21:55.840
kind of jump the queue but it could take potentially weeks and in fact at the point where we turned off
00:22:02.000
gerrit there were still prs that were sitting there that were essentially abandoned they were stale nobody was going to go
00:22:08.320
through and update them whatever changes they represented either got lost in the sands of time or got
00:22:13.760
subsumed had been made in something some other change that did get pulled in
00:22:19.039
so we made a change we made a process change now i will note that uh the only reason this process
00:22:26.240
change was possible was that we had support all the way from the top this was not a universally
00:22:31.440
popular change there was support at a grassroots level for this change there were people who were so
00:22:37.760
happy and grateful that they were going to be able to move faster but not everybody felt that way
00:22:43.520
and i feel i need to be honest about the fact that this was not universally popular i still in hindsight this this
00:22:50.400
all was happening many years ago and in hindsight knowing how the rest of the story then evolved this was the right
00:22:55.919
decision because we were able to go so much faster after that we were able to
00:23:01.600
introduce so much more innovation the process went to okay we don't do prs we
00:23:06.799
pair on code that's how you get another set of eyes we all agree that it's a good and healthy thing for this project
00:23:12.320
to have multiple sets of eyes on things furthermore what's not represented here is we we practice collective code
00:23:18.080
ownership a team owns the code base there is no my feature your feature and consequently since we're pairing and
00:23:24.720
we're rotating pairs frequently everybody ends up touching that code so you're getting more than two sets of eyes on any given thing over a long
00:23:31.520
period of time but two is enough to get it merged into the code base for local tests we only run the fast
00:23:38.320
tests and yeah they took 10 minutes so yeah they technically weren't unit tests
00:23:43.520
but that was so much better than sitting there for an hour waiting for your tests to be done so the
00:23:48.720
fast tests we run locally then we check in we lived on main we merged to main
00:23:55.039
and then ci would run the full set of tests and yeah sometimes stuff broke and then we would fix it but because we were
00:24:00.080
working in tiny tiny increments fixing was pretty quick too and so
00:24:05.279
now we could get changes in minutes to hours before they got merged in so when you think about your process and
00:24:12.320
the latency that it introduces the wait states that it introduces remember the fruit flies
00:24:19.919
we want to be able to get so many more cycles that that's what this kind of process change can give you
00:24:27.840
okay now let's talk about branching strategies batch sizes and latency
00:24:33.200
my personal preference is to live on main i recognize that's not possible for everyone but if you are
00:24:39.520
able to do this in your context i it means that yeah if you're using git
00:24:45.919
that local copy that i have is essentially a branch but i'm making a few changes get to green
00:24:52.000
make it clean and then check it in everybody else is doing that as well and so at any given moment in time the
00:24:58.720
amount of inventory work in progress that's sitting out unchecked in is a fairly small amount
00:25:06.080
so we don't get a whole lot of churn we certainly don't have the experience that that poor junior developer who
00:25:12.880
incidentally wasn't actually that junior that was just the status they had been given
00:25:18.400
but that poor junior developer had of waiting an entire week to get something merged because of the amount of churn
00:25:23.440
that was happening and they're anyway moving on
00:25:28.799
feature branching also very common i am not here to argue with you about whether you should live on main or should do
00:25:33.840
feature branching there are trade-offs there are good reasons to do both it does introduce the challenge that now
00:25:40.480
you've got uh larger batch sizes and so you can merge less frequently and so if
00:25:46.000
you notice that although this looks like a nice neat little diagram and it feels all warm and fuzzy that it's not quite
00:25:53.440
that simple and the way that that shows up in this diagram is if we flip back and forth between the two we can see
00:25:59.039
that we're getting fewer merges to main so that means that the generation cycle that that that uh life cycle is that
00:26:05.919
much longer and that introduces what do we know about that area under the curve risk more risk
00:26:12.320
so far so good okay that was all the background to the cautionary tale
00:26:18.720
let's talk about a process that i do not recommend anybody anybody anywhere ever do under any
00:26:26.600
circumstances but in this organization and i i have to confess i did not see it at the moment
00:26:32.559
that it was happening my involvement happened some number of months later i came in and i was still hearing the
00:26:38.640
echoes of the screams of pain from this particular situation
00:26:43.760
uh with a long-running team branch it started off simply enough we have a main and the organization had decided for
00:26:50.880
reasons that i am sure made sense at the time that every one of the teams would have
00:26:57.279
their own team branch i understand that in hindsight this does
00:27:04.400
not seem like a good idea i'm hearing some of you laugh but i am willing to believe that they were doing the best
00:27:10.080
they knew how given everything that they knew at the time it still didn't turn out so well um but
00:27:17.360
i'm getting ahead of myself let me go through the rest of the story so and you know it now the team is
00:27:23.200
treating the team branch kind of like you would treat main in a feature branch scenario developer a developer b
00:27:30.559
developer c they're all working on stuff developer a is is working on their feature they now merge back to the team
00:27:36.720
branch develop and then you know start another feature uh and then of course in the meantime
00:27:42.320
main is changing and they did have a plan for rebasing off of main onto the team branch
00:27:49.039
uh but it was way too easy for a developer to just ignore everything that was uh
00:27:54.880
i'm really sorry you're completely having a reaction to this um it was
00:28:01.600
i am so sorry um developer c
00:28:08.159
tries to merge back discovers that of course they have ignored the fact that the world was changing around them
00:28:14.000
uh and so they now have to okay in the meantime developer b has
00:28:20.720
been cranking away on their feature and the world has been moving on
00:28:32.080
this is in fact what happened um the really sad part is an entire team
00:28:40.000
threw away six months of work can you imagine
00:28:48.080
oh i'm really sorry i'd better move on some of you are having very strong reactions
00:28:54.240
let's talk about test pollution oh it's not getting better is it
00:29:00.559
i warned you did i not warn you up front by the way rspec is lovely thank you so much
00:29:13.120
the state of javascript testing is a different problem which incidentally i just as an aside um
00:29:19.679
my new thing is it's not stealth it's just i'm not quite sure what it's going to grow up to be
00:29:25.520
it's curious duck digital laboratory i am building a simulation game thing uh
00:29:31.600
it is uh the the back end is a pure ruby gem yay gems um and massive shout out to davis frank
00:29:38.880
who convinced me to make it a gem i and that has had massive uh payoff
00:29:46.240
um i but the front end is a rails front end with stimulus for javascript and so i
00:29:52.000
spent like six weeks getting my mise en place for javascript testing
00:29:57.440
so incidentally if anybody wants to talk about that stuff i'm happy to show you what i've done get your feedback that is
00:30:02.799
actually not the point of the test pollution although there is a reason why i am mentioning this because we'll talk about
00:30:08.799
it with pollution first let me explain what i mean by that now your your feedback cycles the information you get
00:30:15.919
back can be polluted in a variety of ways what does pollution mean it just means that we don't trust the information that
00:30:21.840
we're getting back if you mix opinion in with empirical
00:30:27.039
evidence you now have a polluted stream that's one example of pollution
00:30:35.520
um another big source of pollution uh let's see if this feels familiar to y'all
00:30:41.840
you're a new person on a project you start doing some work uh you run the tests or the tests are
00:30:48.320
running in ci whatever and you see failures and you go wait those failures do not appear to have
00:30:53.919
anything to do whatsoever with the stuff that i changed what happened and
00:30:59.120
your your new buddy the person who's helping to onboard you says don't worry that's fine they do that just kick it
00:31:05.200
again and you kick it again
00:31:10.240
i hope i still have i hope i still have friends after this talk cause
00:31:15.840
i'm watching some of y'all's faces and i'm really worried
00:31:20.880
okay so you kick it again and sure enough it's a different set of tests that are failing and you kick it again
00:31:27.360
and you kick it again okay y'all have seen this
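The "just kick it again" pattern above is what makes test results untrustworthy: the same code produces different failures from run to run. A minimal sketch of one way to surface such tests is to compare several runs against unchanged code and flag any test that both passed and failed. The `find_flaky_tests` helper and the shape of its input are assumptions for illustration, not something described in the talk.

```python
from collections import defaultdict


def find_flaky_tests(runs):
    """Flag tests whose outcome changed across runs of the same, unchanged code.

    runs: list of dicts mapping test name -> True (passed) or False (failed).
    """
    outcomes = defaultdict(set)
    for run in runs:
        for name, passed in run.items():
            outcomes[name].add(passed)
    # A test that both passed and failed on identical code is not giving
    # trustworthy information.
    return sorted(name for name, seen in outcomes.items() if seen == {True, False})


if __name__ == "__main__":
    runs = [
        {"test_login": True, "test_sync": False, "test_export": True},
        {"test_login": True, "test_sync": True, "test_export": False},
        {"test_login": True, "test_sync": False, "test_export": False},
    ]
    print(find_flaky_tests(runs))
```

A test that fails on every run is not flagged here. That is a consistent failure, which is still real information.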
00:31:37.120
so um one one group that i got involved with was working with a legacy system
00:31:43.919
that was massively distributed and had a whole lot of
00:31:50.240
threading and parallelization and very difficult to find race conditions and i
00:31:56.000
they had a very long long history so long that there had historically been a very talented and skilled qa group that
00:32:02.880
had built an incredibly sophisticated test harness but because the test harness had a tendency to expose things
00:32:09.200
that were both real and also not something that would be real the uh
00:32:15.200
developer team that did not feel any sense of ownership whatsoever over those tests or that test harness had a
00:32:20.720
tendency to just discount those results until somebody went through and
00:32:25.799
painstakingly did the analysis to discover whether or not that was real information
00:32:31.760
and in this environment we attempted to reduce our cycle time which meant
00:32:37.200
developers had to own the tests but developers had no intention of owning that set of tests but that set of tests
00:32:42.559
was the only set of tests that was giving us real information about the system so as we reduced our cycle time
00:32:48.080
we ended up increasing risk and at that point i was responsible for the group i've held all sorts of
00:32:54.960
roles and in this case i was a vp of r&d and i did a really terrifying thing i
00:33:01.039
pulled the big red cord i said we're not shipping any more features until we clean this up and i had expected that it
00:33:08.320
would take a few weeks of a concerted effort with everybody all hands on deck everybody cleaning this up
00:33:15.200
i was wrong it took months and i held my ground and i'll just note
00:33:20.480
that even at the level of authority that i had within that organization it was a scary thing it takes an enormous
00:33:26.480
amount of intestinal fortitude to say no we're not going to write new features
00:33:31.840
until we can trust that our tests are giving us information that we can believe it was not a popular
00:33:39.440
decision with sales what a shock um the product managers um who incidentally
00:33:45.519
reported to me they were not happy with me the developers who reported to me were not happy with me some of them were
00:33:51.039
but some of them were just of the opinion that um this was unfair that
00:33:56.080
they had to clean up this mess that they hadn't made and that they would much rather be developing new features and so this
00:34:02.960
was not a popular decision and yet our customers and support organization
00:34:08.320
needed me to make that decision because we were shipping objectively worse software with every
00:34:15.040
increment that we delivered so this is a very difficult situation that is the cautionary tale by the way
00:34:21.919
if you don't clean it up while it's a small mess if you wait until it's a superfund site
00:34:28.320
you're gonna end up in a situation where you have to make the excruciating decision between stop the line and don't
00:34:34.639
do anything but clean up the superfund site or be at a very serious risk of shipping a
00:34:39.919
product that frankly doesn't meet the value proposition that it's supposed to
00:34:46.839
okay so then and i noticed that i'm running a little short on time so i will tell the abbreviated version of this
00:34:52.800
cautionary tale what happens when you have both pollution and you have delayed feedback cycles and
00:34:59.680
that was this project i was again a vp and i had a peer who
00:35:05.280
was another vp who came to me and said what are you doing
00:35:10.960
what do you mean well your group theoretically ships software and has been unable to ship software so your group isn't doing what
00:35:17.280
your group is supposed to do what are you doing now i knew we had challenges but
00:35:23.200
let's just say that that was a galvanizing conversation in which i decided it was time for me to understand
00:35:29.440
in depth so i started asking and i interviewed a whole lot of the
00:35:35.599
individual contributors who were on that project we would stand in my office and i would
00:35:40.880
ask them help me understand again from the moment that a developer has something ready that theoretically is
00:35:46.640
ready to ship to the moment that we actually ship it what do we do again tell me again no tell me like i'm five
00:35:52.720
and we mapped out collectively through a series of conversations some of which happened one-on-one and some of
00:35:58.400
which happened in groups we ended up with a diagram on my whiteboard that lived there for months that showed
00:36:05.839
the pipeline and it showed the following information about the pipeline so the first stage was there was a build step
00:36:11.760
this was shipping enterprise software first stage is there's a build step and then some set of fast tests run and then
00:36:18.160
if that's green it goes on to the system tests that were in a fan out across multiple
00:36:23.680
configurations and environments as you do um and then there was a final final
00:36:29.760
final packaging step of some kind so basically four stages and okay well how long does each stage
00:36:35.200
take that first build step a few minutes some of you are laughing in anticipation
00:36:40.880
um that first step takes a few minutes the second step takes a few minutes uh
00:36:47.440
that third step that could take anywhere from four hours
00:36:53.520
to over 24 hours that's a lot of variation why what what
00:36:59.680
is different between the four hour runs and the 24 hour runs well it turns out what i learned it took a long time
00:37:06.720
because nobody had ever really looked at this in this particular framing in this way until we all got together
00:37:13.760
because everybody had one piece of the puzzle and we were assembling a jigsaw but the spoiler is the reason ultimately
00:37:20.800
that the things sometimes took a lot longer was because we were waiting for a lock
00:37:26.240
on an environment in a pool of very restricted environments
00:37:32.400
and at the peak of activity right before a theoretical release there would be a lot
00:37:38.160
of contention vying for one of those coveted locks on
00:37:43.280
those environments so there could be a very long wait state
00:37:48.480
so that is the delayed feedback piece right that thing was causing a wait
00:37:55.200
state that could be of variable length but could end up being very very long okay so the next question i had to ask
00:38:01.599
was well how often are things failing and let's talk about pollution why
00:38:07.280
are they failing when tests fail have we learned anything new or were they failing for spurious reasons
00:38:14.560
and um well the build never failed we always got something out whether or not
00:38:19.760
it worked was a different question but the build never failed the fast tests almost never failed the failures were mostly occurring in the
00:38:26.880
system tests and they were flakes we had a burgeoning superfund site
00:38:34.800
uh so the solution to this incidentally we had gotten to the point where we were just wedged we
00:38:41.200
were kind of like a car spinning in sand um we were struggling to ship
00:38:47.040
um and so the solution ended up being to focus on doing two things
00:38:52.560
one was to reduce the amount of time that the tests took and to reduce the wait states by trying to increase the
00:38:59.280
number of those very coveted environments that were available but also reducing the contention by
00:39:05.280
reducing flakiness because it turns out that when the tests fail that thing ends up going through that
00:39:10.800
whole cycle again so that increases the amount of contention for those locks
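
To put rough numbers on that rerun effect, here is a minimal Python sketch. It is not from the project in the story: the per-test flake rates, the test count, and the two-hour lock wait are made-up illustrative values, the function names are hypothetical, and it assumes every failure is a flake and that flakes are independent of each other. Only the four-hour run time echoes the figure mentioned above.

```python
def suite_pass_probability(per_test_flake_rate: float, test_count: int) -> float:
    """Chance that no test flakes in a single run, assuming independent flakes."""
    return (1.0 - per_test_flake_rate) ** test_count


def expected_runs(pass_probability: float) -> float:
    """Expected number of full runs until one passes (mean of a geometric distribution)."""
    if not 0.0 < pass_probability <= 1.0:
        raise ValueError("pass_probability must be in (0, 1]")
    return 1.0 / pass_probability


def expected_hours(run_hours: float, wait_hours: float, pass_probability: float) -> float:
    """Every attempt pays the lock wait and the run time again."""
    return (run_hours + wait_hours) * expected_runs(pass_probability)


if __name__ == "__main__":
    # Hypothetical inputs: 100 system tests, a 4 hour run, a 2 hour wait for an environment.
    for flake_rate in (0.0, 0.001, 0.01):
        p = suite_pass_probability(flake_rate, test_count=100)
        hours = expected_hours(run_hours=4.0, wait_hours=2.0, pass_probability=p)
        print(f"flake rate {flake_rate:.3f}: pass chance {p:.2f}, about {hours:.1f} hours to green")
```

If anything this understates the problem, because it holds the wait constant, while in the story every rerun also added to the queue for the same scarce environments.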
00:39:16.480
so we needed to reduce the flakiness in the test suite until the point where when it failed it was telling us
00:39:22.400
something real so if you are facing a potential
00:39:28.079
superfund site here are some strategies that you can try one is to just separate the streams you've got blocking and
00:39:33.760
non-blocking caveat this only works if
00:39:39.839
you trust that you have sufficient coverage in the blocking tests
00:39:46.000
to tell you about the risk in the software that was not the case in the story that i told of the massively
00:39:51.359
parallel system uh we didn't have enough test coverage we did not know what we were shipping we
00:39:57.359
were definitely shipping schrödinger's releases all over the place um if part of the reason why flakiness
00:40:03.760
is persisting is that sense of tragedy of the commons that nobody feels a sense of responsibility
00:40:09.920
or ownership or maybe even agency to clean it up getting cross-team
00:40:14.960
partnerships going can help tremendously and then just carving out time on a regular basis which i recommend
00:40:20.960
even for projects that don't have these problems yet i recommend
00:40:26.720
carving out time in some way shape or form you may not need to do entire tidy tuesdays or whatever you choose to do
00:40:33.280
uh maybe just every time somebody finishes a feature there's an expectation that if there are
00:40:38.400
some flakes or mysteries in the code base that that's the next thing they tackle before they tackle something else
00:40:46.400
and then the other thing i strongly recommend is reducing test execution time here's where i get to tell a story that
00:40:52.240
um incidentally this is for aaron tenderlove are you here
00:40:57.839
okay well this is being recorded so just so you know this entire story is for him
00:41:04.880
uh the reason why will become apparent momentarily
00:41:10.400
so once upon a time we were struggling with long build times or long test times and i was uh sort of encouraging
00:41:17.760
teams to think about shortening their test times it kind of wasn't working so i am not above bribery
00:41:25.839
um at one point i basically said hey tell you what you lop
00:41:31.280
an hour off of that test cycle time and i'll bake you a pie
00:41:37.040
and the person i was talking to said i like pie and i said well it turns out i'm really
00:41:43.040
good i'm really good at making pie these are real pies that i made
00:41:51.760
i'm pretty good at pie so he lopped an hour off the next day i
00:41:58.560
brought him a pie the other team members said i like pie
00:42:04.880
and thus we formed a tradition you lop a sufficiently large amount of
00:42:10.240
time off of that very very very long cycle time you get a pie and it can
00:42:16.000
be a collective pie it doesn't have to be a heroic individual effort
00:42:21.359
but you get a pie because we want to reduce the time in our pipelines
00:42:35.359
you now see why this was dedicated anyway let's bring it home
00:42:42.240
uh healthy feedback loops the things we have been talking about
00:42:47.280
have to do with seeing to the care and feeding of your feedback loops you want to make sure you keep
00:42:53.119
them tight keep them short watch those wait states make them as short as you possibly can
00:43:00.000
given the context and the constraints that you live within but attend to the time in the feedback
00:43:06.720
cycle think about the fact that you have these multiple levels too often i have had developers argue
00:43:12.560
with me that the unit testing is a waste because it's just going to get tested at the system level anyway so why bother
00:43:20.240
i'm so glad i'm speaking to a community that doesn't buy into that like modulo javascript which we can talk
00:43:26.560
about separately um uh and then keep them clean keep the
00:43:32.000
pollution out of your feedback cycles and i want to introduce you to one more feedback cycle
00:43:38.000
this is yet another one it should look very familiar it's kind of like the others that we talked about but this is the kolb learning cycle
00:43:44.400
it turns out that you know experiment experience observe
00:43:50.319
and reflect and then abstract the lessons learned that is also a feedback cycle so in short every feedback cycle
00:43:57.359
is a learning cycle and the more of those feedback cycles you get the more you get to learn
00:44:04.160
which is why i say there is no failure there's only learning but i'll also tell you that there are some weeks when i do
00:44:10.079
a lot of learning all right i am down to 55 seconds on the
00:44:15.839
clock which means that i don't think we've got time for q&a i am here all day though love to talk about this stuff
00:44:22.000
would love to talk with y'all thank you so much for having me and thank you for laughing at my jokes