00:00:12.440
excited to be back at RailsConf, this is my first conference since 2019. Dang, it
00:00:17.820
is nice. Do you all miss this? Yeah. So my name is Brad Urani, I'm a
00:00:24.060
principal engineer at Procore, I live in Austin, Texas. Uh, what the hell is that? Oh, there we go.
00:00:31.560
Procore builds software for the construction industry. If you're building a skyscraper like this one, a shopping
00:00:36.899
mall, you know, a subway system, you're probably using Procore. It's a big suite
00:00:43.020
of enterprise software. We build the software that builds the world, that is our motto.
00:00:49.140
Holy cow. There's one of our iPad apps, used for building design and things like that. We are, I think, 200, maybe 300
00:00:57.360
Rails developers strong. We grew from a tiny little shop into a great big one, starting on Rails 0.9
00:01:04.619
over 16 years ago. We now have what I think might be the
00:01:10.260
world's largest Rails app. I know Shopify has got a big one, not sure if you all are still contributing to that
00:01:15.479
or if it's still growing. Ours is huge: over 43,000 Ruby files, over a thousand database tables, over 12,000 routes, two
00:01:23.280
or three hundred devs, probably 250 or 300 devs, working on it every single day, and it continues to grow.
00:01:30.840
With this come some questions, because while this model of a giant Rails app has served us well, it
00:01:37.680
can't scale forever. At some point we need to do something else to break that apart, and that brings
00:01:43.200
up some really tough questions. How do we split the world's largest monolith? How does our monolith talk to new
00:01:49.320
services? Not as obvious as it might seem. How do we make it multi-region? So we have our AWS regions, we've
00:01:55.979
got an instance of this whole application in Europe and in North America; they do need to communicate with each other, they need to share data with each other.
00:02:02.520
How do we do that? How do you break out things like search and reporting, which are sometimes tough problems and the
00:02:08.819
source of a lot of performance problems, having reporting queries baked into your primary database, or
00:02:15.180
having search baked into your primary database, along with the problems of making sure all the data gets into all
00:02:20.940
these systems downstream in a consistent way, so that all these different data stores match at the end of the day.
00:02:27.420
So, on the platform team at Procore, this was
00:02:34.980
basically what we came up with, how we're building solutions to a lot of these problems. That's actually a picture of
00:02:40.620
our platform; if you zoom in on the water, it's actually little ones and zeros floating downstream. So this is at
00:02:47.220
the core of our distributed-system strategy. It is basically the backbone of our service-to-service communication,
00:02:52.340
amongst other things; it's our way of replicating data zone to zone. It's a suite of tools built by my team and me
00:02:58.920
that allow sort of self-service creation of services that can publish to and consume from streams,
00:03:04.800
so that we can finally break this monolith and start developing microservices. So, streams themselves, they go by a
00:03:12.060
few different names. You might hear it called an event bus, a transaction log (I made the icon a log, get it,
00:03:17.940
transaction log, yeah), a stream. Some people call it a pub-sub. For those of you born in the 90s, that is
00:03:24.659
a newspaper; it's like the internet, but on paper.
00:03:30.060
And at the heart of this is Apache Kafka. Kafka is a distributed log;
00:03:36.360
it was invented by Jay Kreps, who was at LinkedIn at the time. Oh, well,
00:03:43.920
I should say that Apache Kafka is not the only solution that does this. There are some competitors: Amazon has Kinesis,
00:03:50.099
which is similar but not as fully featured; Google has Cloud Pub/Sub. I
00:03:55.379
don't know who actually uses Google Cloud, but so there are competitors. Most people in this ecosystem are using Kafka; it's kind of
00:04:01.920
the tool of choice, definitely the most common, and has the most tools and libraries built around it.
00:04:09.900
So Kafka is fundamentally this system where you have something that publishes data
00:04:16.019
to what's called a topic. A topic is like a single stream, so you publish to a Kafka topic, and then
00:04:21.720
something else consumes from it. So I'm going to go into just a few nuts
00:04:28.199
and bolts about how you read and write to Kafka from Ruby before we get into the fun stuff.
00:04:34.259
So there's this wonderful gem called ruby-kafka, maintained by the fine folks at Zendesk. If you're here and
00:04:40.139
you work on this, I love you, please come talk to me. And using it is actually
00:04:46.919
pretty simple: you instantiate this Kafka client, you pass it some config options with broker names and things like that,
00:04:51.960
and you deliver a message: you've got the message, you've got which topic you're going to write to, and you've got a partition number, which I'm going to talk
00:04:58.020
about in a second. So, not too difficult really to get started with this once you
00:05:03.900
have Kafka up and running somewhere. It also has an asynchronous producer mode. The first one I showed you was
00:05:09.840
synchronous, where you actually write to Kafka, it blocks, you wait for a response, and then you get your thread
00:05:16.139
back. There's also an async version of this, which, when you publish a message to Kafka, actually puts the
00:05:21.780
message in an in-memory background queue, and there's a background thread on your web servers that takes the messages in
00:05:27.539
that queue in small batches and publishes them to Kafka. The nice thing there is that it doesn't block your
00:05:32.940
request, so in your Rails app users aren't waiting for you to publish to Kafka and you return a little bit faster.
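The async-producer idea described here can be sketched with plain Ruby threads. This is a stdlib-only simulation, not the actual ruby-kafka internals: the `FakeAsyncProducer` class, its method names, and the in-memory "broker" are all hypothetical, but the shape (non-blocking enqueue, background drain, CRC32 hashing of the partition key) matches the behavior being described.

```ruby
require "zlib"

# Hypothetical sketch of an async producer: callers push onto an in-memory
# queue and return immediately; a background thread drains it and "delivers".
class FakeAsyncProducer
  Message = Struct.new(:value, :topic, :partition_key)

  def initialize(num_partitions: 3)
    @queue = Queue.new
    @num_partitions = num_partitions
    @delivered = []                       # stands in for the Kafka broker
    @worker = Thread.new { drain }
  end

  attr_reader :delivered

  # Non-blocking: the web request isn't held up waiting on the broker.
  def produce(value, topic:, partition_key:)
    @queue << Message.new(value, topic, partition_key)
  end

  # Same hashing idea Kafka uses: one key always maps to one partition,
  # which is what preserves per-key ordering.
  def partition_for(key)
    Zlib.crc32(key) % @num_partitions
  end

  def shutdown
    @queue << :stop
    @worker.join
  end

  private

  def drain
    loop do
      msg = @queue.pop
      return if msg == :stop
      @delivered << [msg.topic, partition_for(msg.partition_key), msg.value]
    end
  end
end

producer = FakeAsyncProducer.new
producer.produce("hello", topic: "events", partition_key: "user-1")
producer.produce("world", topic: "events", partition_key: "user-1")
producer.shutdown
```

Note the durability caveat from the talk: anything still sitting in `@queue` when the process dies is simply gone, which is exactly the kill -9 risk of the real async mode.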
00:05:38.539
The downside is that it's not durable, so in a kill -9 scenario, or if your server crashes, you can lose messages
00:05:44.100
that way. Very important things to consider; in fact, I've got a whole section coming up about how to produce
00:05:51.360
consistently based on your needs. Kafka topics are broken up into
00:05:56.699
partitions. What happens is these consumers subscribe to a Kafka topic, they
00:06:02.699
take off one small batch at a time, process it, and then grab the next batch. To make that scale, it's
00:06:08.460
broken up into partitions based on a partition key. You choose your own partition key, which could be something like user ID, and for that particular
00:06:15.600
partition, messages are in order. And then you can parallelize it by running consumers, up to one
00:06:22.740
consumer per partition. That's how Kafka scales. When you're reading from Kafka,
00:06:29.520
just in basic Ruby, once again this is the ruby-kafka gem. You configure it with these broker
00:06:34.800
names, brokers being the Kafka servers. You've got this loop here: you subscribe to a topic in this loop, and it
00:06:41.759
just loops and loops and loops, consuming micro-batches of messages. How you do exception handling in there is really
00:06:48.360
critical if you don't want it to stop, and definitely something to pay attention to, because it determines basically
00:06:54.600
your ordering guarantees, which is something important I'm going to talk about a lot.
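The consume loop and the exception-handling point can be sketched without a broker at all. This is a hypothetical stdlib-only simulation (the in-memory `partition_log` array stands in for one Kafka partition); the key behavior it shows is that a consumer only advances its offset after a successful process, so a failure means retrying the same message rather than skipping ahead, and that is what preserves ordering.

```ruby
# One partition, as an in-memory array; the consumer holds an offset into it
# and only advances the offset after a message is processed successfully.
partition_log = %w[m1 m2-flaky m3]   # messages, in order
processed     = []
offset        = 0
attempts      = Hash.new(0)

while offset < partition_log.size
  msg = partition_log[offset]
  begin
    attempts[msg] += 1
    # Simulate a transient failure on the first attempt of the flaky message.
    raise "transient error" if msg.include?("flaky") && attempts[msg] == 1
    processed << msg
    offset += 1               # commit the offset only on success
  rescue
    # Stay on the same offset and retry, instead of moving on.
    # (A real consumer would add backoff or dead-lettering here.)
  end
end
```

If the rescue block instead advanced the offset and moved on, the failed message would be lost and later writes could land before earlier ones downstream, which is exactly the ordering break the talk warns about.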
00:06:59.639
If you're using Rails, you can in fact turn your Rails app into a Kafka consumer. It is a little bit fraught; it's
00:07:07.620
not necessarily straightforward or easy, but you basically make a new class, call the application's initialize, which
00:07:13.860
starts off all your autoloaders and your initializers, which allows you to use your
00:07:19.139
model classes and things like that, then you subscribe to Kafka and start processing messages. This is skipping
00:07:24.960
the whole web server, the normal Rails server;
00:07:30.000
we're basically just running a Ruby file directly and then loading Rails classes. You can do that; depending on
00:07:35.699
which version of Rails you're using, it may or may not be more complicated than that to get the environment to load, but
00:07:41.099
that's nice: you can use all the classes you've already written in your Rails app if you don't want to break them out into a gem or something
00:07:46.199
like that. So, why Kafka? This is the author Franz
00:07:51.720
Kafka; he wrote a great book about a man who turned into a cockroach.
00:07:57.300
Kafka has some interesting guarantees that aren't really true of most other similar-but-different technologies.
00:08:03.780
One is at-least-once delivery: a Kafka consumer always retries until it
00:08:08.880
succeeds in processing your message. So it will sit there, subscribed to one partition, it will read the messages
00:08:15.539
off that partition, and if it fails to get an acknowledgment back, it will keep trying, and in that way you are
00:08:22.319
guaranteed to never lose a message provided someday you fix your bug or your performance issue or whatever it is
00:08:28.680
that is preventing that message from getting through. It will sit there and retry forever. It's a disk-based storage system,
00:08:34.020
the messages don't get lost; all the messages are replicated on disk across brokers,
00:08:39.539
and it's got a retention period, which I'm going to talk about. Kafka gives you guaranteed ordering: the order in which
00:08:46.320
the messages go in is the order in which they are processed, and that is key to all of this. I can't overstate how
00:08:52.440
important that is for these types of systems, because that is what allows you to make a consistent system. Say, for
00:08:58.440
instance, you've got a Rails app, and it's writing to a database, and then you publish to Kafka, and
00:09:04.140
downstream you're consuming it and writing to another database, to a search index or something.
00:09:10.320
What you don't want is to have a record get inserted in your database, then updated, then
00:09:16.560
updated again, but the last update goes before the first one, and then you overwrite new changes with old
00:09:21.899
ones. That's bad, because then your data is inconsistent. That's why this guaranteed ordering is so important.
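The consistency argument here, ordered application plus idempotent writes, can be shown in a few lines. This is a hypothetical in-memory sketch (the hash stands in for a downstream search index or replica table; the record fields are made up): because messages for one key arrive in order and each write is an upsert keyed on the primary key, a redelivered duplicate is harmless and the final state reflects the latest update.

```ruby
# Downstream store, keyed by primary key; an upsert is insert-or-replace.
downstream = {}

# Messages for one record, in the order they were produced. Kafka's per-key
# ordering means they are consumed in this same order; at-least-once delivery
# means duplicates (like the repeated version 2 below) can appear.
messages = [
  { id: 1, name: "Ada",      version: 1 },
  { id: 1, name: "Ada L.",   version: 2 },
  { id: 1, name: "Ada L.",   version: 2 },   # duplicate redelivery
  { id: 1, name: "Lovelace", version: 3 },
]

messages.each do |msg|
  downstream[msg[:id]] = msg   # idempotent upsert: applying twice is a no-op
end
```

If the same four messages were applied in a different order (as an unordered queue allows), version 1 could land last and clobber version 3, which is exactly the "old overwriting new" failure described above.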
00:09:29.940
Also, multicast: you can have multiple consumers on a
00:09:35.160
single topic. That's key too; this is what allows your system to be really
00:09:40.260
flexible. And you can replay it, because Kafka messages are retained; often you
00:09:46.560
set your own retention window, ours is a week. If you mess something up, if you fail to write, if you forget to
00:09:52.500
write a column to your database, you can go back in time, reconsume the stream, and repopulate your database. So with at
00:09:59.040
least once delivery and guaranteed ordering, we have a system that is consistent. With multicast, we have something that is
00:10:04.440
democratized, because many people can consume this, use cases you don't even know about yet. And it is decoupled: you
00:10:10.380
have decoupled the upstream and downstream, and the different consuming services are decoupled from each other. This is not necessarily true of
00:10:17.339
similar things. If you're used to RabbitMQ or Amazon SQS, they don't have
00:10:22.500
retention: you pick that message up off the queue, it's gone, you can't replay it. They are not ordered, so you can enqueue
00:10:28.980
two things (it often depends on some of the settings), but you can enqueue two things and have the order be swapped, and you can have them
00:10:35.940
processed at the same time. This is also true of Sidekiq, which is different, but it uses Redis as a queue; often
00:10:42.540
you get parallel execution of these things, which is not consistent because it is not ordered, and you can have old things overwriting new things if
00:10:49.140
you're not careful. For actually running Kafka, you've got some options if you don't want to spin up your own Kafka servers.
00:10:55.320
Amazon has a relatively new one called MSK, even though they have the competing Kinesis, go figure.
00:11:01.620
Heroku's got a great one, surprisingly; we used to use it, it's actually really awesome, and it's cheap and it's fun. And
00:11:07.019
then Confluent is the Kafka company; they make all the open-source tooling, but then they've got commercial versions of
00:11:12.240
this stuff, cloud-hosted versions. Use cases: that is Jay Kreps, the inventor
00:11:17.700
of Kafka; he did not turn into a cockroach. So, simple question here,
00:11:23.940
actually it's not that simple, it's deceptively complicated: what is the best way to make one Rails app talk to
00:11:29.700
another? Hmm. All right, well, a simple thing to do would be an HTTP POST.
00:11:36.060
That's not as great as it sounds. It's not tolerant of errors:
00:11:41.760
if that downstream system is offline, if it's being deployed, if it's bogged down due to database performance, that's going
00:11:48.959
to return an error, and then you either have to swallow that message and lose it, or you have to return an error all the way back to the user. So it is not
00:11:56.040
fault-tolerant at all. If this is not your app downstream, if it's a third-party app, you're especially vulnerable here; this can create big
00:12:02.640
problems. Latency is also a problem, because you don't know 100% the performance characteristics of the
00:12:07.860
downstream app. If the database is bogged down, you could be waiting for seconds for that thing to return,
00:12:13.019
under peak load, or if someone runs some giant query they're not supposed to run, you could be sitting there waiting
00:12:18.060
forever; your upstream one could actually time out. Whereas writing to Kafka is fast: once you write to Kafka, it's fire
00:12:24.600
and forget, continue on with your life, and the downstream system will catch up.
00:12:29.760
And HTTP is also one-and-done: there are no downstream consumers here, the history is kind of lost unless you're logging
00:12:34.860
all requests. The history is lost, you can't go back and examine your past history and
00:12:40.500
things like that. It's one-and-done, not so great. gRPC: this is a streaming technology, a Google streaming
00:12:46.380
technology, which is a much more efficient protocol than HTTP, but it suffers from a lot of the same problems.
00:12:52.980
Again, you can multicast with that, but again, it's not fault-tolerant, et cetera, et cetera.
00:12:59.579
Kafka fixes these. It's buffered, which means it can smooth out performance spikes, because if you
00:13:05.579
get a spike of writes, the reads are relatively consistent; it's based on the number of consumers you have and the number of partitions. It's durable: the
00:13:12.120
message is not lost, you can go back and replay. It's consistent. And it's democratized, meaning lots of
00:13:18.959
consumers can subscribe to it, which can pay off in the future for use cases you haven't even anticipated yet.
00:13:25.019
Idempotency is key. I put that key emoji on there to reinforce that it's key; that was my last-minute addition in the speaker
00:13:30.420
lounge. It's key because Kafka always retries, so there's always the possibility of
00:13:36.839
duplicated messages, so you have to upsert on the write side there, otherwise you could get duplicate
00:13:43.440
inserts. So it's got to be idempotent operations. So here's use case number
00:13:48.899
two. Use case number one was service-to-service; use case number two here is this kind of multicast idea. Imagine
00:13:54.360
you've got this service and a new user signs up; there's a lot of things you need to do: send a welcome email,
00:14:00.320
insert a search record, maybe in your Elasticsearch cluster, so that person is searchable in your directory,
00:14:05.339
maybe you suggest people to follow or recipes to try or whatever, and you've got some machine-learning model or
00:14:10.500
something that you're calling. Here we've split that up with multiple consumers. Those are now independent, they are
00:14:17.220
decoupled, you've got independent retry for each one of these, and because of guaranteed delivery you know that even
00:14:22.500
if one of these services is down, it doesn't affect the other ones, and the message will never get lost. As long as you get it running again at some point, your
00:14:29.040
messages are still there and you're guaranteed to get all these steps to complete. Heterogeneous data stores: similar but
00:14:36.360
different, but this is a great case where you've got a Rails app, primarily running on Postgres, but a lot of us also need other things. We
00:14:43.019
want to mature a bit and use a real search engine like Elasticsearch; maybe we want to use a reporting
00:14:48.180
database like Snowflake, which is great for running big aggregate queries. This
00:14:53.519
allows you to make sure that all of these databases are consistent and have the same data in them. If your app writes
00:14:59.279
to Kafka, you know, because of its guaranteed delivery, that all of these will eventually get the message, and then
00:15:05.760
you'll have the same thing in every database. And once again, that's where the ordering comes in: not only will they get there, they'll arrive in order, so you
00:15:11.760
don't have old things overwriting new ones. One thing I do have to mention,
00:15:16.860
because if you look at Kafka you're going to come across this: it's called Kafka Connect, and this is a solution to
00:15:22.199
make it easier to produce to and consume from Kafka. It's this application, you spin up a bunch of pods, like a
00:15:27.899
bunch of containers of this, they talk to each other, and you can create consumers and producers just by submitting
00:15:33.060
configuration files. So if you want to write to Elasticsearch, write to S3, HTTP-POST these batches, you
00:15:39.660
just send these JSON configs to Kafka Connect and it spins up connectors for you, and in that way a lot of the
00:15:46.620
downstream stuff, the typical things like take a message and write it to a database, take a message and HTTP
00:15:52.260
POST it somewhere, take a message and put it in S3, you don't have to write all that code.
00:15:58.680
And like I said, this works really well on the consuming end: you've got all these built-in database connectors,
00:16:03.959
so you don't even have to develop this stuff yourself. What the heck!
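A Kafka Connect sink config of the kind described here looks roughly like the following. This is an illustrative example, not Procore's actual config: the connector name, topic, and URL are hypothetical, and exact property names vary by connector and version, though `connector.class`, `topics`, and `tasks.max` are standard Connect properties and the class shown is the Confluent Elasticsearch sink.

```json
{
  "name": "users-elasticsearch-sink",
  "config": {
    "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
    "topics": "users",
    "connection.url": "http://elasticsearch:9200",
    "key.ignore": "false",
    "tasks.max": "2"
  }
}
```

You POST this JSON to the Connect cluster's REST API and it spins up the sink tasks; no consumer code to write or deploy.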
00:16:10.199
Now, it is Java, but you don't actually have to write that much Java to use it. This is the
00:16:17.040
despair girl right here, she's really sad about that. And then use case number four,
00:16:22.260
similar but different, is this idea of a data lake. If you're not familiar with enterprise data
00:16:28.260
infrastructure parlance, a data lake is just a giant catch-all of all your enterprise's data. Most people, including
00:16:36.000
us, implement this with S3 buckets, Amazon's cheap storage, and this is
00:16:41.040
kind of cool: if your services are communicating with each other through Kafka, there's no reason not to put an S3
00:16:46.620
sink on every single Kafka topic and back it all up to the cloud, so it's there forever. We have ours there;
00:16:53.639
we've never used it, but it's there in case we do. But seriously,
00:17:00.600
there are new technologies coming online that make querying S3 buckets as if they're SQL
00:17:06.839
cheaper and better. It's been slow to mature, which has been frustrating, but this stuff is coming, which can basically
00:17:12.179
make years and years and years of your enterprise's data searchable and queryable, which is really
00:17:17.220
powerful, and Kafka and Kafka Connect help you get it in there. I tried to put a lake emoji or lake clip
00:17:23.760
art behind the buckets, but I couldn't get it to look good, so I just used a blue circle, because lakes are usually kind of oval and less square,
00:17:29.640
and I made it blue to look like water. My dream ultimately, we have not built
00:17:35.940
this but I would love to, would be the ability to take historical data out of our data lake and
00:17:41.880
load it back into Kafka. So if you wanted to say, hey, I want to reprocess all of 2021's data to power a new machine
00:17:48.840
learning model, we could press a button and have this tool that goes and puts it in Kafka, populates
00:17:55.440
this thing. Or if you wanted to say, hey, I've redesigned my reporting database, I want to use a totally different table structure: drop the old
00:18:02.340
database, replay traffic from the beginning of time, repopulate it. We're trying to build a system where, for
00:18:08.880
every domain (a domain is like a tool or a slice of functionality in our system),
00:18:14.100
we have complete historical data and the ability to rebuild it, to basically repopulate by replaying all
00:18:20.220
that traffic. We don't have that yet, but we're going to get there.
00:18:26.100
I forgot to change the title slide, but finally, use case number five; this
00:18:32.700
should be titled event sourcing. We all know that in a Rails project you typically attach
00:18:39.120
it to a database, but you don't necessarily have to do that. If you read a lot of the literature on Kafka, they
00:18:45.059
advocate this pattern where the Rails app, or your producing app, writes to Kafka first, then you have a consumer that
00:18:51.299
writes to the database, and all the app does is select from the database. So it reads but does not write to the database. This
00:18:58.200
decoupling is really powerful, because again, if you've got multiple databases, a search index, a machine-learning
00:19:03.840
model, a reporting database, and things like that, it's easier to make sure they're all the
00:19:09.059
same. We call this event sourcing; definitions may vary about what
00:19:14.760
exactly that means, but it's a powerful thing. And if you're doing this in Rails, of course, then you get to go and reorganize your Rails app. Rails
00:19:20.880
is great for this, the way you can customize it: you'd have different read models, different write models, and you get to basically redo your whole
00:19:27.660
data layer and create your own framework, which would be fun,
00:19:33.240
kind of thing. Often, I'd say usually, this is coupled with materialized views. So
00:19:40.320
imagine your basic widget: you've got a widget list view that lists the widgets, you've got an item view that shows the details of that
00:19:46.740
widget, you've got an aggregate view, like how many of those widgets sold day by day, month by month, week by
00:19:52.679
week. Using this allows you to pre-materialize those things, to pre-compute them, to make these tables
00:20:00.179
with that data in them, so you don't have to query your database and calculate that on the fly. There are many
00:20:05.580
ways you can do it: Postgres has materialized views, or you could use a NoSQL-style store, which is pretty
00:20:11.400
common, and basically create data that matches what your page looks like: the list view in this
00:20:17.340
table, an item view in this table, an aggregate view in this table. You need Kafka to do that, because it's the only
00:20:23.880
way to ensure that they all get the same data, because of guaranteed delivery. Three consumers, really consumer
00:20:29.820
groups, three consumer groups writing to three stores: you know that they all succeed, and you don't get those inconsistencies. Again, if you try to just
00:20:36.600
insert into the list view, insert into the item view, insert into the aggregate view, and something fails, then you're
00:20:42.780
inconsistent; you've got bad data forever, basically.
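The three-views-from-one-log idea can be sketched in memory. This is a hypothetical toy (the widget events and view names are made up, and real consumer groups would of course run as separate processes): three independent "consumers" each build their own materialized view from the same ordered log, so each view is derived from identical input rather than from three separate inserts where one can fail.

```ruby
# One ordered log of sale events; each "consumer group" below reads the
# whole log independently and materializes its own view.
log = [
  { sku: "w-1", qty: 2, day: "mon" },
  { sku: "w-2", qty: 1, day: "mon" },
  { sku: "w-1", qty: 5, day: "tue" },
]

list_view = []          # every sale, for the list page
item_view = {}          # latest sale per sku, for the detail page
agg_view  = Hash.new(0) # units sold per day, for the aggregate page

log.each { |e| list_view << e }             # consumer group 1
log.each { |e| item_view[e[:sku]] = e }     # consumer group 2
log.each { |e| agg_view[e[:day]] += e[:qty] } # consumer group 3
```

If one consumer crashes mid-way, it resumes from its own offset and catches up to the same final state; contrast that with three direct inserts in one request, where a failure on the third leaves the aggregate view permanently behind the other two.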
00:20:48.720
And then once again, with this event-sourcing pattern where you don't write to your database directly,
00:20:54.380
it does make it easier for heterogeneous stores: a reporting database, a search index, et cetera.
00:21:00.720
There's a name for this: this is one version of a pattern called command query responsibility segregation (CQRS),
00:21:07.320
which basically means that your read model is different from your write model. And you know, it already kind of is. If
00:21:13.980
I were going to take a shot at Rails: you've got this model class which
00:21:19.200
represents a table, let's say that's a user. You always write one user record at a
00:21:24.660
time, but when you read, you're often inner joining, aren't you? Joins, joins, joins. How many people have Rails apps where
00:21:30.299
half the queries have eight joins or more? We do. So fundamentally, that is a different shape
00:21:36.000
of data, isn't it? It looks totally different, yet it's one model file, which leads to some confusion.
00:21:42.179
This definitely smashes that in the extreme way, with totally different tables.
00:21:48.720
Use case number six: region-to-region replication. We have customers in America, we have customers in Europe and
00:21:55.380
Australia and Canada and other places, and they do actually share records, because we have general
00:22:01.020
construction contractors in Europe who hire subcontractors in the US, and Kafka makes a great way
00:22:08.100
to basically replicate that across regions. There are
00:22:13.320
a number of patterns you can do this with: you can have one consumer post to another region,
00:22:18.780
or there's this thing called MirrorMaker, which replicates a Kafka topic. Conveniently, it is part of Kafka Connect,
00:22:24.120
so once again, if you don't mind a little bit of Java,
00:22:29.640
you can employ that and replicate topics across zones.
00:22:35.400
One more pattern that I will mention: like I said, turning your Rails app into a Kafka consumer
00:22:42.179
definitely can be done, but there's another way, which is to have an HTTP sink, which is just something
00:22:48.539
that consumes a micro-batch from Kafka and HTTP-POSTs it to your service. You still get the same retry
00:22:55.380
mechanisms, because if the POST returns anything but a 200 response, it continues to retry, and that saves you
00:23:01.320
some of the ops headaches of having to spin up other instances of your Rails app, or kind of hack the Ruby boot loading,
00:23:08.100
the boot framework and stuff like that, to figure out how to get your classes to run that way. The other thing you could do is just use a plain old
00:23:13.559
Ruby consumer, and take your model classes out of your Rails app and put them in a gem that you share.
00:23:19.919
So let's talk a little bit about getting stuff into Kafka, because if you don't get it into Kafka the right way, if
00:23:25.980
you can't make your message production consistent, your whole system is going to be inconsistent. Inconsistency is bad.
00:23:32.520
Imagine you've got someone's medical records, or you're dealing with someone's money, God forbid; you'd have one database that shows X dollars, or
00:23:38.640
X whatever maladies, and the other says something else. You wouldn't want that.
00:23:43.799
So there's an issue here that I've sort of glossed over,
00:23:49.679
but I'm going to address it now. Rails apps have a database, typically,
00:23:54.960
and you may also want to write to Kafka for downstream things, so
00:24:00.240
other things can see this data, this message. This introduces a problem called dual write.
00:24:05.520
What dual write means is: all right, imagine I publish to Kafka and I open a database transaction to insert into my
00:24:12.059
local database. Well, you've published to Kafka, but your database transaction can fail, it can roll back,
00:24:17.400
and now you've got something in Kafka that's not in the database, and you are inconsistent. If I put it at the end, that doesn't help,
00:24:23.520
because you put it in the database and then you can fail to publish to Kafka; now you are inconsistent
00:24:29.100
again, because your local database does not reflect what's downstream. It doesn't help if you try to put the
00:24:34.860
publish in the middle of your database transaction, which is typically a bad idea anyway, but sadly we do it a lot.
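The dual-write failure can be shown with a toy in-memory version. Everything here is hypothetical (two arrays stand in for the stream and the database, and `db_fails:` is a knob to force the failure): the point is just that once the event is published, a later rollback of the database write cannot be undone on the stream side, and no ordering of the two writes removes the window.

```ruby
# Toy dual write: publish to the "stream" first, then have the local
# database write fail. The two stores now disagree.
stream = []
db     = []

def save_with_dual_write(record, stream, db, db_fails:)
  stream << record                  # write #1: the event is out the door
  raise "db down" if db_fails       # write #2 can still fail...
  db << record
rescue
  # ...and we can't un-publish, so the stores diverge.
end

save_with_dual_write({ id: 1 }, stream, db, db_fails: true)
```

Swapping the order just flips the failure mode: the database row commits and the publish is lost, so downstream never hears about it. That symmetry is why the talk reaches for change data capture and the transactional outbox instead.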
00:24:42.600
Quick side note: if you are using Sidekiq, you may already be a victim of this problem and not even
00:24:49.500
know it. Sidekiq uses Redis; Sidekiq or Resque, these are Ruby job-processing frameworks. If you are
00:24:55.980
inserting into a database and then enqueuing a Sidekiq job, there's no guarantee that both happen, and that is also a source of
00:25:01.679
inconsistencies. Or if you're using Postgres for your database and Elasticsearch for
00:25:08.340
search, there's no guarantee that both writes happen, because you can always fail in the middle.
00:25:14.220
So to fix this, we have a system called change data capture, and this is key to our... like I told you, we have this
00:25:19.260
giant monolith and we're trying to split it apart; this is going to be a multi-year effort, and it's probably going to take us eight years
00:25:24.779
and hundreds of developers. But this is one way we get the data out in a way that avoids that dual write, which is
00:25:30.840
change data capture. We use a system called Debezium; conveniently, it is a Kafka Connect
00:25:37.559
source connector, which means that to customize it we have to write Java, but
00:25:43.260
what it does is it actually reads the Postgres write-ahead log, which is
00:25:49.260
where the data gets serialized on disk. In a single process it reads it,
00:25:54.299
writes it to Kafka, and that way you can actually just track a table: you give it the name of a table, say it's my users table, and it will read the data as it is
00:26:01.080
written to the write-ahead log in Postgres and publish it to Kafka, and in that way your database always matches
00:26:08.220
what's in Kafka, and you've fixed the dual write. Java, oh my gosh.
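A Debezium Postgres connector config of the kind being described looks roughly like this. It is an illustrative sketch, not Procore's actual setup: the connector name, hostnames, credentials, and table name are hypothetical, and exact property names vary between Debezium versions (for example, `topic.prefix` in 2.x was `database.server.name` in 1.x), though `connector.class`, `table.include.list`, and `plugin.name` are real Debezium properties.

```json
{
  "name": "users-cdc-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "db.internal",
    "database.port": "5432",
    "database.user": "cdc_user",
    "database.password": "<secret>",
    "database.dbname": "app_production",
    "table.include.list": "public.users",
    "plugin.name": "pgoutput",
    "topic.prefix": "monolith"
  }
}
```

Submit that to the Kafka Connect REST API and Debezium attaches to the database's logical-replication slot and streams every committed change to the listed table into Kafka.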
00:26:13.799
One more quick thing to realize about that: it comes with some caveats. A Rails database, typically, if
00:26:21.179
you're using Postgres, a relational database, is typically a normalized database. Something
00:26:27.059
simple like a user record or a product record is often split into multiple tables. You are leaking the implementation details of that
00:26:33.000
downstream if you're just doing direct change data capture. We all know these wonderful Rails patterns like
00:26:39.179
single-table inheritance and polymorphic relationships; now guess what, you're dealing with that downstream, which
00:26:44.520
is not very fun. So the way around that is using this pattern called transactional outbox.
00:26:51.419
It works like this: you insert, say, a denormalized user, or
00:26:56.460
sorry, a normalized user record, so you maybe insert into users, you insert into accounts, you insert into profiles,
00:27:02.460
then you do a fourth insert into a special table called outbox right which could contain all that stuff
00:27:08.039
right you could take all the things that went into those three tables jam it together maybe even in some JSON blob
00:27:13.260
right and then publish it to the outbox and then you use change data capture to follow the Postgres write-ahead log right
00:27:20.179
consume that table publish it to Kafka and in that way because this is all in a database transaction so you begin commit
00:27:26.220
either all of them succeed or all of them fail and you're not left inconsistent running Debezium right is um
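that transactional outbox write path might look roughly like this in Ruby — a sketch only, with an invented `outbox_payload` helper and table layout, not Procore's actual schema; in a real Rails app the inserts would be ActiveRecord calls inside `ActiveRecord::Base.transaction`:

```ruby
require "json"

# Hypothetical sketch: build the denormalized JSON blob that goes in the
# outbox row, combining the three normalized records into one payload.
def outbox_payload(user:, account:, profile:)
  { event: "user.created",
    data: user.merge(account: account, profile: profile) }.to_json
end

payload = outbox_payload(
  user:    { id: 1, email: "ada@example.com" },
  account: { plan: "pro" },
  profile: { display_name: "Ada" }
)

# All four inserts share one database transaction, so they either all
# commit or all roll back together -- no dual write:
#
#   BEGIN;
#   INSERT INTO users    (...) VALUES (...);
#   INSERT INTO accounts (...) VALUES (...);
#   INSERT INTO profiles (...) VALUES (...);
#   INSERT INTO outbox (payload) VALUES ('<payload JSON>');
#   COMMIT;
puts payload
```

Debezium then tails only the outbox table, so downstream consumers see the denormalized payload instead of the leaked table structure.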
00:27:33.059
it's actually not that hard you turn on Postgres logical replication you do have to run Kafka Connect we've got it
00:27:38.100
in Kubernetes and then you submit this configuration file with your database connection
00:27:44.279
strings and stuff like that and it kind of attaches and reads all the changes ours used to be I don't know if it's
00:27:50.220
still true the world's highest volume Aurora Postgres instance in AWS we were processing something like we still are
00:27:56.400
processing something over 200,000 transactions a second and it keeps up on a single thread
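the configuration file submitted to Kafka Connect looks roughly like this — hostnames, credentials, and table names here are placeholders, not the actual setup from the talk, and the exact property names (e.g. `topic.prefix` vs the older `database.server.name`) vary by Debezium version:

```json
{
  "name": "outbox-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "plugin.name": "pgoutput",
    "topic.prefix": "app",
    "database.hostname": "db.example.internal",
    "database.port": "5432",
    "database.user": "debezium",
    "database.password": "<secret>",
    "database.dbname": "app_production",
    "table.include.list": "public.outbox"
  }
}
```

you POST that to the Kafka Connect REST API and the connector attaches to the replication slot and starts streaming changes from the listed tables.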
00:28:03.779
you must choose but choose wisely hey it's RailsConf I had to put a meme somewhere um so we've got a few methods of
00:28:10.320
publishing direct to Kafka synchronously which is the safe way to do it asynchronously which is faster but you
00:28:16.260
might lose messages if the server restarts right or crashes CDC
00:28:21.299
just tracking your database tables that's change data capture publishing them directly to
00:28:26.520
Kafka or transactional outbox the cleaner way to do it because you can decouple your stream from your database
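the sync-versus-async tradeoff can be sketched with a toy in-memory broker — `FakeBroker` is invented for illustration, not a real Kafka client; with a real client library the same idea shows up as waiting (or not) on a per-message delivery acknowledgment:

```ruby
# Toy stand-in for a Kafka broker: append simulates the network round
# trip and returns an acknowledgment once the message is "durable".
class FakeBroker
  attr_reader :log
  def initialize
    @log = []
  end

  def append(msg)
    @log << msg
    :ack
  end
end

broker = FakeBroker.new

# Synchronous produce: block until the broker acks. Safe -- when this
# returns you know the message landed -- but every call pays a round trip.
def produce_sync(broker, msg)
  raise "delivery failed" unless broker.append(msg) == :ack
end

# Asynchronous produce: buffer locally and return immediately; a flush
# sends the batch later. Faster, but anything still in `buffer` is lost
# if the process crashes or restarts before flushing.
buffer = []
produce_async = ->(msg) { buffer << msg }
flush = -> { buffer.each { |m| broker.append(m) }; buffer.clear }

produce_sync(broker, "a")   # durable as soon as this line returns
produce_async.call("b")     # fast, but only durable after flush
flush.call
```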
00:28:34.919
consistent streams require consistent producers um your requirements may vary honestly a
00:28:40.080
lot of you could probably deal with the dual write because um you know unless you're dealing with someone's medical history it might not
00:28:45.360
be that big of a problem don't over build it don't use this stuff just because we use it we have different requirements than you
00:28:52.380
so you know summing up a little bit of this stuff streaming gives us a globally distributed eventually
00:28:57.600
consistent near real-time way to send messages between services and across zones or regions in a way that is fast
00:29:03.360
and reliable with strong guarantees that it is correct
00:29:08.400
um I'm running out of time so I'm going to go through this real fast what's in a message you've got headers partition key
00:29:14.159
the data and you got to choose a format right um you know at your company choose
00:29:20.279
standard headers for all your messages for all different services that allows you to route and filter and do different things JSON is good but you might
00:29:27.659
consider a serialized binary format like Avro or Protobuf it's more efficient and these have schemas that get
00:29:34.440
recorded in the schema registry and that allows you to track schemas that vary over time there's an issue if you're dealing with like data lakes and
00:29:40.380
reporting databases of schema evolution this helps you keep track of what message had what schema at what time
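one possible shape for a message, sketched in Ruby — the header names here are invented examples of company-wide standard headers, not a spec from the talk:

```ruby
require "json"

# Hypothetical anatomy of one Kafka message. Agreed-upon standard headers
# let consumers route and filter without parsing the payload.
message = {
  headers: {
    "event_type"     => "user.updated",
    "source_service" => "accounts",
    "schema_version" => "3",            # tracks schema evolution over time
    "occurred_at"    => "2023-04-26T10:00:00Z"
  },
  # Partition key: messages with the same key land on the same partition,
  # so all events for user 42 stay strictly ordered.
  key: "user-42",
  # The data itself -- JSON here; Avro or Protobuf would be more compact
  # and carry a schema registered in the schema registry.
  value: { id: 42, email: "ada@example.com" }.to_json
}

puts message[:headers]["event_type"]  # => user.updated
```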
00:29:47.880
the future one more thing Apache Flink I've got one minute I don't know why they chose a squirrel for their logo or whatever I guess
00:29:54.480
those are cute I find them annoying they're annoying because they're cute because like you want to pet them but no
00:30:00.059
one's ever petted a squirrel right um Flink is this thing we're going to
00:30:05.159
add in the future it's super powerful I'm really excited it allows you to join streams together like crossing the
00:30:10.620
streams in Ghostbusters right imagine you've got a situation where you've got like a record of like a post like social
00:30:15.659
media like a Facebook post right it has a user ID oh um well wait now I've got another stream
00:30:21.539
another like a Kafka topic with all my users in it I can actually join them together just like a SQL join on a
00:30:27.120
stream that's great imagine you want to make these searchable right like in Elasticsearch you can't index this post
00:30:32.159
in your Elasticsearch if you don't have the name of the author in it right so by combining by enriching take one
00:30:37.679
stream enriching it with another incredibly powerful stuff um you can aggregate records Flink has a
00:30:44.279
built-in data store so you can aggregate oh my time is up um aggregate records right just like a group by but on a
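the stream join and group-by just described can be sketched in plain Ruby — Flink itself would do this in Java or Flink SQL, holding the keyed state in its built-in state store; the topic contents and field names below are invented:

```ruby
# Plain-Ruby sketch of a Flink-style enrichment join plus a stream group-by.
users_by_id = {}   # keyed state built up from the users topic
enriched    = []

user_events = [{ id: 7, name: "Grace" }]
post_events = [{ post_id: 1, user_id: 7, body: "crossing the streams" }]

# Consume the users stream into state...
user_events.each { |u| users_by_id[u[:id]] = u }

# ...then enrich each post with its author's name, like a SQL join on a
# stream, so the result carries everything Elasticsearch needs to index it.
post_events.each do |post|
  author = users_by_id[post[:user_id]]
  enriched << post.merge(author_name: author[:name]) if author
end

# A group-by on a stream is the same idea: fold records into keyed state.
posts_per_user = post_events.group_by { |p| p[:user_id] }
                            .transform_values(&:count)
```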
00:30:49.500
stream great article The Log by Jay Kreps um kind of talks in abstract terms about
00:30:55.679
streams and about Kafka this is essential reading if you get into this I'm Brad Urani I used to tweet I don't
00:31:01.620
tweet anymore I quit I was addicted it made my life better except then I took that like compulsive scrolling and I got addicted to eBay instead and started
00:31:08.159
collecting watches and now I own 100 watches um anyway I work in Austin Texas at Procore we are hiring we're a great
00:31:15.720
place to work come talk to me unfortunately I don't have time for questions but you can come up and ask me if you want nope