Effective Data Synchronization between Rails Microservices

by Austin Story

In the video titled "Effective Data Synchronization between Rails Microservices," Austin Story, a tech lead at Doximity, shares insights from the company's journey in managing data synchronization in a growing microservice architecture. As organizations expand, maintaining data consistency becomes a complex challenge, especially with multiple teams involved, including application developers and data engineers. Austin outlines the evolution of Doximity's data synchronization strategies and presents a Kafka-based solution that has allowed their teams to work independently while respecting the business logic essential to their applications.

Key points include:

  • Background on Doximity: A Rails-based platform that has grown significantly over the past 11 years. It serves over 70% of U.S. physicians, providing various services like telehealth and continuing medical education.
  • Need for Effective Data Syncing: As the company grew, synchronizing data across multiple Rails microservices became increasingly difficult. Ensuring that data teams and application teams remained aligned while managing complex data needs was a central theme.
  • Initial Approaches: Various methods were attempted to handle data synchronization, such as granting direct database access, which posed risks to application integrity and data logic adherence. An admin UI for RESTful interactions offered some improvements but was eventually deemed inadequate as the organization expanded.
  • Advent of Kafka: The final architecture embraces Kafka, a distributed event streaming platform, which effectively separates data producers (data teams) from consumers (application teams). This allowed each side to operate independently at their own pace.
  • Operational Framework: Doximity developed a structured operation system that consists of messages with attributes allowing independent processing and updating of data. This system has facilitated over 7.7 billion data updates since its implementation.

Overall, Austin emphasizes the importance of integrating data processing independently and safely to achieve seamless data synchronization that respects existing business logic. The Kafka implementation at Doximity exemplifies a scalable and effective approach to managing complex data ecosystems, underlining how careful architectural planning and the right tools can lead to successful microservice operations.

Data consistency in a microservice architecture can be a challenge, especially when your team grows to include data producers beyond your application developers: data engineers, analysts, and others.

How do we enable external processes to load data onto data stores owned by several applications while honoring necessary Rails callbacks? How do we ensure data consistency across the stack?

Over the last five years, Doximity has built an elegant system that allows dozens of teams across our organization to independently load transformed data through our rich domain models while maintaining consistency. I'd like to show you how!

RailsConf 2021

00:00:05.120 Hey everybody, my name is Austin Story, and I am a tech lead and manager at
00:00:10.620 Doximity. I'm honored to be here sharing lessons that Doximity, our
00:00:16.320 team, has learned over the last several years about how to effectively sync data between Rails microservices and do that
00:00:23.039 at scale. I'd like to start by painting a picture of what we're going to
00:00:28.260 be talking about over the next half hour, and that's what happens when you have
00:00:33.420 a company that has more complex data requirements. When you start out,
00:00:39.059 things are a little simpler and it's easy to make decisions: you have a Rails monolith following the Rails way.
00:00:47.040 How do you keep your application developers and your data engineers in sync,
00:00:54.120 so that you can lean into all of the rich things that Rails provides in order to keep your rich domain models
00:01:00.660 in sync? Over time, as your business grows, your data needs
00:01:06.659 grow, your applications grow, you get more lines of business, and you end up in a situation where you have multiple apps,
00:01:12.240 multiple teams, multiple data teams, multiple application teams, and it becomes harder and
00:01:18.720 harder to keep the data team and the web application team in sync. And then finally, imagine that you're in
00:01:24.900 a situation where you have dozens of lines of business, you have 70-plus application
00:01:30.780 developers and dozens of teams that are working on and relying on Rails to meet rich data needs for your clients,
00:01:38.340 and then data teams where you have 45-plus data engineers and over a dozen
00:01:44.520 teams. That's what I'm going to be talking to you about today: how Doximity has solved
00:01:50.579 that problem to effectively sync data between multiple Rails microservices and enable our data teams
00:01:55.740 and our web application teams to work together. The way that I'm going to do that is,
00:02:01.020 first, I'm going to talk a little bit about the background and the domain, a little bit about Doximity, our company. Then we're
00:02:08.039 going to define explicitly what we mean when we say effective data syncing. After that we'll
00:02:13.800 move into the application and company growth that we experienced and
00:02:19.200 some of the things that we tried along the way to keep our data team and our web application team in sync. And finally we'll end up on what I call our secret
00:02:25.680 sauce: the architecture that has worked for us over the last several years and has enabled billions of effective data
00:02:32.760 syncs in our system. Now, a little bit of background and domain. First, our company is
00:02:39.480 called Doximity. It is an 11-year-old Rails-based application,
00:02:45.420 and our company is a professional
00:02:51.540 medical network that is focused on enabling physicians to save time so that
00:02:57.239 they can provide better care to patients. We provide doctors with a lot of ways to communicate in a more modern way, to
00:03:03.840 enable their workflows, and also some continuing education tools for them.
00:03:09.120 Some of the products that we've developed over the years to enable that include Doximity Dialer,
00:03:15.900 a product that has facilitated over 100 million telehealth calls in the
00:03:22.019 US. Another one that I'll be talking about a little bit later is called continuing medical education; that
00:03:27.659 is an entire system where we ingest articles that are medically relevant and
00:03:33.720 we give them to doctors so that they can read them and get credit toward their continuing medical education.
00:03:39.360 We have rich search that is enabled through our integration with Rails
00:03:45.480 domain modeling, we have secure faxing and messaging, and we also have rich
00:03:51.120 profile data for our physicians: they have simple things like name, but also things like their specialty, the
00:03:57.659 things that they've done, and where they went to college or university, those sorts of things. Another important
00:04:04.860 area is medically relevant news, which we provide for our teams and our
00:04:10.680 physicians in the form of a news feed.
00:04:15.900 And because of all of those features, we've grown to a point where we have over 70% of all
00:04:21.660 U.S. physicians and 45% of nurse practitioners and physician assistants as verified members on our site.
00:04:28.440 Now, with that number of features and users,
00:04:34.080 we have to have a lot of teams to build all of that out. At this point we have over 10 data teams
00:04:40.380 with 45-plus engineers, and 20-plus application teams with over 70 engineers, that are building out and
00:04:48.180 maintaining these features. And with the system that I'm going to introduce a little bit later, our data update tool,
00:04:53.220 we've performed over 7.7 billion data updates since April of 2019.
00:05:01.020 So now that you know a little bit of the background and the domain that we're going to be talking about, let's define
00:05:06.360 effective data syncing. What I'm talking about here is data integration in Rails: you have many
00:05:13.380 Rails-based microservices and you have many different data stores. How do we
00:05:18.479 move data to and from those different data sources while respecting application business logic, without
00:05:24.660 breaking things? Now, before we go into the talk about application growth, I just want to give
00:05:31.800 a preview of the solution that we ended up on. In general, it is a Kafka-
00:05:37.320 based system that allows our data team to produce messages and our application developers to consume those messages
00:05:42.720 as they're produced. They're able to work independently because of that.
00:05:48.660 Now, in the beginning, before we had all the teams and all of the microservices, we had the monolith, and our
00:05:57.240 monolith was quite majestic: sparkly, had unicorns,
00:06:02.580 and it had sprouted wings at some point. Just to give a summary of how data
00:06:08.460 updates work in a monolith application, I'd like to step through that.
00:06:14.639 This is the way it goes whenever you have a monolith trying to get data updates in. At the end of the day, all we really
00:06:20.160 care about is that we're able to serve our users; for us that's physicians. They don't care about all the stuff that
00:06:26.520 we're doing; all they care about is that they're able to get the stuff that they need and access the data that they
00:06:32.160 want, whenever they want it. But Rails is so fantastic at providing a
00:06:37.440 rich way to model all of the domain logic that exists and is distributed amongst
00:06:42.720 many data stores, you know, MySQL, Redis. And there are a lot of Rails developers that are very familiar with
00:06:49.500 all of the primitives that Rails provides, all the abstractions, in order to integrate or communicate
00:06:54.660 with those data sources. So whenever you have a data need, something that is
00:07:00.300 fairly simple, say you want to go in and upcase all the physicians' first names,
00:07:06.240 the business talks to the Rails developers, and the Rails developers have a very well-known, mature set of tools
00:07:13.259 to handle that: cron, rake, Active Job, the Rails console, in order to get those data updates to go
00:07:20.520 through the rich domain modeling that Rails has provided and sync them to all the data stores.
00:07:26.880 And just as a way to demonstrate how fantastic Rails is at keeping these data
00:07:32.039 stores in sync, I'd like to talk about what you would do if you wanted to add
00:07:37.199 better search for your users. Say you want to enable your physicians to be
00:07:43.440 able to find each other by many other types of criteria, so name, where they went to university, and you want to be
00:07:49.380 able to sort that by relevance and control scoring, that sort of stuff. So your team decides to use
00:07:54.660 Elasticsearch for that, because it's very good at that sort of thing. How do you keep
00:08:01.080 your Elasticsearch system up to date with your users? Now, there are a lot of ways that you can
00:08:07.919 approach this problem, but I think that this is one of the areas where Rails shines with its rich application domain
00:08:15.300 modeling. There's a lot involved with doing search effectively, but if I'm
00:08:22.080 focusing on just the step of making sure that your user data stays in sync
00:08:29.280 with your Elasticsearch index whenever you change it, all that you really have to do to
00:08:34.740 accomplish that is create an after_commit hook in your user model to schedule a background
00:08:40.680 Elasticsearch sync, and then have that method kick off a background job that
00:08:46.380 will re-index the user in your Elasticsearch.
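(A minimal sketch of the pattern described here, assuming a User model, an illustrative ElasticsearchSyncJob, and a stand-in re-index call; these names are for illustration, not Doximity's actual code.)

```ruby
# app/models/user.rb
class User < ApplicationRecord
  # After a committed create or update, schedule a background re-index
  # instead of calling Elasticsearch inline in the request.
  after_commit :schedule_elasticsearch_sync, on: [:create, :update]

  private

  def schedule_elasticsearch_sync
    ElasticsearchSyncJob.perform_later(id)
  end
end

# app/jobs/elasticsearch_sync_job.rb
class ElasticsearchSyncJob < ApplicationJob
  queue_as :search

  def perform(user_id)
    user = User.find_by(id: user_id)
    return if user.nil? # the user may have been deleted since the job was enqueued

    # Stand-in for whatever call your search integration exposes
    # (an index/update request against the users index).
    user.reindex_in_elasticsearch
  end
end
```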
00:08:51.779 This is one of my favorite parts of Rails. It makes tasks like this so simple and straightforward, and it's also one of
00:08:58.980 the reasons why, over time, you get a ton of very rich domain logic in your Rails
00:09:04.680 models. Now let's talk a little bit about what happens when your business gets to a
00:09:10.620 point where it needs more: more data, more data expertise.
00:09:15.839 Here are some examples of reasons that would happen. Doximity is a very data-driven company; most of the
00:09:21.660 decisions that we make about new products, or about leaning into specific areas of our
00:09:27.300 business, are related to the feedback that we're getting from our users on whether they're
00:09:32.580 engaging with specific products. So there's a very sophisticated analytics pipeline, and we
00:09:39.600 need to know whether what we're doing is working, in a timely manner. Doing that is very difficult; it requires people
00:09:45.959 that have deep, specialized knowledge in data pipelines and analytics.
00:09:51.000 There are also some other features that we lean on quite heavily, like machine learning, recommendations, and
00:09:56.279 data pipelines. But let's look at a real example of a
00:10:01.860 complex data need that we've actually built out, so that we're talking in concrete terms. Earlier I mentioned that we have a
00:10:08.279 lot of physician profile data: things like first name, last
00:10:14.760 name, where they went to university, their specialty, their sub-specialty. And then we also have that continuing
00:10:21.000 medical education system that I talked about, where we ingest articles and we're able to put them through a pipeline where we
00:10:28.740 can extract out things like who has been cited in other articles. So the business gets
00:10:35.160 this idea: hey, how empowering would it be for our physicians if they were able to
00:10:41.100 see, when they created a white paper or a journal article somewhere, that somebody
00:10:46.440 else cited their work in that other person's article?
00:10:51.540 And the end result was this: it's something that we've released, and
00:10:57.120 physicians like it. It's empowering; it's cool whenever people do work and other people rely on it. And it allows
00:11:03.839 the physicians to, one, feel better about writing articles and write more articles
00:11:09.180 or journal entries, or go in and double-check that everything gels with what they were saying.
00:11:15.480 Now, doing this is a subtly complex problem. There is a lot involved with
00:11:23.279 matching CME article citations with physician data. The first part of this is
00:11:30.000 you have to make sure that all the physician names are correct; how do you do that in the first place?
00:11:36.300 And then after that you have to make sure that you have the information there on the physicians, like what is their
00:11:42.240 specialty, where did they go to university. Then for the CME articles, you have to
00:11:48.360 clean and standardize all of the citation names that appear in these
00:11:53.940 journal entries. That's also hard because there is no
00:11:59.160 standard format for this: a journal could choose to do first name last name, or last name first name; they could put any string of
00:12:05.940 characters that they want in there. And then after you're able to get all the
00:12:11.220 physician data right and standardize all the cited names, which are both hard processes by themselves, then I think the
00:12:18.720 real difficulty starts, where you get into name matching. For
00:12:23.880 names like Austin Story, that may not be too difficult, but what if you have a very common name and you have
00:12:29.519 multiple physicians that share the exact same name? Then you start getting into confidence scores, where you look at the person's
00:12:35.760 specialty: is this a journal entry related to the specialty that they would be writing in? Is this
00:12:42.540 related to something that they've done in the past? So this is very,
00:12:47.700 very difficult, and our data team does an incredible job at it, but it does require
00:12:52.740 deep expertise and specialists. So how do we integrate these data
00:12:58.680 specialists, who have their own unique sets of tools that they need, into our existing Rails
00:13:06.600 monolith application, so that the things that they're doing get piped through our rich domain models? Because they're used
00:13:12.720 to things like Python, Spark jobs, raw SQL. How do we do this?
00:13:18.120 Well, I'll tell you about the steps that we took in order to accomplish this. And the first one starts with a question that
00:13:25.500 I ask a lot, which is: what is the easiest way to do this? And one of the easiest answers you
00:13:31.920 could arrive at is to just give them direct database access and then promise
00:13:37.560 to be super careful. But before you just walk away with that promise, you have to solidify it with
00:13:44.940 one of the most binding contracts possible, which is the pinky promise.
00:13:51.779 Now, after you've made a pinky promise to be super careful whenever they have direct database access,
00:13:59.040 there are pros and cons that you can evaluate with this system. The pro is that it's very easy to
00:14:06.660 integrate them into the system: you don't have to make any changes to anything that is over in
00:14:14.100 your Rails ecosystem; you just give them direct database access. But that does come with a lot of cons.
00:14:21.480 The first of which is that pinky promises are actually pretty hard to keep. Even if everybody has
00:14:27.360 the best intentions, what if the context of that pinky promise changes and not everybody on the team gets the update
00:14:33.360 that the context has changed? What if there are some tables that for some reason need to be highly available and you
00:14:39.360 can't change them during certain hours? What if there are accidents?
00:14:44.459 And even if there are no accidents and that pinky promise is completely managed
00:14:49.860 correctly, you can't control the load here. The data team can just go in and do whatever they
00:14:55.560 want, whenever they want to; you have to coordinate the load so that they're not overwhelming specific tables
00:15:01.019 whenever you're needing to serve those to the physicians. And then, I think the biggest reason that
00:15:06.060 I don't like this as the way that we do things is because, even if all of that
00:15:12.360 happens perfectly, you keep your pinky promise, the load doesn't ever bring down the site because the database servers
00:15:17.940 get overwhelmed, the biggest thing here is that your rich domain logic from
00:15:23.399 Rails is not going to be respected whenever the data team goes in and does that update on all the users' first names.
00:15:31.139 There's no way for them to run all of the rich domain modeling that Rails is
00:15:37.860 providing you: you don't get the after_commit whenever you're doing a direct update in the database, so
00:15:44.519 your Elasticsearch job isn't kicked off. Also caching: even if your
00:15:50.279 data team knows, okay, whenever I update a record I need to also set updated_at to now, whenever I do
00:15:57.120 that, what about the other models that are dependent on
00:16:03.779 touching? There's no way that you can expect the data team to be
00:16:09.300 aware of all of the other concerns that could be added in the future. If
00:16:14.339 you need to touch the account whenever you update the user, the data team has no way to know that that needs
00:16:19.740 to happen. So the application logic not being respected is, I think, the biggest reason why we didn't go with this one. The next way is,
00:16:27.180 I think, the next easiest thing, which is just an admin UI with some REST over it.
00:16:32.339 So instead of giving them direct database access, you provide some API so that they can do their updates, submitting
00:16:39.420 them through some RESTful call, and then all of your updates do get piped through
00:16:44.940 those rich domain models at that point. So that is much better. There are also some other good benefits here:
00:16:50.940 REST is well known, and it can also be shared with other clients, like web and mobile APIs. But there are also some
00:16:59.160 downsides to this, and also the reasons that we didn't continue
00:17:04.620 down this path. The first one is that rate limiting of the clients is completely
00:17:10.860 dependent on them respecting when they get a message to stop sending requests;
00:17:16.319 you can't control that. The clients just keep pushing stuff up and you just keep
00:17:22.319 responding with, you know, back off and make requests later. Also, batch processing is difficult here.
00:17:30.540 Most of the time whenever you're doing batch processing there are different, subtle needs for each type of process
00:17:36.840 that you're going to be pushing it through, and what we ended up with was a lot of snowflake-type setups where each
00:17:44.460 type of update was different and each batch processing update was different. So
00:17:50.580 we didn't stick with this long term, but we landed on what I'm going to call temp tables plus sync. Internally we
00:17:55.919 call this our data update tool, volume one. With this type of setup,
00:18:01.559 instead of the direct database access or the REST UIs, you create some temp tables,
00:18:07.080 the data team populates those temp tables, and then you create a tool so that
00:18:13.440 whenever the data team wants to, they can sync those tables to the real Rails
00:18:19.440 application tables. And we ran with this one for a while, several years. The pros on this were pretty big:
00:18:26.340 we had no direct access to the main database, so we didn't have to worry about them accidentally taking down the
00:18:32.039 main database, and we were able to separate the data writing from the consuming, which is
00:18:37.140 fantastic. But there were also some cons here: it was hard to manage the load,
00:18:43.880 and our batch processing was pretty difficult using the batch processing
00:18:49.080 system that we were using. So this worked, and probably could have worked forever for us, but right as we got this
00:18:58.020 temp tables plus sync solution running, the size of our team started growing a
00:19:04.320 lot and we started bringing up a lot more line-of-business applications.
00:19:10.380 And that's what I'm talking about right now: we had a lot of growth about this time. We had our main application, which had
00:19:17.640 the setup that I just talked to you about, where we had Rails developers and the data team working in harmony through this temp table,
00:19:23.460 and then we also brought up our news feed about the same time; this is where we're delivering all the medically relevant news that I was talking about.
00:19:29.520 We brought up another service to handle all of our colleaguing, in order to facilitate connections between
00:19:34.740 people in our network. How do we manage this
00:19:40.440 sort of setup so that each team is not having to maintain their own way for their data team to integrate with the
00:19:46.740 Rails application? Because, keep in mind, at the end of the day physicians don't care about the way that our back end is
00:19:52.679 set up; all they care about is that they're able to get the data that they need, and that's what we should be focused on: enabling the teams
00:19:59.100 to work so the physicians can get what they need and their jobs are easier.
00:20:04.440 Cool, so now we're going to move on to the architecture that works, our secret sauce.
00:20:09.960 Just to tie this back again to our definition of effective data syncing: how do we move data to and from the different
00:20:16.679 sources while respecting the application business logic and not breaking things? Now let's talk about how we do this.
00:20:22.440 So we had just gone through a lot of growth, we had the temp tables plus sync solution going,
00:20:28.500 and that gave us an opportunity to pump the brakes a little bit and say, hey, what we have here is good,
00:20:35.760 but how do we build this in a way where it is scalable, where it is going to grow
00:20:41.039 with our team as we get more teams and more microservices?
00:20:46.500 So we were able to sit down and define a vision, set some goals and requirements,
00:20:51.539 for what we wanted our system for data updates to look like.
00:20:57.240 And the first thing was that it has to work with our existing code base; just completely stopping
00:21:02.760 development for six months or a year is absolutely not an option, so it has to work with what we're doing right now.
00:21:08.640 It also has to be easy to use for the data team; that was some of the feedback that we got on the data update tool: it was an extra step and a
00:21:14.880 little bit more complex for them to go in and update things. It has to support multiple apps out of the
00:21:22.020 box, so it has to be easy for us to bring a new application into the system. Safeguards to avoid disaster: if the data
00:21:28.679 team is producing too much data, we need an easy way for that to not impact our web application servers that are
00:21:34.740 running all those updates through the rich application domain models. It has to be bulk processing by default;
00:21:40.559 we wanted this to be a first-class concern in our system. And we also wanted a complete split
00:21:46.380 between the people that are producing data and the people that are consuming data, completely independent.
00:21:53.039 And in order to fulfill a lot of those needs we ended up reaching for a tool
00:21:58.679 called Kafka, and I'll talk a little bit about it. It's not the most important part of this, but just know that it's a
00:22:04.500 tool that we used in order to fulfill these needs. So Kafka: the things that it fulfilled for us are, it allowed us to
00:22:10.380 split the producers and consumers apart; it kind of acted as a bridge in between those. It gave us multiple-app
00:22:16.440 support easily, and it also gives us safeguards, because the data team, the producers of the data, are completely
00:22:22.200 independent from the consumers of that data. So they're able to be independent; they were also able to go at
00:22:27.299 their own speed. The producers could produce as fast as they want to, the consumers could consume as slowly or as
00:22:32.580 quickly as they want to, and there was now no longer a need to coordinate between them: you
00:22:38.820 didn't have to go reach out to the application team before a data team was doing a big push. And
00:22:44.640 what we added was an app topic for each of these
00:22:50.520 applications in order to allow them to communicate on just their own bandwidth. So
00:22:55.679 you have a main topic, you have a news feed topic, you have a colleagues topic. And for those of you that are not
00:23:02.460 familiar with Kafka at all, I'll just do a really high-level overview. The easiest way to think about it is to imagine
00:23:09.840 that you have a way to send a message to somebody
00:23:16.020 and do it in a JSON payload (it's not restricted to JSON, but that's
00:23:21.659 how we use it internally), and you're able to put it in a
00:23:27.179 text file, and you just keep appending to that text file. And Kafka provides all of the
00:23:34.260 abstractions that you need so that you can distribute this text file and you can consume from this text file in
00:23:40.919 a way that is very fault-tolerant and resilient. The producers produce to this
00:23:46.679 file, and then a consumer basically just gets a pointer into the text file, and they just start reading, and they can stop and start as much as they want
00:23:52.980 to. Another way to think of it, if you're familiar with ActiveSupport concerns or
00:23:58.140 ActiveSupport::Notifications: you are able to dispatch using ActiveSupport::Notifications and it's just persisted
00:24:04.620 somewhere, and you can read chronologically through all of those notifications at your leisure.
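(A minimal sketch of that producer/consumer split in Ruby, using the ruby-kafka gem; the broker address and topic name here are assumptions for illustration, not Doximity's actual setup.)

```ruby
require "kafka"
require "json"

kafka = Kafka.new(["localhost:9092"], client_id: "data-sync-example")

# Producer side: append a JSON message to an app-specific topic.
kafka.deliver_message(
  { model: "User", type: "update", attributes: { first_name: "Austin" } }.to_json,
  topic: "main-app-operations"
)

# Consumer side: read from the same topic at whatever pace the app can handle.
consumer = kafka.consumer(group_id: "main-app-importers")
consumer.subscribe("main-app-operations")

consumer.each_message do |message|
  operation = JSON.parse(message.value)
  puts "received #{operation['type']} for #{operation['model']}"
end
```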
00:24:10.620 Cool, so how did this look for our system? We have our data team on the
00:24:17.159 left and our application team on the right, and we wedge Kafka right in between those two. The way that it works is, the
00:24:24.600 data team is going to produce into a Kafka topic, and our application team is going to
00:24:30.840 consume from that topic. Now, before we go into a little bit more
00:24:36.900 detail about how the data team produces and how the application team consumes, I want to talk about a couple of primitives that are core to our
00:24:43.260 system. The first is called an operation;
00:24:48.600 keep in mind, we've done 7.7 billion of these at this point.
00:24:54.720 It's composed of a few parts. First, it is a command to change the data in
00:25:00.240 some way, and these are the normal things that you would expect: create, insert, update, delete,
00:25:05.400 upsert. They're also self-contained: an operation has to be able to live completely
00:25:12.299 on its own; it has everything that it needs so that whenever the operation is consumed somewhere, the
00:25:18.960 consumer can do everything that it needs to with it. And it must belong to a batch, even if it
00:25:24.659 is a batch of one; this is how we make sure that batching is a first-class concern.
00:25:30.360 And some of the specifics about how we did our operations: we created the operation concept, and then
00:25:37.080 whenever we are dispatching out these operations (I'll show you an example of what one looks like later), we used
00:25:43.260 Avro and JSON in order to do this. Avro is a schema format that allows us to validate
00:25:49.620 the format of the values that are coming through. But the two things that I want to focus on here are model
00:25:55.740 and type. Those are two very important concepts here, because that's how we allow
00:26:01.559 the consumers (and we'll talk about this later) to look up the right importer in order to pull the data in. An operation also
00:26:07.679 includes things like an identifier, which is a key-value pair used to look up the object, a batch ID that it belongs to, and then
00:26:14.940 any attributes that it needs in order to do the update; so if you're updating somebody's name, you would
00:26:21.179 send name and then the new name that you want. And then the requester, so that we can send people alerts and also audit.
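(A sketch of what a single operation might look like as a Ruby hash, based on the attributes just described; the field names and values are illustrative rather than Doximity's exact Avro schema.)

```ruby
operation = {
  model: "User",                        # which domain model the consumer should target
  type: "update",                       # the command: insert / update / delete / upsert
  identifier: { id: 12_345 },           # key-value pair used to look up the record
  batch_id: "b7c1e0",                   # the batch this operation belongs to (even a batch of one)
  attributes: { first_name: "Austin" }, # the new values to apply
  requester: "data-team@example.com"    # who asked for it, for alerts and auditing
}
```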
00:26:29.279 Cool. Another primitive: batching. This solely exists as a way to
00:26:34.440 track, manage, and report on bulk-processed operations; that's the only reason it exists.
00:26:41.640 Cool, so here's the diagram of what our system looks like in order to facilitate
00:26:47.820 the data team being separated from the application team. Up at the top we have, symbolized by Python, the data
00:26:55.200 processes; that's our orange box. We have Kafka as our purple box,
00:27:01.980 and our Rails apps as our green box. So in general, the way that this works is, the
00:27:07.799 Python process there, the orange box, is going to produce; it's going to write to a topic,
00:27:13.620 and that topic is really just going to sit there. And then on the Rails side, with
00:27:18.960 the green, they have a consumer, and that consumer is going to read from that same topic.
00:27:24.120 Now, it's also going to do a couple of other things: it's going to write to a results topic as it
00:27:32.640 is consuming, and it's also going to reach out to a main controller, which is our red box,
00:27:38.220 in order to check and see if it needs to just stop doing what it's doing for a
00:27:43.440 little bit; this is how we prevent disasters from happening. And then our red box down there in the bottom left,
00:27:50.279 this is the only time we'll talk about this, but we have a metadata consumer where we're able to look at both the
00:27:57.059 operations and the results of them and be able to report on the status of the
00:28:03.120 batches that the Python side is writing to the topics.
00:28:09.179 Cool. So let's talk more specifically about the data side, the data producers; that is the orange box.
00:28:15.779 There are a few things here that are important for the data team to work
00:28:21.720 independently: they have to own their own data stores and be able to work completely independently of anything on
00:28:27.120 the application side. So when the data team is doing processing, a good example of that is the CME and
00:28:34.200 profile example that I talked about, where we are associating citations with real profiles,
00:28:39.900 they have to do that in their own data stores. So they pull the data in, they do any transforms that they need to, and
00:28:46.080 then whenever they are done with what they're doing, they'll run a Python
00:28:51.900 script or a submitted job, and that will write to the Kafka topic that they are targeting.
00:28:57.480 There are also several other ways that we can produce into this; it does not have to be the data team. You can do this from
00:29:03.480 any language, and I'll show you exactly what that looks like here in a second. But you can also do this from a web UI,
00:29:09.120 that red part that I talked about earlier: we also have a web UI that allows you to submit jobs. And then you could also just do this from any
00:29:15.360 other Rails-based system. So more specifically, this is an example of what a data producer Python script
00:29:22.440 would look like. Up at the top we have a module that we've created, and this
00:29:29.159 module allows you to create a batch. Notice that with that batch you just specify a few things. One is your
00:29:35.520 application target; I chose my app here, and that would be the application that is going to be piping this through
00:29:41.880 its rich domain models. You put your username, that is who you are, for auditing. And then I mentioned
00:29:49.860 earlier that there was a model and a type; that is right there where it says job, which is the model name. That is
00:29:56.279 very important because it tells the system which model they are targeting. And then last is prioritization; this
00:30:03.299 isn't something that you would need to do in your own implementation of this, but that's something that we've added and it
00:30:08.880 really helps the data updates go through. So that is wrapping up the batch. Then once you have your batch, you
00:30:15.240 are going to just add some operations to that. Here we are adding an insert, and we are passing
00:30:21.480 some attributes that I mentioned earlier; that is the thing that
00:30:26.940 represents the update that we are pushing through the system. And then we are adding another operation
00:30:33.659 after that, which is doing a separate update with a description. And
00:30:39.840 this Python script can look however you want; you really just need to build some sort of an
00:30:45.179 abstraction so that your data team can interact with creating or producing data into this topic. But the messages
00:30:52.020 actually end up looking something like this: after you run the script, it'll produce a bunch of messages,
00:30:57.360 and these can really look however you want as well; these are just some decisions that we've
00:31:03.659 made. Some things I want to point out here: we have the batch, which gives you the ID and then the size of the batch; in the previous slide we created
00:31:10.440 two operations, so this has two operations in that batch. We have the index of this update in that
00:31:16.020 batch, we have a type, which is an update, and then a model, which is
00:31:21.480 job. So we've given, in this operation, all of the context that is needed
00:31:27.299 for some process that is using it to be able to
00:31:34.799 find the model that it needs and to update it properly.
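(The talk's producer example is an internal Python module, so here is a hypothetical Ruby equivalent sketching the same ideas: a batch with an app target, a requester, a model, a priority, and a couple of operations, each serialized as a self-contained message.)

```ruby
require "json"
require "securerandom"

# Hypothetical batch-building abstraction mirroring the producer script described above.
class DataUpdateBatch
  def initialize(app:, requester:, model:, priority: :normal)
    @metadata   = { app: app, requester: requester, model: model, priority: priority }
    @operations = []
  end

  def add_operation(type:, attributes:, identifier: nil)
    @operations << { type: type, identifier: identifier, attributes: attributes }
  end

  # One self-contained JSON message per operation, each tagged with its batch.
  def to_messages
    batch_id = SecureRandom.hex(6)
    @operations.each_with_index.map do |op, index|
      @metadata.merge(op)
               .merge(batch: { id: batch_id, size: @operations.size, index: index })
               .to_json
    end
  end
end

batch = DataUpdateBatch.new(app: "my_app", requester: "austin", model: "Job")
batch.add_operation(type: :insert, attributes: { title: "Hospitalist" })
batch.add_operation(type: :update, identifier: { id: 42 },
                    attributes: { description: "Updated description" })

# Each message would then be delivered to the app's Kafka topic,
# e.g. kafka.deliver_message(msg, topic: "my_app-operations").
batch.to_messages.each { |msg| puts msg }
```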
00:31:41.820 Cool, so we talked about the data side; now let's talk about the application side, the app consumers. That's the Rails part up here. What are some of the
00:31:47.820 things that it needs? First, as part of consumption, you'll need the concept of a dispatcher.
00:31:54.240 As you are reading from this topic, you will need something that will be able to look at the messages that are
00:32:00.059 coming in and find the correct importer for each message.
00:32:05.279 For us, we use model and type: you give the model, you say that it's, say, an update, and then we use that to look
00:32:11.640 up the proper class in order to run that through our system. And then for the importer, again, implement what you
00:32:17.640 need. Some of the things that have been important for us are permitted attributes,
00:32:24.480 specifying in advance exactly what is allowed to be updated through the system; you probably don't want admin flags to
00:32:30.779 be toggled here (or maybe you do), but just whatever you're needing for your system.
00:32:36.480 And we also created a parent class, an abstract class, that you
00:32:41.520 can inherit from, and then you implement import whenever you need to do something special in order to send these
00:32:48.059 through our system. And another important idea is that whenever we are returning these results, we return an
00:32:55.440 operation result, more of a value object, as opposed to just a straight-up hash.
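(A minimal sketch of the kind of dispatcher described here; the talk doesn't show its implementation, so the registry-style lookup from model and type to an importer class is an assumption for illustration.)

```ruby
# Routes incoming operations to the right importer based on model + type.
class OperationDispatcher
  # Registry keyed by [model, type]; the entries here are illustrative.
  IMPORTERS = {
    ["City", "insert"] => "CityImporter",
    ["Job",  "update"] => "JobImporter"
  }.freeze

  def importer_class_for(operation)
    key = [operation.fetch("model"), operation.fetch("type")]
    IMPORTERS.fetch(key).constantize
  end

  def dispatch(operations)
    operations.group_by { |op| importer_class_for(op) }.each do |importer_class, ops|
      importer_class.new(ops).import
    end
  end
end
```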
00:33:00.960 Now, this is the first example of what an importer is. Your base importer
00:33:07.500 will need to do a few things. The first thing it will need to do is wrap all of
00:33:13.320 the logic for communicating with Kafka. I omitted that from this because I don't think it's important for this talk, but
00:33:20.340 you'll need to handle things like batch sizes with Kafka, your topic configuration, all that sort of stuff. But
00:33:26.700 the thing that is important, I think, is that this has all of the logic that's related to consuming. So earlier I said
00:33:33.419 it's important to add permitted attributes; here we add a class attribute that allows you to specify some
00:33:40.019 permitted attributes. We initialize this with some operations and then say, hey,
00:33:45.179 you need to implement the import method so that you can do what you need to do.
00:33:52.679 Then you can also provide some helper methods, like I have there at the
00:33:58.919 bottom, which is the failed operation result, to make it a little bit easier for people to implement things.
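(A sketch of what such a base importer might look like, with the Kafka plumbing omitted as in the talk; class and method names are illustrative, not Doximity's actual code.)

```ruby
# A simple value object for reporting results back, rather than a raw hash.
OperationResult = Struct.new(:operation, :status, :error, keyword_init: true)

# Abstract base class for importers. Kafka consumption, batch sizing, and
# topic configuration live elsewhere and are omitted here.
class Importer
  # Subclasses declare exactly which attributes may be written through the system.
  class_attribute :permitted_attributes, default: []

  attr_reader :operations

  def initialize(operations)
    @operations = operations
  end

  # Subclasses implement this to run the operations through the domain models.
  def import
    raise NotImplementedError, "#{self.class.name} must implement #import"
  end

  private

  # Helper for recording a failure for a single operation.
  def failed_operation_result(operation, error)
    OperationResult.new(operation: operation, status: :failed, error: error.message)
  end
end
```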
00:34:03.960 And then here's an example of something that is going to inherit from that; this is a basic insert
00:34:10.440 importer. So here, as part of the import method that we have to define,
00:34:16.080 we initialize an array for the results, and then we look at the operations
00:34:22.440 and take the model from the first one and constantize it. Because we've had a dispatcher that's dispatched to this,
00:34:28.260 we're able to look up the model, because we know that it's been dispatched properly and we're only going to be
00:34:33.359 importing based on one model. Then after that we loop through all of the operations, and for each operation
00:34:39.839 you'll define your business logic here. This could be different, but for the most part it's just going to be a lookup
00:34:45.839 by ID: you find the object, then you slice the permitted attributes and update the object, and at
00:34:52.740 the end you add the result to that array and return it.
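(A sketch of what that basic insert importer might look like, building on the base class above; the find-or-build logic is a guess at the general shape, not Doximity's exact implementation.)

```ruby
class BasicInsertImporter < Importer
  def import
    results = []

    # All operations in a dispatched batch target the same model,
    # so we can constantize it from the first operation.
    model_class = operations.first.fetch("model").constantize

    operations.each do |operation|
      begin
        identifier = operation["identifier"]
        record = identifier.present? ? model_class.find_or_initialize_by(identifier) : model_class.new

        # Only the attributes this importer explicitly permits are written.
        record.update!(operation.fetch("attributes").slice(*permitted_attributes))
        results << OperationResult.new(operation: operation, status: :success, error: nil)
      rescue StandardError => e
        results << failed_operation_result(operation, e)
      end
    end

    results
  end
end
```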
00:34:59.099 Once you have those two abstractions, you can build on them in order to really easily implement other importers. Here's
00:35:05.220 an example of a city importer. On this one, because we've built the
00:35:10.800 Importer and the basic insert importer, all we have to do is implement the permitted attributes, and then
00:35:16.140 anybody that's producing in the system can update a UUID and a name anytime they want to, and we don't have to
00:35:21.359 override the import method. If we had something more specific that we needed to do whenever this message was coming
00:35:26.880 in, we'd have the ability to implement the import method itself to override
00:35:33.240 that, but the goal is to not have to do that as much as possible.
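(Following that pattern, a city importer would be little more than a declaration of its permitted attributes; again, an illustrative sketch.)

```ruby
class CityImporter < BasicInsertImporter
  # Only these attributes may be written through the data update system;
  # the inherited import method handles everything else.
  self.permitted_attributes = %w[uuid name]
end
```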
00:35:38.339 Cool. So we just talked about some of the specifics related to how we've enabled
00:35:44.339 the data team to be able to work in Python and do all of their data processing,
00:35:50.099 and how we use Kafka to integrate them and separate the concerns between them producing and the application side
00:35:56.099 consuming, and we did it in a way where it is scalable and easy for the data team to
00:36:02.520 use. So when we started, we said, hey, these were the goals that we came up with: it has to work with the existing code base,
00:36:08.099 be easy for data teams to use, support multiple apps, have safeguards against disaster, do bulk processing
00:36:13.140 by default, and keep concerns independent. I'd say at the end of this we have really filled that need, in that we've had over
00:36:19.440 7.7 billion data updates with this system, which shows how much we've leaned on it in order to
00:36:25.980 provide the ability for our data teams to work independently of the application teams, and the application teams to model,
00:36:32.280 using Rails, all that rich domain logic. Cool, so we just talked about
00:36:38.820 effective data syncing between Rails microservices. Thank you so much for watching this talk. Just as a summary, we
00:36:45.599 talked about the domain, how Doximity is a physician-first medical network, how we have a lot of line-of-business
00:36:50.940 applications and our team has grown a lot, and the Kafka-based solution that
00:36:56.820 we arrived at in order to allow the application teams and the data teams to work independently of each other and still in sync.
00:37:04.040 And a couple of things I want to point out here: if you like anything that I've said here, come work at Doximity.
00:37:10.320 Any questions: if you're at RailsConf, I'll be in the Effective Data Syncing between Rails Microservices
00:37:17.099 Discord channel; otherwise you can ping me on Twitter, Osteo 36. And I'd like to give a special thanks for the slide
00:37:23.940 assistance to somebody on our design team named Hannah; she is the reason that these slides don't look like
00:37:29.760 they were made by a caveman. So thank you all very much for attending the talk, and if you have any questions please reach
00:37:35.579 out. Thank you.