00:00:05.299
thank you for coming to my session the theme of this session is scaling Rails
00:00:12.660
API to write-heavy traffic first I will introduce myself
00:00:19.199
I am a software engineer and product manager at DeNA
00:00:25.800
I develop back-end systems for mobile game apps I love developing and
00:00:31.920
playing games I also love Ruby and Ruby on Rails this is our company DeNA our mission is
00:00:41.340
to delight people beyond their wildest dreams we offer a wide variety of services
00:00:48.920
games sports live streaming healthcare automotive e-commerce and so on
00:00:57.780
we started using multiple databases for e-commerce in the 2000s
00:01:06.420
today our game servers also run on multiple databases
00:01:12.540
this is the agenda for today's session
00:01:20.939
first I will present an explanation of mobile game API backends and mobile
00:01:26.880
game app characteristics also we will take a quick overview of scalable
00:01:33.840
database configurations then we move on to the API designs for
00:01:40.200
write scalability there we will explore replication API response design and
00:01:46.619
sharding after that we will check some pitfalls around multiple databases and
00:01:53.100
tests with multiple databases then I will share some considerations on
00:01:58.500
the past present and future of multiple databases at our game servers
00:02:05.340
finally we will sum up the session
00:02:10.979
okay let's start with the mobile game API backends
00:02:20.700
generally speaking mobile game app backends are split into two categories one is real-time servers and
00:02:29.280
the other one is non-real-time servers in this session we will focus on
00:02:36.440
non-real-time servers the objective of these servers is to provide secure
00:02:43.379
data storage and transactions and multiplayer or multi-device communication for example
00:02:50.360
non-real-time servers handle inventories friends daily bonuses
00:02:55.980
and other transactions this figure illustrates an example of
00:03:02.819
API calls a client wants to consume energy and help a friend the server
00:03:09.300
conducts a resource exchange transaction and returns a successful response
00:03:17.220
this slide explains the system overview of our mobile game API backends we
00:03:23.879
started developing our game API server in the mid-2010s the system is running on
00:03:31.140
Ruby on Rails 5.2 and mainly running on IaaS
00:03:36.360
the main API is a monolithic application and there exist other components as
00:03:43.860
separate services such as proxy servers a management tool and HTML servers for
00:03:50.940
webview we utilize the Rails API mode to minimize
00:03:56.220
controller modules and overhead also we employ schema-first development with
00:04:02.700
protocol buffers we use MySQL memcached Redis and
00:04:07.980
elasticsearch now we will look at mobile game app
00:04:15.180
characteristics
00:04:22.199
how do you try a new game open the app store link tap the install button wait a
00:04:29.820
minute and play it is super easy for players to
00:04:36.479
play however it is also super easy to make servers
00:04:42.180
write this slide shows the reasons for the write-heavy traffic
00:04:47.940
first as mentioned earlier easy install leads to easy write access
00:04:54.900
downloading is very easy and mobile app distribution platforms ensure installs
00:05:00.840
from all around the globe there is no need to make an account and the first
00:05:06.900
in-game action means the first write secondly various in-game actions trigger
00:05:14.160
accesses especially writes such as receiving daily bonuses playing battles
00:05:20.580
buying items and so on generally speaking the order of
00:05:26.940
magnitude for mobile apps ranges from 10 kilo-requests per second to 1 mega-
00:05:34.440
request per second therefore scaling the API to write-heavy
00:05:40.199
traffic is required here
00:05:46.259
now we will take a quick overview of scalable database configurations
00:05:57.300
this slide is an overview of our database configurations
00:06:02.699
from the application side we use vertical splitting horizontal sharding
00:06:08.340
and read write splitting on the other hand from the infrastructure side we conduct load
00:06:15.960
balancing on readers and automatic writer failover let's look at them one by one
00:06:25.440
the first technique is vertical splitting we split the databases into two parts
00:06:31.979
the first part is common there is one writer node and this is optimized for
00:06:39.060
read-heavy traffic we prioritize consistency over
00:06:44.520
scalability here for example game setting data and
00:06:49.680
pointers to shards are stored in this database
00:06:56.340
the other part is shard which is the main focus of today's session there are
00:07:02.340
multiple writer nodes and this is optimized for write heavy traffic we
00:07:08.160
prioritize scalability over consistency here also this shard database is
00:07:15.600
horizontally sharded for example player-generated data is stored in this database
00:07:24.419
this is the horizontal sharding we take data locality on a player and partition
00:07:30.840
the data by player ID not by resource type this architecture is effective for holding
00:07:38.160
consistency on each player and efficient for player-based data access
00:07:45.240
we use a mapping table called player_shards to assign a shard
00:07:50.880
we decide a shard on the player's first access by weighted sampling this dynamic
00:07:57.840
shard allocation is flexible for adding new shards and balancing the load on
00:08:04.020
them there is a limitation on maximum inserts per second for this architecture
00:08:11.460
for example thousands or tens of thousands of queries per second
00:08:17.460
are the limit for MySQL if we hit the limit we can use a range
00:08:23.639
mapping table or distributed mapping tables
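Going back to the basic player_shards approach, a minimal sketch of the dynamic shard allocation could look like the following; the PlayerShard model, the weights, and the Rails 6.1+ horizontal-sharding API are assumptions for illustration, not the production code.

```ruby
# Hypothetical weights: a heavier weight sends more newly created players to
# that shard, which lets a freshly added shard fill up gradually.
SHARD_WEIGHTS = { shard_01: 3, shard_02: 3, shard_03: 4 }.freeze

# Assign a shard on the player's first access and record it in the
# player_shards mapping table (weighted sampling: pick the max of rand**(1/w)).
def assign_shard!(player_id)
  PlayerShard.find_or_create_by!(player_id: player_id) do |mapping|
    mapping.shard_name = SHARD_WEIGHTS.max_by { |_shard, weight| rand**(1.0 / weight) }.first
  end
end

# Route a block to the player's shard; reused by later sketches in this talk.
def on_player_shard(player_id, role: :writing, &block)
  shard = PlayerShard.find_by!(player_id: player_id).shard_name.to_sym
  ApplicationRecord.connected_to(role: role, shard: shard, &block)
end
```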
00:08:29.220
of course we utilize read write splitting too in our case we use manual read write
00:08:36.539
splitting this is complex but efficient API servers access writer nodes to
00:08:43.500
retrieve data used for transactions and the player's own data
00:08:49.680
on the other hand API servers access reader nodes to retrieve data not used
00:08:55.920
for transactions and other players' data
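A hedged sketch of this manual splitting with the Rails connected_to API; the model names are illustrative, not our actual schema.

```ruby
# The requesting player's own data may feed a transaction and must reflect
# their latest writes, so it is read from the writer.
def my_knapsack_items(player_id)
  ApplicationRecord.connected_to(role: :writing) do
    Item.where(player_id: player_id).to_a
  end
end

# Other players' data is not written in this request and a little staleness
# is acceptable, so it is read from a replica.
def other_player_profile(other_player_id)
  ApplicationRecord.connected_to(role: :reading) do
    Profile.find_by(player_id: other_player_id)
  end
end
```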
00:09:01.440
from the infrastructure side we employ several multiple-database mechanisms the
00:09:07.620
key concept here is do that outside of Rails this concept increases flexibility
00:09:15.540
and achieves separation of concerns in our case we use a DNS-based service
00:09:22.920
discovery system with a custom DNS resolver on Rails load balancing is conducted by
00:09:30.779
weighted sampling of IP addresses and automatic writer failover is executed by
00:09:37.080
changing the IP address to a new writer node
00:09:43.140
so every aspect of multiple databases is indispensable for scalability
00:09:51.779
let's dive deeper into some critical features
00:09:57.839
now we are going to explore the API design for write scalability
00:10:06.839
let's begin with traffic offload by replication
00:10:13.640
replication is an effective technique to offload read traffic
00:10:18.959
here we replicate DDL and DML from writers to readers and almost the same
00:10:27.480
data is available on reader nodes by offloading read traffic to readers we
00:10:34.080
can reduce the writer's load and make the system scalable for read traffic
00:10:43.920
however there is a replication delay strictly speaking the latest data is
00:10:51.720
only available on the writer node
00:10:57.240
then how do we balance scalability and consistency
00:11:06.779
the solution is read-your-write consistency which ensures consistency for
00:11:13.860
your own writes this slide shows the read-your-write
00:11:19.800
consistency in Rails 6 the solution is time-based connection switching when
00:11:26.760
there is a recent write the database selector connects reads to the writer
00:11:32.399
this connection switching is the default behavior in Rails 6 and a replication
00:11:38.399
delay of up to two seconds is tolerated by default
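For reference, this built-in behavior is enabled with the standard Rails 6 configuration below; the values shown are the documented defaults, not necessarily our production settings.

```ruby
# config/environments/production.rb
# After a write, the same session keeps reading from the writer for the
# configured delay; afterwards reads fall back to the replicas.
config.active_record.database_selector = { delay: 2.seconds }
config.active_record.database_resolver =
  ActiveRecord::Middleware::DatabaseSelector::Resolver
config.active_record.database_resolver_context =
  ActiveRecord::Middleware::DatabaseSelector::Resolver::Session
```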
00:11:45.420
using this architecture we gain high scalability for read-heavy applications and strong consistency is
00:11:50.820
guaranteed where players care about it on the other hand there are mainly two
00:11:56.760
cons for this architecture the first one is that this is not
00:12:02.700
suitable for write-heavy applications because of the higher proportion of
00:12:08.040
writer traffic so offloading is not so effective
00:12:13.079
also we have to strike a delicate balance between replication delay and the write-
00:12:19.800
read ratio when the tolerated replication delay increases the write-read ratio also
00:12:28.320
increases which means it is not scalable so we must further improve this for
00:12:35.640
write-heavy traffic this slide explains the read-your-write
00:12:42.899
consistency for write-heavy applications the solution is a resource-based
00:12:48.899
strategy the player's own data and critical data are fetched from the writer and other
00:12:55.860
data is fetched from readers one of the pros of this architecture
00:13:00.899
is effectiveness for write-heavy traffic also a much longer replication delay is
00:13:08.040
tolerated say tens or hundreds of seconds this magnitude of replication delay is
00:13:16.800
often invoked by cloud infrastructure failures or by request surges
00:13:23.040
this architecture is effective to offload inter-player API traffic because
00:13:28.860
there are fewer connections to writers the cons of this architecture are the
00:13:35.700
complicated business logic and ineffectiveness of offloading the intra-player API in other words this
00:13:43.920
architecture greatly enhances scalability but further improvement is
00:13:49.200
required considering the intra-player API
00:13:56.279
okay now we will take a look at traffic offload by resource-focused API design
00:14:10.560
let's conduct a case study the scenario is buy hamburgers and add
00:14:18.899
them into a knapsack there is a knapsack with items a wallet
00:14:24.360
with money and infinite hamburgers player A buys hamburgers and adds them to
00:14:32.459
the knapsack the wallet balance will decrease and the items in the knapsack will increase what APIs are required for
00:14:41.399
this scenario there are three required APIs create
00:14:49.320
burger purchase index items in knapsack and show wallet you can see the
00:14:55.500
Rails routes on your right-hand side the routes consist of three resources
00:15:02.040
let's see the API call sequence on the left-hand side on the first access a client calls index
00:15:10.199
items in knapsack and show wallet then the player decides what to buy and
00:15:17.160
the client calls create burger purchase index items in knapsack and show wallet
00:15:23.040
balance again the player decides what to buy and
00:15:28.500
the client calls all the APIs again these are mainly API calls that access
00:15:35.820
the writer this write-heavy traffic is because of the recent-write rule of read-your-write in
00:15:43.860
Rails 6 and because of the own-resources rule of read-your-write for write-heavy traffic
00:15:53.880
how do we solve this write-heavy API calling
00:16:01.380
the question is why do we fetch the data from a writer
00:16:07.019
this is because the items and wallet are changed
00:16:13.740
what exactly are the changed items and wallet
00:16:20.940
we can find them in the controller they are the resources the client wants to
00:16:26.399
watch the resources the write request affected as a side effect and the
00:16:32.100
resources Rails has in memory as Active Record instances
00:16:40.500
so the solution is to include all the affected resources in the API response
00:16:47.880
in this scenario there are two kinds of affected resources the first one is
00:16:54.360
the target resource which is the primary purpose of the API in this case the
00:17:00.060
burger purchase the other one is the affected resource which is the side
00:17:05.520
effect of the API in this case items in knapsack and wallet
00:17:12.959
in the JSON sample on your right-hand side you can see all the resources are
00:17:18.419
included in the response and you can see the API call sequence below the JSON
00:17:25.079
here the client calls index items in knapsack and show wallet on the first
00:17:31.080
access this sequence is the same as the original API call sequence
00:17:37.380
however after deciding what to buy the client calls only create burger purchase
00:17:44.640
this is because all the required resources are included in the response
00:17:50.160
of create burger purchase so there are no read API requests except
00:17:56.880
the first one and there are no writer database accesses except for the
00:18:01.980
transaction itself this reduced number of requests is beneficial both to databases and to
00:18:10.020
clients although there are duplicated resources and increased complexity we can make the
00:18:17.880
writer databases focus on transactions which is the only thing not achievable
00:18:24.120
by readers
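A hedged sketch of such a resource-focused write API; the controller, helper, and model names are assumptions, not the code on the slide. The response returns the target resource plus every resource the transaction affected, so the client does not need follow-up reads.

```ruby
class BurgerPurchasesController < ApplicationController
  def create
    purchase = nil
    ApplicationRecord.connected_to(role: :writing) do
      ApplicationRecord.transaction do
        purchase = current_player.burger_purchases.create!(amount: params[:amount])
        current_player.wallet.withdraw!(purchase.price)                   # side effect
        current_player.knapsack_items.add!(:hamburger, purchase.amount)   # side effect
      end
    end

    render json: {
      burger_purchase: purchase,                      # target resource
      knapsack_items: current_player.knapsack_items,  # affected resource
      wallet: current_player.wallet                   # affected resource
    }
  end
end
```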
00:18:29.580
this slide explains the summary of the combined result of replication and resource-
00:18:36.059
focused API design on the one hand read-your-write consistency
00:18:41.220
based on the data's owner is effective for write-heavy traffic
00:18:46.520
inter-player APIs and sharding on the other hand including affected
00:18:52.799
resources in write API responses is effective for minimizing read API
00:18:59.760
requests and writer database accesses also it is effective for the intra-player API
00:19:09.360
as a combined result we have obtained the following effects firstly we have gained
00:19:16.860
a complex but adaptive architecture due to manual read write switching
00:19:23.160
secondly we have achieved maximum traffic offload due to affected resources
00:19:29.700
in write API responses thirdly we have achieved robustness against
00:19:36.360
replication delay the architecture tolerates a very long replication delay
00:19:44.340
so we have achieved both consistency and write scalability with higher robustness
00:19:49.799
against replication delay
00:19:55.020
now we have successfully let writer databases focus on transactions
00:20:02.700
however what if the number of transactions exceeds the writer database's capacity
00:20:11.160
the answer is sharding now let's see the scalable data locality
00:20:19.080
for sharding
00:20:25.919
general multi-tenant sharding splits the data by the owner tenant
00:20:31.679
there is no interaction between shards
00:20:37.260
this is a scalable architecture we only have to add another shard to scale
00:20:45.440
also this is a valid architecture to guarantee the best data locality in
00:20:52.320
multi-tenant sharding
00:20:59.880
how about a single-tenant multiplayer game
00:21:05.820
interaction between shards is the most critical feature required for single
00:21:11.460
tenant multiplayer games players in one large game world must
00:21:17.340
interact with each other also sharding does not guarantee the
00:21:25.380
best data locality by itself furthermore sharding can invoke
00:21:32.460
consistency issues on node failures or network failures
00:21:42.480
so we have to handle it
00:21:47.940
the objective of data locality is to improve performance and consistency by
00:21:54.720
locating data wisely the basic strategy here is first consider cross-shard
00:22:01.620
operation costs as expensive second decrease the accessed shards and
00:22:07.440
additional computation per request then strike a balance on accumulated cost
00:22:14.460
let's conduct some case studies on the architecture
00:22:21.299
the first scenario is train myself and respond to a call for help
00:22:30.419
in this scenario player A and player B train themselves then player C calls
00:22:38.760
one of them for help here there is no need for the latest
00:22:44.580
trained state the requests consist of 95 percent everyday
00:22:50.520
training requests and five percent occasional help requests
00:22:56.100
the solution is a pull architecture you can see the players and databases in the
00:23:01.559
figure we locate the data on the owner's shard indexed by player ID then 95 percent of
00:23:09.900
write requests involve just one writer and five percent read requests involve
00:23:16.380
one writer and one reader this architecture is performant on the
00:23:22.260
majority write requests the next scenario is send gifts list
00:23:30.539
gifts and receive gifts in this scenario player A and player B
00:23:39.360
send gifts to player C player C lists them in chronological
00:23:46.140
order the requests consist of 30 percent occasional
00:23:51.539
sending requests and 70 percent everyday listing or receiving requests
00:23:57.419
you can see the figures on your right hand side the solution here is a push architecture
00:24:04.679
we locate the data on the receiver's shard indexed by receiver player ID
00:24:09.960
and sent_at by doing this 30 percent write requests involve two writers and 70
00:24:18.059
percent read-write requests involve just one writer
00:24:23.280
although there is an additional cost on sending gifts this architecture is
00:24:29.460
performant on the majority read-write requests also there is no merge-sort cost due to
00:24:36.840
pre-indexed data the third scenario is become friends and
00:24:44.520
list friends in this scenario player A player B and
00:24:52.260
player C list friends player C becomes friends with or is linked with
00:24:57.720
player B the requests consist of 95 percent everyday
00:25:02.880
listing requests and five percent occasional linking requests
00:25:08.940
the solution here is a double-write architecture we locate the data on both shards
00:25:16.799
indexed by player ID and friend player ID
00:25:22.980
then five percent write requests involve at most two writers and 95 percent read
00:25:30.840
requests involve only one reader this is performant on the majority read requests
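A hedged sketch of the double-write architecture, reusing the hypothetical on_player_shard helper from earlier; the Friendship model is an assumption.

```ruby
# 5% of requests: write the link to both players' shards so that either side
# can later list friends without a cross-shard query.
def link_friends!(player_id, friend_player_id)
  [[player_id, friend_player_id], [friend_player_id, player_id]].each do |owner, friend|
    on_player_shard(owner, role: :writing) do
      Friendship.create!(player_id: owner, friend_player_id: friend)
    end
  end
end

# 95% of requests: a single-reader query on the player's own shard.
def list_friends(player_id)
  on_player_shard(player_id, role: :reading) do
    Friendship.where(player_id: player_id).to_a
  end
end
```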
00:25:37.919
let's look into a more complicated
00:25:43.740
scenario now the scenario is join a guild and
00:25:49.919
defeat a raid boss player A player B and player C join a
00:25:55.380
guild they fight together against a powerful enemy
00:26:00.600
they share a limited amount of consumable weapons each player's attack and the enemy's
00:26:07.679
attack change the current situation both ACID and performance are critical
00:26:13.500
for updating the players' health state the formation the enemy's health state and
00:26:20.880
the guild's remaining weapons and state the solution is partial re-sharding
00:26:29.640
from the database's viewpoint players belong to different shards and players
00:26:36.059
belong to the same guild both ACID and performance are required here
00:26:42.779
first select another shard for the guild by weighted sampling
00:26:48.299
you can see the new shard with the sunglasses in the figure on your right-hand side
00:26:54.539
then reference the new shard from each player's original shard
00:26:59.640
at the design phase we extract critical resources for the raid boss
00:27:06.299
for example health state formation weapons and so on
00:27:12.179
then locate the extracted resources on the guild's shard
00:27:18.000
this achieves ACID and performance because the number of involved shards is
00:27:24.720
only one for transactions we can leave the non-critical resources
00:27:30.720
on the original shards in this way there is no cross-shard
00:27:37.740
overhead for guild-based transactions and APIs
00:27:43.140
and ACID is also guaranteed
00:27:49.980
we have successfully improved data locality and achieved scalability
00:27:59.520
this data locality technique benefits not only conventional databases but also
00:28:07.200
some distributed databases considering their under-the-hood architecture
00:28:16.200
however we have not yet fully solved the difficult issue for conventional
00:28:22.620
databases now let's look at consistency for
00:28:29.159
sharding the last part of the API design
00:28:39.240
keeping consistency over multiple databases is a difficult problem
00:28:47.159
especially back in the mid-2010s when our game server was
00:28:53.279
born someone says you know two-phase commit
00:28:59.580
can ensure atomicity over multiple databases like an XA transaction
00:29:08.340
this is the overview of the two-phase commit for comparative study first of all
00:29:15.000
there are two kinds of participants here one is the transaction manager in this case
00:29:21.720
the API server and the other one is the resource manager in this case the MySQL database
00:29:29.340
the sequence is as follows first the transaction manager asks resource
00:29:36.240
managers to update records as usual and the transaction manager asks resource
00:29:42.480
managers whether or not they are prepared to commit then resource managers execute the
00:29:50.399
transaction up to the point where they will be asked to commit
00:29:56.340
after that resource managers respond to the transaction manager whether they can commit or not
00:30:03.480
if all the resource managers can commit then the transaction manager tells all the
00:30:08.760
resource managers to commit it however if any of the resource managers
00:30:14.399
cannot commit the transaction manager tells all of the resource managers to roll it back
00:30:21.659
this is simple at least for nominal cases
00:30:27.179
you can see the more casual description of that sequence in the picture on your
00:30:32.580
right hand side here transaction manager asks resource managers whether they can commit or not
00:30:41.460
transaction manager is asking for an explicit agreement here
00:30:50.100
however unfortunately the MySQL documentation says prior to MySQL 5.7.7 XA
00:30:58.140
transactions were not compatible with replication at all
00:31:03.960
also two-phase commit like XA transactions has potential issues for
00:31:09.659
scalability prioritizing scalability we have to
00:31:16.320
relax the atomicity while maintaining practically acceptable consistency
00:31:25.380
let's conduct another case study for this consistency
00:31:30.419
player A consumes an item in the knapsack and player B receives the consumed
00:31:38.039
amount as a gift pretty easy isn't it
00:31:44.940
this code is the result of eventual consistency implemented in Rails 6 it seems
00:31:52.500
difficult because of connected_to but do not worry we will explore this
00:31:58.799
one by one rule one is reduce the risk of cross-shard
00:32:07.500
transaction anomalies before proposing this rule we should
00:32:13.140
check the preconditions for relaxed cross-shard transactions first all resource managers are composed
00:32:21.779
of the same architecture MySQL second resource managers tend to fail for
00:32:29.399
a continuous period of time for example failures in cloud infrastructure can
00:32:35.279
trigger this kind of failure also resource managers are unlikely to
00:32:41.399
fail during an ongoing commit let's look at the figures on your right
00:32:47.220
hand side there are two MySQL databases in the middle the top part of this figure is a
00:32:54.059
conventional two-phase commit and the bottom part of this figure is the proposed relaxed transaction
00:33:01.559
in the latter case the transaction manager asks the resource managers to update the
00:33:06.779
records and the resource managers respond with OK then the transaction manager assumes that
00:33:13.860
they can probably commit and the transaction manager executes the
00:33:20.460
commit order let's take a look at the Rails code that achieves a relaxed transaction
00:33:28.200
in the beginning there are two nested transaction blocks these nested transactions execute
00:33:36.059
consecutive commits at the end of the transaction blocks right after the beginning of the
00:33:43.140
transaction block you can see Player.lock.find(player.id)
00:33:48.600
this code is locking granular resources to avoid race conditions during the
00:33:55.140
transactions which further ensures the success rate of the transaction
00:34:02.220
the main business logic is omitted in this example
00:34:08.220
finally toward the end of the transaction blocks there are consecutive save methods this code is the critical
00:34:15.899
component in the relaxed transaction the Ruby code invokes consecutive update
00:34:22.800
queries followed by consecutive commit queries which minimizes the time and
00:34:29.399
uncertainty between the transaction manager and resource managers
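A hedged reconstruction of that relaxed cross-shard transaction, using the Rails 6.1+ sharding API; the model, variable, and shard names are assumptions rather than the code on the slide.

```ruby
def deliver_gift_across_shards!(key:, amount:, sender_player_id:, receiver_player_id:,
                                sender_shard:, receiver_shard:)
  ApplicationRecord.connected_to(role: :writing, shard: sender_shard) do
    ApplicationRecord.transaction do
      Player.lock.find(sender_player_id)  # granular row lock on the sender's shard
      # ... main business logic omitted here, as on the slide ...
      SenderHistory.new(idempotency_key: key, amount: amount).save!  # UPDATE on the sender's shard

      ApplicationRecord.connected_to(role: :writing, shard: receiver_shard) do
        ApplicationRecord.transaction do
          Player.lock.find(receiver_player_id)  # granular row lock on the receiver's shard
          Gift.new(idempotency_key: key, receiver_player_id: receiver_player_id,
                   amount: amount).save!        # UPDATE on the receiver's shard
        end                                     # COMMIT on the receiver's shard
      end
    end                                         # COMMIT on the sender's shard, right after
  end
end
```

Because both commits fire back to back when the blocks close, the window in which only one shard has committed stays as small as possible.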
00:34:34.859
the typically expected but very unlikely anomalies are as follows
00:34:41.520
one transaction manager or Rails failures between the consecutive commits
00:34:47.820
the next one is network failures between the consecutive commits and the third one is
00:34:54.480
sender shard node failure before the last commit
00:35:00.240
here we have achieved an XA-comparable success rate for nominal cases except
00:35:06.839
for the difference from XA
00:35:12.000
prepare to update and the difference from XA commit to
00:35:17.820
commit also there are no blocking factors on
00:35:23.280
node or network failures although we do not have a persistent log
00:35:28.560
on resource managers we have gained a reasonable success rate for nominal
00:35:33.720
cases here rule 2 is to keep inconsistency
00:35:38.880
observable and acceptable before proposing this rule we have to know that
00:35:45.599
there are three states for cross-shard transactions one is none of the shards commit this is
00:35:53.280
the failure state the second one is some but not all of the
00:35:58.500
shards commit this is the anomaly state and the last one is all of the shards
00:36:05.220
commit this is the success state in order to recover from the anomaly
00:36:11.460
state the general two-phase commit forces all the failed shards to commit so we
00:36:18.000
do the same thing firstly we serialize and control the
00:36:23.160
order of commits this reduces the anomaly states to N minus 1 where N is the number
00:36:29.099
of resource managers secondly we make the inconsistency
00:36:34.579
observable from the transaction manager in this scenario the sender this is the
00:36:41.339
trigger for retry or recovery after that we make the inconsistency
00:36:46.740
acceptable from others or from the next action we let the inconsistency behave
00:36:54.180
as if the transactions are all successful also the next action should not affect
00:37:02.040
the recovery or vice versa this practice leads to non-blocking characteristics
00:37:09.540
now let's see the Rails code on your right-hand side in the beginning you can see the outer
00:37:16.320
transaction block is the sender's shard so the anomaly case is that the receiver
00:37:22.500
reached commit and the sender reached rollback let's imagine what will happen in this
00:37:29.099
anomaly at the bottom of this transaction first the sender history and the sender
00:37:34.800
summary would not be saved in this case the absence of history is observable and
00:37:42.240
the sender can retry based on that also the receiver gift would be saved in the
00:37:49.079
anomaly case and the receiver can use the receiver gift regardless of the sender's
00:37:55.500
transaction by this observable and acceptable
00:38:01.200
inconsistency we have gained a non-blocking starting point for retry or
00:38:07.200
recovery then here comes rule three keep idempotency
00:38:16.140
and reach eventual consistency the proposed solution utilizes retry
00:38:23.099
because retry is the simplest but reliable method actually it works even
00:38:29.040
for network failures outside the server systems in order to utilize retry we have to use
00:38:37.320
an idempotency key to identify the resources the identifier is often called a
00:38:44.220
transaction ID idempotency key or request ID
00:38:50.040
we build the business logic assuming the inconsistency a history-like table and find_or_initialize_
00:38:57.480
by is a convenient way to handle this
00:39:02.700
inconsistency let's look at the actual Rails code this code is the main business logic of
00:39:10.619
the cross-shard transaction in the middle you can see the return if sender_history.persisted?
00:39:18.660
this code ensures the idempotency for retried writes if the sender history is persisted
00:39:24.960
it means that all the transactions successfully reached commit no further
00:39:30.960
update is required at the bottom you can see Gift.
00:39:36.720
find_or_initialize_by in the anomaly case the gift already exists so the
00:39:42.839
find_or_initialize_by mitigates this inconsistency efficiently
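A hedged sketch of that idempotent business logic; the names and the surrounding shard plumbing are assumptions. The idempotency key lets a retry skip what already reached commit and repair only the missing side.

```ruby
def send_gift!(idempotency_key:, sender_player_id:, receiver_player_id:, amount:)
  # Already persisted means every shard reached commit on a previous attempt,
  # so a retried request becomes a no-op (idempotency).
  sender_history = SenderHistory.find_or_initialize_by(idempotency_key: idempotency_key)
  return if sender_history.persisted?

  # In the anomaly case the gift already exists on the receiver's shard;
  # find_or_initialize_by reuses it instead of creating a duplicate.
  gift = Gift.find_or_initialize_by(idempotency_key: idempotency_key)
  gift.assign_attributes(receiver_player_id: receiver_player_id, amount: amount)

  # ... run the relaxed cross-shard transaction from rule one here, saving
  #     `gift` on the receiver's shard and `sender_history` on the sender's ...
end
```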
00:39:48.660
by introducing idempotency and eventual consistency we not only
00:39:54.180
overcome database anomalies but also failures outside the server systems
00:40:00.599
both eventual consistency and scalability are achieved I think this is
00:40:07.920
sufficiently practical are you not satisfied do you require a
00:40:15.180
stricter solution then rule four meets the requirement
00:40:22.020
rule four is utilize an at-least-once message queue for strict completion
00:40:28.560
the precondition for this rule is that business logic already complies with the
00:40:34.680
rules one two and three we divide the architecture into two parts the first part is the enqueue
00:40:41.940
phase and the other one is the dequeue phase in the enqueue phase we save data to a
00:40:48.900
single-shard database ACID is guaranteed here then we enqueue a
00:40:54.839
message referencing the data within the transaction this guarantees at-least-once message
00:41:02.940
existence then make the message dequeued only after
00:41:08.880
the transaction is finished force timeouts both on Rails and on
00:41:14.460
the databases and use perform_later with a delayed queue
00:41:19.859
let's see the Rails code on your right-hand side in the middle of the code you
00:41:25.200
can see the wait: 30.seconds and perform_later within the transaction
00:41:30.780
given the MySQL and Rails timeouts it is guaranteed that the transaction
00:41:36.119
will have reached either commit or rollback when the Active Job is performed
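A hedged sketch of the enqueue phase; the job and model names are assumptions, and the job class itself appears in the dequeue sketch further below.

```ruby
def enqueue_gift_delivery!(idempotency_key:, payload:)
  ApplicationRecord.transaction do
    # Single-shard write: ACID is guaranteed here.
    request = GiftRequest.create!(idempotency_key: idempotency_key, payload: payload)

    # Enqueue within the transaction, delayed past the timeouts forced on Rails
    # and MySQL, so the transaction has reached commit or rollback by run time.
    GiftDeliveryJob.set(wait: 30.seconds).perform_later(request.id)
  end
end
```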
00:41:43.380
in the dequeue phase just retrieve the messages from the queue and check if the
00:41:49.020
data in the database exists if it does not exist then the transaction reached
00:41:55.200
rollback so do nothing if it exists then the transaction reached commit so continue
00:42:01.800
the process then conduct idempotent business logic and be
00:42:07.920
sure to finish the message only after the business logic succeeds this guarantees at-least-once job execution
00:42:16.099
also do not forget to consider the risk of a Ruby process crash
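A hedged sketch of the dequeue phase, continuing the assumed names above and a queue adapter that retries failed jobs; complete_gift_delivery! is a hypothetical helper.

```ruby
class GiftDeliveryJob < ApplicationJob
  queue_as :default

  def perform(gift_request_id)
    request = GiftRequest.find_by(id: gift_request_id)
    # Missing data means the enqueue-phase transaction rolled back: do nothing.
    return if request.nil?

    # The transaction reached commit: run the idempotent business logic.
    # Raising here leaves the message unfinished so it is retried at least once,
    # which also covers a Ruby process crash in the middle of this method.
    complete_gift_delivery!(request)
  end
end
```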
00:42:22.800
by this architecture we have achieved a quasi-theoretically guaranteed
00:42:29.240
completion of cross-shard transactions
00:42:36.000
thus we have acquired practically acceptable consistency without losing
00:42:41.160
scalability this is essentially a concern of conventional
00:42:46.859
databases so we should bear in mind that this should be more elegantly solved by
00:42:52.500
sophisticated databases so is that all we have to do
00:43:00.300
for write scalability there's more
00:43:06.780
and we are going to look at the pitfalls around multiple databases
00:43:17.819
the objective of sharding is horizontal write scalability of databases
00:43:24.480
specifically speaking given N as the number of shards keep time and space complexity
00:43:31.200
to O(1) or at least O(log N) sounds simple and trivial we have
00:43:37.920
actually achieved that by sharding however there are more pitfalls around
00:43:44.099
us the first pitfall is the number of connections per database
00:43:50.940
the connection pool of Rails caches and reuses established database connections for performance
00:43:57.900
and there is a limitation here because the cached pool cannot be shared
00:44:05.640
across processes or across instances also due to Ruby's GVL only one
00:44:15.599
native thread is runnable at a time per process the result of these two limitations
00:44:21.540
is a number of processes larger than the CPU core count
00:44:27.599
and generally speaking there are many API servers and many shards
00:44:34.560
so let's do the math here the number of servers is 1,000 the CPU
00:44:39.900
core count is 32 and the pool size per process is 5 then with one process per core we yield 160,
00:44:48.060
000 connections per database and with more processes than cores up to 480,000
00:44:54.720
actually this number exceeds the maximum connections of MySQL and PostgreSQL
00:45:00.900
and we can see deteriorated database performance due to many connections in terms of
00:45:06.720
memory and CPU actually the typical default from cloud
00:45:12.300
computing providers is around hundreds or thousands so we have to target this
00:45:18.420
value the strategy for reducing the number is
00:45:24.480
like this first assume an average of one actually used database connection
00:45:30.359
per request and then release the connection so that each process holds around that average
00:45:37.140
solution one is multiplexing connections at proxies between databases and API servers
00:45:44.579
PgBouncer for PostgreSQL is famous for this solution
00:45:50.400
the pro of this solution is there is no connect overhead at databases
00:45:57.660
however there are cons as for transaction pooling some database
00:46:02.700
features are disabled and regarding session pooling clearing connections is
00:46:08.040
also required and solution two is clear the connection pool after each request
00:46:15.240
clearing Active Record connections is a well-known way to do this
00:46:20.339
the pros of this solution are there are no additional components required and
00:46:26.280
full database features are available although this has a connect overhead
00:46:32.760
at databases this is not so problematic for MySQL and we have achieved a low connection number with solution two
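A hedged sketch of solution two as a Rack middleware, not the exact production code: the connections are physically closed after every request, trading a small reconnect cost on MySQL for a much lower connection count on each database.

```ruby
class ClearConnectionPools
  def initialize(app)
    @app = app
  end

  def call(env)
    @app.call(env)
  ensure
    # Close this process's cached database connections once the response is done.
    ActiveRecord::Base.clear_all_connections!
  end
end

# config/application.rb
# config.middleware.use ClearConnectionPools
```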
00:46:38.700
the next pitfall is the connection
00:46:45.720
reaper thread per API server a connection reaper thread is a thread
00:46:51.300
to periodically flush idle database connections this was enabled by default
00:46:57.180
in Rails 5 in the original design one reaper thread was created per
00:47:04.680
database spec so let's do the math here again the number of Unicorn processes is 100
00:47:12.000
and the number of shards is 300 and the number of writer and reader roles is two
00:47:18.420
then we can yield 60,000 reaper threads in total here
00:47:26.520
this actually exceeds default kernel parameters and also we can see
00:47:32.880
deteriorated performance due to many threads
00:47:37.980
actually in Rails 6 this is already fixed and if you are using Rails
00:47:44.280
below 6 you can set reaping_frequency to nil and disable the thread
00:47:53.940
the summary of pitfalls is like this we have learned that even
00:47:59.220
elaborately crafted code fails before a huge number of databases and there might
00:48:04.920
be other pitfalls behind you
00:48:10.619
so how do we avoid pitfalls there are too many things to consider
00:48:19.380
then comes the test tests with multiple databases
00:48:28.140
if we haven't tried it assume it's broken
00:48:33.480
we should test with multiple databases
00:48:39.060
the recommendation is to use real remote databases in tests first use real
00:48:45.000
multiple database configurations in tests avoid stubs and use real databases and
00:48:52.099
middlewares and separate read write connections and there should be more
00:48:58.380
than one shard for horizontal sharding and we prefer FactoryBot over fixtures
00:49:04.079
because it can test the dynamic Shard allocation and access
00:49:10.260
and we recommend using real transactions in tests
00:49:15.540
by setting the config and we also simulate partial transaction
00:49:21.060
failures and this practice can detect wrong
00:49:26.520
understanding of multi-database behavior wrong choice of reader or writer connections misallocation of or mis-access to shards and
00:49:34.560
wrong handling of recovery or retry of partially failed transactions
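A hedged sketch of such a test in RSpec style; send_gift! and the models follow the earlier assumed names, and sender and receiver are assumed to be created by the surrounding setup. The anomaly state where only the receiver shard committed is set up directly, and the idempotent retry is expected to repair it without duplicating the committed side.

```ruby
it "repairs a partially failed cross-shard gift transaction on retry" do
  key = SecureRandom.uuid
  # Anomaly state: the receiver's shard committed, the sender's shard rolled back.
  Gift.create!(idempotency_key: key, receiver_player_id: receiver.id, amount: 1)

  send_gift!(idempotency_key: key, sender_player_id: sender.id,
             receiver_player_id: receiver.id, amount: 1)

  expect(SenderHistory.find_by(idempotency_key: key)).to be_present
  expect(Gift.where(idempotency_key: key).count).to eq(1)  # no duplicate gift
end
```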
00:49:41.940
also we inject anomalies in the sandbox environment
00:49:47.460
the purpose of the sandbox environment is to develop the client system and conduct QA
00:49:53.280
on integration of the client the server and other components the anomalies should be tested there
00:50:00.720
in our case we always had a 10-second replication delay
00:50:06.839
and we implemented a debugger to inject exceptions for important and complicated
00:50:12.900
APIs this practice can detect server bugs on
00:50:18.240
accessing readers client bugs in relying on the API for related data and architecture bugs in the
00:50:27.540
design of read write APIs and idempotency and eventual consistency
00:50:34.800
and also we highly recommend conducting load tests before service launch
00:50:40.079
because this is the most unpredictable part of multiple databases for example
00:50:46.319
does the app work with 10 shards 100 shards or 1,000 shards
00:50:53.640
and actually it is required to provision servers for the maximum load in
00:50:59.099
production and these practices can detect an uneven
00:51:04.440
distribution of load across partitions and time and space complexity of O(N) or
00:51:10.800
greater and other issues due to the real scale
00:51:17.160
now we can relax ourselves in addition to relaxing multiple databases
00:51:24.839
lastly past present and future of multiple
00:51:31.319
databases at our game servers in the past our project started with Rails 4
00:51:38.700
2 which was the latest version at that time and there was no native multiple database
00:51:45.240
support back then so we extended Rails and various gems and made many monkey
00:51:53.400
patches for example in-house modules were created and existing gems were extended
00:52:01.380
and we prioritized performance over complexity and we prioritized
00:52:07.619
Rails-like syntax over implicit computation there we implemented various features and
00:52:14.700
this was successful in supporting games then at present
00:52:21.000
there are two different Rails versions ours is 5.2 plus extended gems plus
00:52:27.780
monkey patches and upstream Rails is 6 plus native multiple
00:52:33.839
database support the feature discrepancy is shrinking for example common pitfalls are being
00:52:40.559
fixed at upstream Rails and useful features are being implemented at up
00:52:45.780
stream Rails however there is a growing code discrepancy
00:52:53.760
because there is different code to achieve similar things and there is a more sophisticated design at upstream
00:53:00.900
Rails so which way are we going
00:53:09.000
of course on the Rails we are considering upgrading to
00:53:15.480
upstream with simplified multiple databases I think this is sustainable continuous
00:53:21.780
development toward the future also we are considering migrating and
00:53:27.240
contributing our technical expertise to Rails this not only benefits Rails but
00:53:34.079
also benefits us and we can turn the technical debt into technical assets
00:53:42.720
we'll be back on Rails
00:53:51.059
so let's sum up today's session we have explored various techniques to
00:53:58.740
scale Rails API to write-heavy traffic first of all we utilize all aspects of
00:54:04.920
multiple databases vertical splitting horizontal sharding replication and load
00:54:11.220
balancing are all necessary then we let writer databases focus on
00:54:17.460
transactions replication can offload the read traffic and resource-focused API
00:54:23.280
response design can minimize unnecessary data access to writers
00:54:28.380
next we optimize data locality to reduce the cross-shard performance overhead
00:54:34.740
we have explored pull push double-write and partial re-sharding architectures
00:54:41.760
after that we introduced practically acceptable and scalable consistency it
00:54:48.599
consists of risk minimization observable and acceptable inconsistency and
00:54:54.599
eventual consistency we also learned some pitfalls and risks even for elaborately crafted code
00:55:03.180
therefore we recommend tests with multiple databases this can ensure all
00:55:09.180
the techniques here are working as intended by using real databases real anomalies
00:55:16.319
and real scale we can ensure everything is working well
00:55:22.500
lastly I believe it is crucial to contribute to Rails upstream which turns
00:55:29.520
technical debt into technical assets
00:55:35.760
thank you for listening DeNA loves Rails