00:00:14.240
hello, so, are you hungry? I have good news for
00:00:20.840
all of you: after my session you can have lunch. First, thanks to the organizers
00:00:28.000
for inviting me to the conference to share my experience. As I said in
00:00:33.360
yesterday's panel discussion about becoming a senior developer: without solid and broad knowledge we can't become senior
00:00:39.840
developers. This year Rails turned 10 years old and is still one of the best
00:00:46.320
frameworks in the world, and the core team keeps adding the best practices into the framework. However, nowa-
00:00:54.160
days we hear more and more voices saying that Rails is not for beginners anymore. That's true and not true. It
00:01:00.559
is true because there is so much background knowledge besides Rails that we
00:01:05.560
should know. It's not true because Rails did a good job of hiding those details. Either way, if we want to become senior developers
00:01:13.360
or architects, we must and should learn that knowledge, such as operating system concepts,
00:01:20.280
database system concepts, and compiler fundamentals. Today my topic is about
00:01:26.479
databases with Rails. First let me introduce myself: I'm Ying from mainland China. You
00:01:33.720
can find me on Twitter and Facebook. I'm the founder of Fengche.co, a
00:01:40.399
simple and productive collaboration tool for product teams. I'm the organizer of RubyConf
00:01:45.439
China; we will hold the sixth RubyConf China on November 1st and
00:01:51.240
2nd, so everyone is welcome to join us. I'm also running a podcast
00:01:57.000
called Teahour, targeting engineers in China. Okay, now let's start my
00:02:04.240
topic. We all know that real-world applications are complex, especially enterprise applications. Our goal is to
00:02:11.440
handle a high load with a low response time. This is the three-tier client-server
00:02:16.920
architecture most used in the real world: we have the front-end web server handling the requests from the browser or
00:02:23.959
client and then balancing them to the apps. The application server accepts those
00:02:29.519
requests and interacts with the database server for data exchange. There are many studies that
00:02:36.400
test where the performance bottleneck resides among those components. Based on experience, the
00:02:43.000
following are the statistics of the performance bottleneck probability in each component. We can see that
00:02:50.599
44% of performance bottlenecks appear in the database server, 32%
00:02:58.400
in the application server, 14% in the web server, and 10% on the
00:03:05.360
network. This result shows that almost 76% of performance bottlenecks
00:03:12.159
appear in the application and database servers. Identifying the performance bottleneck source is important to optimize
00:03:20.000
application performance. As we can see, database performance is the most critical for
00:03:26.920
application performance, as it is the main contributor to performance bottlenecks. There are many causes of poor performance in the database,
00:03:34.239
such as inefficient SQL statements, a wrong or improper
00:03:40.480
data model design, ineffective database configurations, missing or inefficient
00:03:48.959
indexes, a small or inefficient data cache or query cache, and a wrong concurrency
00:03:54.879
model with possible deadlocks. So let's learn those concepts to better
00:04:01.239
avoid those problems. Every application starts with data modeling. This is the job of the
00:04:07.439
database designer: turning user requirements into the application data models. Let's see a simple
00:04:15.120
example. Everyone here should be very familiar with this code snippet. It's a migration file creating a table called
00:04:22.160
users with attributes uid, name, email, confirmed, basic_info, and account_id,
00:04:28.520
and we define several constraints on our schema, such as name should not be null and confirmed should be false by default.
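A minimal sketch of the migration described here; the exact column list and types are assumed from the talk, not taken from a real code listing:

```ruby
# Assumed reconstruction of the users migration from the talk.
class CreateUsers < ActiveRecord::Migration
  def change
    create_table :users do |t|
      t.string  :uid,        null: false
      t.string  :name,       null: false                # not-null constraint
      t.string  :email
      t.boolean :confirmed,  null: false, default: false
      t.text    :basic_info
      t.integer :account_id                             # references accounts
      t.timestamps
    end
    add_index :users, :uid, unique: true                # unique constraint
  end
end
```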
00:04:35.400
These are called integrity constraints in the database. Integrity constraints are a list of
00:04:40.880
predicates we add to ensure that changes made to the database do not result
00:04:46.479
in a loss of data consistency, to protect from accidental damage to the database.
00:04:52.840
So let's turn back to our example. It will create a primary key called id for the
00:04:58.000
users table, and we know that the id field is an auto-increment field which
00:05:04.240
always has a value. This is called the primary key constraint, also called the entity integrity constraint. The second kind of
00:05:12.000
constraint is the not-null constraint, because we mark that the name of the user should not be null. The third kind of constraint is the
00:05:19.199
unique constraint, saying that the uid field should be unique
00:05:26.479
across the table. The fourth one is the referential constraint,
00:05:31.800
ensuring that the account_id of the user should also appear in the accounts
00:05:39.039
table. So the database itself supports those constraints. Active Record claims that this
00:05:45.160
intelligence should belong in your application models, not in the database. That's why we use validations and
00:05:52.479
dependent options to enforce data integrity. The benefit of doing so is that when we
00:05:57.560
do it at the application level we can easily customize error messages when a constraint is violated. For example, the not-null
00:06:05.680
constraint maps to validates with presence: true, the unique constraint to validates with uniqueness: true, and the referential
00:06:12.560
constraint uses validates_associated.
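The application-level equivalents of those constraints can be sketched like this; model and attribute names are assumed for illustration:

```ruby
# Assumed User model mirroring the database constraints at the app level.
class User < ActiveRecord::Base
  belongs_to :account

  validates :name, presence: true     # not-null constraint
  validates :uid,  uniqueness: true   # unique constraint
  validates_associated :account       # referential constraint
end
```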
00:06:18.319
Let's talk about the referential constraint a bit more. Before we create a user record, we want to make sure that the associated account
00:06:25.680
record exists. After the record is created, what happens if we delete the account? If
00:06:32.199
there is no control, the user record will be left in a dirty state, because it can't find the
00:06:39.240
account anymore, which violates the referential constraint, and this will cause bugs in our
00:06:47.280
application. That's why we use the dependent option to control what happens to
00:06:53.000
the associated objects when the account is destroyed. There are several available options, as follows. destroy
00:07:01.000
and delete_all mean that all the associated users will also be destroyed or
00:07:07.599
deleted; this is ON DELETE CASCADE in the database table
00:07:13.120
schema. nullify means that the account_id
00:07:21.160
of the user will be set to null, and callbacks will not be executed.
00:07:28.720
restrict_with_exception and restrict_with_error mean that an error or exception
00:07:33.759
should be raised if there are any users, so the account should not be
00:07:38.919
destroyed. That corresponds to ON DELETE RESTRICT in the database table definition.
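The dependent options discussed here can be sketched on an assumed Account model; you would pick one strategy:

```ruby
# Assumed Account model showing the available dependent strategies.
class Account < ActiveRecord::Base
  has_many :users, dependent: :destroy               # run callbacks, then delete
  # has_many :users, dependent: :delete_all          # delete directly, no callbacks
  # has_many :users, dependent: :nullify             # set users.account_id to NULL
  # has_many :users, dependent: :restrict_with_error # refuse to destroy the account
end
```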
00:07:45.759
However, it's still possible to violate the referential constraint if we use statements such as account.delete(id),
00:07:53.360
because it generates a SQL DELETE statement directly,
00:08:01.199
and no callbacks will be executed, including the dependent association actions. So
00:08:06.520
if we need real foreign key support, we should use a gem such as
00:08:12.000
foreigner; this support is built into Rails 4.2 by default.
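A sketch of adding a real database-level foreign key with the migration helper that ships with Rails 4.2 (table names assumed from the talk):

```ruby
# Real foreign key at the database level; earlier Rails versions can get
# the same helper from the foreigner gem.
class AddForeignKeyToUsers < ActiveRecord::Migration
  def change
    add_foreign_key :users, :accounts, on_delete: :cascade
  end
end
```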
00:08:17.560
So let's talk about another topic: views. Not the view of MVC; it's
00:08:24.000
a database view. Views are not used a lot in Rails apps, but they can be very useful
00:08:29.879
in the proper situations. Basically, a view is just a virtual relation defined by
00:08:35.440
a query. Let's take an example. Say a user can
00:08:40.839
belong to many projects, and might be a member of a project or the administrator of the project. Now we
00:08:48.640
have a requirement: get the list of members the user can
00:08:54.279
manage. So I define a method called admin_users in the User model, written like
00:09:00.399
this. We can see that it's a join with a long where clause, including an IN clause and a subquery.
00:09:09.600
It's too complex, and the
00:09:15.600
code smells. Then how should we refactor? By creating a view, things will be
00:09:23.720
much easier. We can create a view directly and refactor the admin_users method.
00:09:32.000
This is the view definition. It's similar to the SQL generated by the previous admin_users method. There are
00:09:39.079
several points I want to mention. First is CREATE OR REPLACE VIEW user_project_
00:09:44.160
memberships; it's the view name. Then we have SELECT: we only select the
00:09:52.000
fields we need, which are the user_id, admin_user_id, and project_id. And we have the inner
00:09:58.880
join. It's a subquery, but we alias it as a temporary table called admin_
00:10:05.560
memberships. When we have this view, we can refactor
00:10:11.000
our code. We define a model class, because
00:10:17.320
Active Record treats views just like tables. So we can have a UserProjectMembership class which inherits from Active
00:10:24.480
Record::Base, and it will use the view directly; Active Record treats the
00:10:31.760
view no differently from a table. Then we have has_many associations
00:10:39.200
which can directly fetch the admin users. These are the benefits the view brings us: it
00:10:45.600
gives us a simple and customizable interface, and what's more, the indexes on the original tables can still be used.
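The whole refactoring can be sketched like this: the view DDL in a migration plus the model backed by it. The names (user_project_memberships, admin_user_id) follow the talk, but the exact SQL is assumed:

```ruby
# Assumed reconstruction of the view and the models built on top of it.
class CreateUserProjectMemberships < ActiveRecord::Migration
  def up
    execute <<-SQL
      CREATE OR REPLACE VIEW user_project_memberships AS
        SELECT m.user_id, m.project_id, admin_memberships.user_id AS admin_user_id
        FROM memberships m
        INNER JOIN (
          SELECT user_id, project_id FROM memberships WHERE admin = TRUE
        ) admin_memberships ON admin_memberships.project_id = m.project_id;
    SQL
  end

  def down
    execute "DROP VIEW user_project_memberships;"
  end
end

class UserProjectMembership < ActiveRecord::Base
  belongs_to :user
  belongs_to :admin_user, class_name: "User"
end

class User < ActiveRecord::Base
  has_many :user_project_memberships
  has_many :admin_users, through: :user_project_memberships
end
```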
00:10:52.920
That is really good. However, be aware that a view is
00:10:58.040
not a real table: its query is run every time the view is referenced, and most of the time a view is not
00:11:05.399
insertable, updatable, or deletable. So it's best suited for
00:11:11.720
SELECT. You may be concerned that the performance is not very good, because the query in the view definition is run every
00:11:19.120
time the view is referenced in a query. In the SQL standard there is another
00:11:26.399
kind of view called a materialized view, which will execute the query and store
00:11:31.800
the result into the view when the command is issued, so we can directly fetch
00:11:37.279
the results without an additional query; they are
00:11:42.360
prefetched. We use CREATE MATERIALIZED VIEW with a table name to create a materialized
00:11:47.720
view. However, as the query is executed once when the command is issued,
00:11:54.560
it will not be automatically updated unless we issue a REFRESH
00:12:01.120
MATERIALIZED VIEW command to refresh it. So a materialized view
00:12:07.560
is best suited for relations that rarely change, such as reporting. You can store,
00:12:14.079
for example, yearly project statistics in a materialized view to
00:12:19.560
cache the results instead of running the query every time
00:12:25.839
it's called. That's the use case for materialized views, and the benefits views can bring to us.
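A sketch of caching a rarely-changing report in a materialized view (PostgreSQL syntax); the report query itself is an assumed example:

```ruby
# Assumed yearly-statistics report stored as a materialized view.
class CreateYearlyStats < ActiveRecord::Migration
  def up
    execute <<-SQL
      CREATE MATERIALIZED VIEW yearly_stats AS
        SELECT date_trunc('year', created_at) AS year, count(*) AS projects_count
        FROM projects
        GROUP BY 1;
    SQL
  end

  def down
    execute "DROP MATERIALIZED VIEW yearly_stats;"
  end
end

# The stored result is not updated automatically; refresh it explicitly:
# ActiveRecord::Base.connection.execute("REFRESH MATERIALIZED VIEW yearly_stats")
```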
00:12:34.920
So let's talk about another topic, which is the
00:12:40.000
index. Recall the performance bottleneck sources: nearly 10 percent
00:12:46.839
of database performance problems are caused by missing, ineffective, or
00:12:52.360
inefficient indexes. The index is the primary mechanism for improving performance
00:12:59.839
in a database. So, given a table, do you know which indexes should be created?
00:13:07.160
Are there any rules for creating indexes? What are the trade-offs? Let's first see
00:13:13.199
how indexes work internally in the database. Given a users table
00:13:20.000
with the following records, what if we want to find the record whose id is 4?
00:13:25.480
Without any index, the operation will search the records one by one, comparing each
00:13:30.959
id to 4 until it finds the record. First: is it 4? No. Second:
00:13:39.040
is it 4? No. Third: is it 4? No. Fourth: yes.
00:13:44.320
We found the record. That's what we call a full table scan. Imagine when we have millions or
00:13:51.440
tens of millions of records: this will be very inefficient. That's why we need an index to speed things up.
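The difference can be sketched with a toy model in plain Ruby (not a real database): the full scan compares ids one by one, while an index maps an id straight to the record's position.

```ruby
# Toy model: full table scan vs. index lookup.
Record = Struct.new(:id, :name)

def full_table_scan(records, id)
  comparisons = 0
  found = records.find { |r| comparisons += 1; r.id == id }
  [found, comparisons]
end

def index_lookup(records, index, id)
  pos = index[id]            # the "pointer" stored in the index
  pos && records[pos]
end

records = (1..8).map { |i| Record.new(i, "user#{i}") }
index = {}
records.each_with_index { |r, i| index[r.id] = i }

found, comparisons = full_table_scan(records, 4)  # touches 4 records
same = index_lookup(records, index, 4)            # jumps straight to it
```

With eight records the scan already does four comparisons; the index lookup always takes one step regardless of table size.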
00:13:57.680
With an index on the id field, which is created by
00:14:03.720
default for the primary key field, we can directly find the location of the
00:14:09.680
record and then fetch it directly. It contains two steps: first we get the
00:14:18.240
pointer to the record from the index, and then we get the exact record
00:14:25.759
using the pointer. Now the question is how to get the pointer to that record
00:14:32.360
quickly from the index. There are several algorithms we can use; this is
00:14:37.560
called the index type in the database. Each index type uses a
00:14:43.600
different algorithm and is best suited for different types of queries. B+ tree
00:14:50.560
is the most widely used, in almost all databases. Hash is another type, which is
00:14:56.480
supported by PostgreSQL and MySQL. Besides that, there are still other types,
00:15:02.759
such as GiST, SP-GiST, and GIN. Today I will talk about B+ trees
00:15:09.279
only. A B+ tree is a kind of balanced tree. This image is a 4-degree B
00:15:16.959
plus tree. From the image we can see that, for a node, the left side holds
00:15:23.199
the keys which are less than the key in the
00:15:29.560
node, and the right side holds keys larger than or equal to it. Here I don't want to talk
00:15:36.000
about how this tree is constructed; let's see how we can use this tree to search
00:15:41.120
for records, to get the pointer to a record. Suppose we want to find the record whose id is 4. It will traverse
00:15:49.199
the path from the root node to a leaf node. First it will look for 4:
00:15:56.360
comparing with 5, it's smaller, so it will go to the left
00:16:01.800
side; comparing with 2, it will go to the right side. Then it finds that we
00:16:08.160
have reached the leaf nodes. We just bring this whole block into memory and then check the records one
00:16:16.720
by one: 2, then 3, then 4. Okay, we found the record, and then we
00:16:24.040
get the pointer to the record and use the pointer to fetch the record directly.
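The walkthrough above can be sketched with a tiny hand-built B+ tree in plain Ruby: search descends from the root (left child for keys below the separator, right child otherwise) until it reaches a leaf, then scans the leaf block for the key's record pointer. The tree shape and the "p1".."p8" disk pointers are made up for illustration.

```ruby
# Minimal B+ tree search over a hand-built tree with keys 1..8.
Internal = Struct.new(:keys, :children)
Leaf     = Struct.new(:keys, :pointers)   # pointers stand in for row locations

def bptree_search(node, key)
  while node.is_a?(Internal)
    # Find the first separator key greater than the search key,
    # then descend into the child just before it.
    i = node.keys.index { |k| key < k } || node.keys.size
    node = node.children[i]
  end
  idx = node.keys.index(key)              # scan the leaf block one by one
  idx && node.pointers[idx]
end

# All leaves sit at the same depth, as in a real B+ tree.
root = Internal.new([5], [
  Internal.new([2], [Leaf.new([1], ["p1"]), Leaf.new([2, 3, 4], %w[p2 p3 p4])]),
  Internal.new([7], [Leaf.new([5, 6], %w[p5 p6]), Leaf.new([7, 8], %w[p7 p8])])
])
```

Searching for 4 follows the path from the talk: 4 < 5 goes left, 4 >= 2 goes right, and the leaf scan over [2, 3, 4] yields the pointer.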
00:16:29.240
The algorithmic complexity here is about log2 n, where n is
00:16:35.360
the number of records we have. For example, we have eight here, so it takes three steps
00:16:40.800
to get to the leaf nodes. We can choose another degree as we wish; for example, instead of four
00:16:48.240
we can use 6, 16, 100, or 1,000; it depends on the
00:16:56.639
type of the field which you index. The
00:17:02.040
complexity of the algorithm is logarithmic, which keeps the depth of the tree small, so even
00:17:11.959
with millions of records, or thousands of millions of records, the depth is not
00:17:17.880
very high, and we can search in a few steps, which is very efficient.
00:17:24.559
From the example we know that the B+ tree is very useful for queries like
00:17:31.280
equality, and for queries with comparisons, such as less-than. For less-than it will just
00:17:38.799
find the value 4 and then traverse backward. For
00:17:45.880
BETWEEN it will find the start value in the tree and then iterate the disk blocks one by
00:17:52.000
one until it reaches the end value. Also, for
00:17:57.480
greater-than, it finds the value and then checks the leaf nodes one by
00:18:03.400
one. From this image and the explanation we know that the B+ tree is
00:18:11.120
very suitable for fields whose values vary widely. If the field just contains a
00:18:17.760
few values, such as a boolean field or an enum field, the index will be rather useless,
00:18:24.640
because there are many duplicates, and it will almost
00:18:31.120
be like a full table scan. In the implementation of some databases, when the
00:18:37.080
indexed field has lots of duplicates, the database
00:18:43.280
will automatically turn to a full table scan instead of the
00:18:48.760
index. So after knowing that, should
00:18:54.760
we create as many indexes as possible? Before answering this question, we
00:19:00.000
should know that there are several downsides of indexes to be aware of. First, each index we create costs
00:19:08.240
additional storage space. However, as storage nowadays is
00:19:13.840
very cheap, this is a marginal downside which you should not worry about. Second, it takes additional
00:19:20.240
time to update the index when a record is created, so it will slow down
00:19:25.440
record creation. This is a medium-level downside, but it's not a big problem. The third downside is the main one we
00:19:32.159
should take into consideration while creating indexes: index maintenance is costly
00:19:37.559
if the indexed values update frequently, and it might offset the benefits of
00:19:43.360
having the index, because maintenance takes two steps due to the B+ tree
00:19:49.080
construction: it first deletes the value and then adds it back. So if the application updates the field value
00:19:57.080
frequently, it's not good to have the index. After knowing those downsides and
00:20:04.720
how indexes work internally, we can now draw some conclusions about
00:20:10.919
how to pick indexes. First, it depends on the size of the table. If the size
00:20:17.120
of the table is small, for example it just contains some hundreds or thousands of records, there will be not much
00:20:24.720
difference between an index and a full table scan, and a database such as PostgreSQL will choose to use a full table scan
00:20:32.360
directly, searching the records one by one, even if there are available indexes to
00:20:38.440
use. Second is the data distribution: indexes are only effective when the values
00:20:44.760
of the field vary widely. Third is the query rate compared to the
00:20:49.880
update load: because index maintenance is costly, as we just discussed in the downsides,
00:20:55.120
you shouldn't create an index whose values change frequently but are not queried too
00:21:01.120
often. Those are the rules for how to pick indexes.
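Those rules can be sketched in a migration; the column names here are assumed examples, not from the talk:

```ruby
# Index the columns that are queried often and vary widely; skip the rest.
class AddIndexesToUsers < ActiveRecord::Migration
  def change
    add_index :users, :email, unique: true  # queried on every login, highly selective
    add_index :users, :account_id           # used by the association join
    # No index on users.confirmed: a boolean is mostly duplicates.
    # No index on a counter updated far more often than it is queried.
  end
end
```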
00:21:06.440
Then we turn to transactions. The transaction is another important
00:21:12.360
concept in databases. A transaction is a unit of program execution that accesses and possibly
00:21:20.880
updates data items. Understanding the concept of a transaction is critical for
00:21:26.919
understanding and implementing updates of data in a database. Let's see an example in Rails
00:21:33.679
first. Suppose we want to update the name of the user, and the user
00:21:39.279
has a presence validation on the name field. From the generated SQL we can
00:21:45.279
see that it starts with a BEGIN statement, then does the job of updating
00:21:50.799
the name, and then the COMMIT statement at the end means that the transaction
00:21:56.840
succeeded and has been saved into the database. The second example ends
00:22:03.080
with a ROLLBACK, because it has violated the name constraint, and so it gives
00:22:09.919
up the update operation. BEGIN, COMMIT, and ROLLBACK are
00:22:17.320
the control statements of transactions in the database. These two examples are
00:22:22.880
single statements, and we can use the transaction class method to run multiple
00:22:28.559
statements in one transaction. In this example we have the
00:22:35.720
transaction class method, and it will issue the BEGIN statement, run
00:22:42.400
statement one and statement two, and then COMMIT. This way we batch-update
00:22:48.840
the name field and the active_project_id in
00:22:54.640
batch, in one transaction, to speed things up.
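A sketch of the batch update wrapped in one transaction; the model and field names are assumed from the talk:

```ruby
# Both statements run between one BEGIN and one COMMIT;
# any raised error triggers a ROLLBACK of the whole batch.
ActiveRecord::Base.transaction do      # BEGIN
  user.update!(name: "new name")       # statement one
  user.update!(active_project_id: 3)   # statement two
end                                    # COMMIT
```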
00:23:02.200
What's interesting about transactions is callbacks. I know many developers are confused about why after_save is not able
00:23:07.279
to see the record from another process. For example, when we trigger a background
00:23:12.480
job in after_save and it's executed immediately, such as in Sidekiq, it
00:23:20.000
can't find the record by that record's ID, and it raises an error. That's a
00:23:26.480
situation you might possibly have met. If we know
00:23:32.600
how transactions work, it's easy to understand why. The save and destroy
00:23:37.919
calls are automatically wrapped in a transaction, and the callback chain will
00:23:43.080
be like this: when we try to save the user, it starts with the BEGIN
00:23:48.799
statement of the transaction, then runs the before_save callback, and the
00:23:55.720
third step is the real save step, and the fourth step is the after_save step,
00:24:02.279
and then the COMMIT statement. So when after_save is called,
00:24:11.080
the transaction is still not committed. That means another process might
00:24:16.840
not be able to see the change you made to the database, so it can't find the
00:24:22.880
record and will just raise a RecordNotFound error. The final step is the
00:24:28.480
commit step. So if you really need a background job to do some work,
00:24:35.159
making sure that the record has been updated or created in the database,
00:24:40.960
you should use the after_commit callback.
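A sketch of that fix; the SyncWorker job is a hypothetical Sidekiq-style worker, named here only for illustration:

```ruby
class User < ActiveRecord::Base
  # after_save runs inside the transaction: another process may not see the
  # row yet, and the job can fail with RecordNotFound.
  # after_commit runs after COMMIT, so the record is visible everywhere.
  after_commit :enqueue_sync, on: [:create, :update]

  private

  def enqueue_sync
    SyncWorker.perform_async(id)  # hypothetical background job
  end
end
```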
00:24:48.840
Indeed, transaction management is very complex in database design and
00:24:54.320
implementation. The database system should follow the ACID properties of transactions. A is atomicity, which
00:25:02.720
means either all operations of the transaction happen properly in the database, or none happen. Like the batch
00:25:11.039
transaction we just used: if one of the statements fails, the whole transaction should be rolled
00:25:17.600
back, so a failure cannot leave the database in a state where a transaction is partially executed. C is consistency: the execution
00:25:25.440
of an individual transaction should preserve the consistency of the database. If the state of the database before the
00:25:32.600
transaction is consistent, after the execution the state should still be
00:25:37.960
consistent; the transaction should not break the consistency of the database state before and after the
00:25:43.559
transaction. I is isolation: it should ensure that concurrently executing transactions
00:25:48.919
are isolated from one another, just as if no other transaction were executing concurrently. D is durability:
00:25:59.240
it makes sure that once the transaction has completed successfully, the changes
00:26:05.760
it has made to the database persist, even if there are system
00:26:11.000
failures. Let's see an example to understand how this works. The
00:26:16.840
complexity comes from concurrent processing: our Rails
00:26:22.520
application might update the same table or record at the same time from different processes or threads. For example, this is
00:26:28.960
transaction one, which tries to update user record 3, and meanwhile there is
00:26:38.279
another transaction, T2, trying to read the same user record. As
00:26:44.799
those two transactions are running concurrently, what's the returned user
00:26:50.120
record from T2? Is the output of the first puts statement the same as the
00:26:57.840
second puts statement? Does anyone want to guess? If you think it's the same, please raise your
00:27:05.200
hand. So, will the two puts statements
00:27:12.840
return the same results? Okay, I will give you several
00:27:20.679
seconds to read the code. Do you think it will return
00:27:27.240
different results? Anyone want to raise their hand who thinks it's
00:27:35.919
different? I think no answer is correct, because the answer is: it depends. It might be the same or might not,
00:27:43.640
depending on the concurrency model, or the transaction isolation level. There are several isolation
00:27:50.600
levels we can use. The first is read committed, the
00:27:58.440
level which is the default in PostgreSQL. Read committed means that when I run a
00:28:06.200
statement, the statement can only see rows committed before it
00:28:11.440
begins. Let's use the example to explain. When the
00:28:18.919
first find in transaction two
00:28:24.960
is executed, transaction one is not yet committed and has not
00:28:30.360
updated the name value, so it will print
00:28:38.640
the old value. After the sleep, it will find
00:28:45.880
the record with id 3 again, but transaction one is already committed,
00:28:53.320
so transaction two will get the newly updated record. The second
00:28:59.080
puts will print a different value: the first was the old one, and the second will
00:29:06.480
print the newest name. That's for
00:29:15.000
read committed, the default in PostgreSQL. Here is another isolation
00:29:23.600
level, which is the default in MySQL: repeatable read. Repeatable read means all
00:29:28.760
statements of the current transaction can only see the rows committed before the first
00:29:33.960
query or data-modification statement executed in this transaction. When the first query is executed, it saves a snapshot in
00:29:41.559
the database memory, and the following queries will just use the
00:29:47.120
snapshot. Using that, let's go back to the
00:29:54.120
example. When the first find of transaction two is executed,
00:29:59.919
it saves a snapshot, and queries in the same transaction before the
00:30:05.120
commit statement will always return the same record, no matter whether other transactions are updating the
00:30:11.919
record or not. So in transaction two,
00:30:17.640
the second puts statement, or the second find statement, will always return the old value from when the first
00:30:24.559
statement executed. So if the transaction isolation level
00:30:31.480
is repeatable read, it will return the same value, and this is the default in
00:30:37.919
MySQL. So it acts differently in MySQL and PostgreSQL; you should know this when
00:30:43.080
you use a particular database, and know how transactions work.
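The difference between those two levels can be sketched with a toy model in plain Ruby (not a real database): read committed re-reads the latest committed value on every query, while repeatable read keeps using the snapshot taken at the first query.

```ruby
# Toy model of read committed vs. repeatable read.
class ToyTransaction
  def initialize(store, isolation)
    @store, @isolation, @snapshot = store, isolation, nil
  end

  def read(key)
    case @isolation
    when :read_committed
      @store[key]               # always see the latest committed row
    when :repeatable_read
      @snapshot ||= @store.dup  # take a snapshot on the first query
      @snapshot[key]
    end
  end
end

store = { 3 => "old name" }     # committed data: user record 3

t2_rc = ToyTransaction.new(store, :read_committed)
t2_rr = ToyTransaction.new(store, :repeatable_read)

first_rc, first_rr = t2_rc.read(3), t2_rr.read(3)
store[3] = "new name"           # transaction one commits an update

second_rc = t2_rc.read(3)       # read committed sees the new value
second_rr = t2_rr.read(3)       # repeatable read still sees the snapshot
```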
00:30:48.159
The third is the serializable isolation level. Serializable is similar to repeatable
00:30:55.600
read but, just as the name says, the transactions will run one by
00:31:01.039
one. Let's go back to the example again. As it's serializable,
00:31:10.480
one transaction will only start when the other transaction is
00:31:15.919
completed. So no matter whether transaction one runs first or transaction
00:31:20.960
two runs first, it will always run only one transaction at a time, and the two puts will return the
00:31:27.720
same value, but whether it's the old value or the new
00:31:34.039
value depends on which transaction runs
00:31:39.919
first. There is another isolation level called read uncommitted. Read uncommitted means
00:31:48.399
that the transaction can always read values not yet committed, so the changes are
00:31:54.519
always visible. Back to the example: no matter whether
00:32:00.000
transaction one is committed or not, transaction two can always
00:32:05.080
read the value. It's the most useless isolation level, which we should never use, because we will get
00:32:13.080
unexpected results. PostgreSQL doesn't support this level and will just
00:32:20.320
treat it the same as read
00:32:25.480
committed. Okay, so that's all