00:00:12.660
Let's start.
00:00:14.120
A little bit more about me: my name is
00:00:17.880
Fernando, or Fer, because I know we are
00:00:20.939
all about productivity, so
00:00:23.520
that name is three times shorter, so
00:00:25.980
that's good. My last name is Perales,
00:00:29.640
and I came to the realization one
00:00:32.340
month ago that it's
00:00:34.140
Spanish for pear trees, plural, and I don't
00:00:39.120
like pears.
00:00:41.820
I'm coming from Guadalajara, Mexico. It's
00:00:43.739
not very far from here; it's a
00:00:45.540
four-hour flight. It's a nice place. And
00:00:49.440
I've been doing pretty much eight years
00:00:51.500
of Rails programming, mostly consulting.
00:00:55.260
Of those eight years, five months were
00:00:58.440
spent working at a startup, just five months;
00:01:01.079
I didn't like the startup life.
00:01:05.540
I'm part of the Boost team, and I also
00:01:09.180
host the Ruby MX community.
00:01:11.159
Probably you saw something in the
00:01:14.520
schedule regarding a meetup/
00:01:17.280
community session; we had that yesterday.
00:01:19.740
It was really cool to meet more
00:01:21.780
people who happen to speak Spanish.
00:01:24.540
And this is my fifth RailsConf, first as a
00:01:28.020
speaker, so it's really important for me.
00:01:31.200
And the picture is not really a photo;
00:01:33.600
it's an illustration by Sarah. That's
00:01:36.420
her Instagram. It's a
00:01:39.540
nice illustration.
00:01:41.880
So yeah, let's do some warm-up
00:01:44.220
questions.
00:01:45.420
Raise your hand
00:01:47.579
if
00:01:49.740
you have access to a production
00:01:52.020
server or database.
00:01:54.140
That's interesting.
00:01:58.619
Raise your hand if you would feel more
00:02:01.079
comfortable not having access to that
00:02:03.360
production server. Yeah, it's a big
00:02:07.380
responsibility to have the keys to
00:02:09.720
the kingdom and be responsible for them.
00:02:12.300
Again, raise your hand if you are
00:02:14.340
comfortable with the security
00:02:15.480
measures your organization takes,
00:02:18.900
where you say, okay, I can go to sleep
00:02:21.180
every night, no problem; I know if there's
00:02:23.879
an issue, someone's going to take care of
00:02:25.980
it. Okay, not a lot of hands. That was
00:02:28.379
expected. That's good.
00:02:30.959
Regardless of your answer, this might
00:02:33.599
not be the talk for you.
00:02:35.400
There are more capable people who can
00:02:37.620
help you or your organization prevent
00:02:40.440
unauthorized access to your data or servers
00:02:42.360
from outsiders, also known as hackers.
00:02:45.900
There are consulting companies that
00:02:48.000
make a living, and are very good at,
00:02:49.680
letting you know what you can improve.
00:02:51.420
Go hire one. Be ready for the worst. Don't
00:02:55.680
assume that because you're a small
00:02:57.480
company, you are not a target of interest
00:02:59.280
for hackers. However,
00:03:03.000
raise your hand if
00:03:05.160
you have a copy of production data
00:03:08.099
on your machine.
00:03:10.379
Okay, that's interesting: more
00:03:13.080
people this time.
00:03:14.819
Raise your hand if someone from your
00:03:17.040
organization has asked you for a copy of
00:03:19.800
production data.
00:03:22.560
Okay, interesting.
00:03:25.200
Raise your hand if you have provided a
00:03:27.780
copy of production data to someone in
00:03:30.060
your organization.
00:03:32.400
No judgment, so feel free to raise your
00:03:34.620
hand.
00:03:35.900
Last question: raise your hand if you
00:03:39.239
are concerned about copies of production
00:03:41.280
data being in someone's hands.
00:03:44.099
Yeah. So if you answered yes to at least
00:03:47.159
one, this is a talk for you.
00:03:49.860
The inspiration for this talk: some cases.
00:03:53.299
I think the reason that you decided to
00:03:56.519
attend this talk
00:03:57.959
is this thing.
00:04:01.140
What is this? HIPAA, the Health Insurance
00:04:03.180
Portability and Accountability Act.
00:04:05.879
It's a United States
00:04:09.140
federal statute signed into law in 1996.
00:04:13.400
It pretty much modernized the flow of
00:04:15.900
healthcare information. It stipulates how
00:04:18.299
personally identifiable information
00:04:21.260
maintained by the healthcare and healthcare
00:04:23.759
insurance industries should be protected
00:04:25.620
from fraud and theft. In general, it
00:04:29.699
prohibits healthcare providers and
00:04:31.380
businesses from disclosing protected
00:04:34.380
information to anyone other than a
00:04:36.419
patient and the patient's authorized
00:04:39.060
representatives.
00:04:40.340
And there's another term related to
00:04:44.340
HIPAA: PHI, protected health
00:04:49.020
information. What is that?
00:04:51.139
Also known as personal health
00:04:53.160
information, it's the demographic
00:04:55.620
information, medical histories, test results,
00:04:58.259
laboratory results,
00:05:00.000
mental health conditions, insurance
00:05:02.400
information, and other data that
00:05:04.979
healthcare professionals collect to
00:05:07.139
identify an individual and determine
00:05:09.479
appropriate care.
00:05:12.060
What is considered protected
00:05:14.400
information?
00:05:16.199
Pretty much all of this: name, address,
00:05:19.860
dates, like your birth date, and maybe the date
00:05:23.520
that you were admitted into a hospital,
00:05:27.300
maybe the date you were
00:05:31.440
enrolled in new insurance, phone number,
00:05:34.620
fax number.
00:05:36.180
I don't know if fax is still a thing, but
00:05:38.460
probably. Email address, Social Security
00:05:40.680
number, pretty much all of it.
00:05:43.620
But the last point is also
00:05:46.320
important: any other unique identifying
00:05:49.080
characteristic.
00:05:50.280
That one is trickier, because maybe a
00:05:53.520
person has a very uncommon tattoo in
00:05:57.240
a very specific part of the body, and
00:05:59.160
that makes that person easy to identify,
00:06:01.800
so we also have to protect that kind of
00:06:04.080
data.
00:06:04.979
And one of the cases that
00:06:08.100
caught my attention is this one:
00:06:10.380
Lifespan. "Unencrypted stolen laptop costs
00:06:13.259
Lifespan more than one million in fines."
00:06:16.100
Pretty much, the issue was that an employee's
00:06:19.320
computer went missing with protected
00:06:21.180
health information of about 20,000
00:06:23.400
records. And last time I checked my
00:06:26.160
wallet, I didn't have one million
00:06:28.319
dollars to spare on a fine.
00:06:31.020
And there's a keyword here:
00:06:33.620
unencrypted. So you might think, okay, my
00:06:36.720
machines are encrypted, so we
00:06:38.940
shouldn't have an issue, right?
00:06:41.100
Somewhat, because if you have
00:06:44.720
protected information and you cannot
00:06:47.460
document and prove that the device was
00:06:49.560
encrypted, you still need to follow the
00:06:52.800
requirements for a HIPAA
00:06:55.860
breach.
00:06:57.120
And then you might think, I don't have to
00:06:59.520
worry, I don't have any health
00:07:00.840
information in my hands; maybe I work in
00:07:03.180
fintech, or maybe I work in any other kind of
00:07:05.460
industry.
00:07:06.600
Am I safe if my app is not health
00:07:08.759
related?
00:07:10.039
Well,
00:07:12.000
one of the nice things about consulting,
00:07:13.919
which is what I've been doing for the
00:07:15.960
last eight years, is that you might work
00:07:18.000
with clients from outside the States.
00:07:20.840
This means you have to worry
00:07:23.220
about local legislation. So let's go back
00:07:26.280
to my country, Mexico.
00:07:28.139
We have this law with a very long name. I'm not
00:07:30.900
gonna say it in Spanish, but
00:07:32.940
in English it's the Federal Law on the
00:07:34.740
Protection of Personal Data Held by
00:07:37.380
Individuals, and it was approved in 2010,
00:07:41.819
I think.
00:07:43.020
It aims to regulate the right
00:07:45.780
to informational
00:07:47.699
self-determination.
00:07:50.099
What it means is that companies such as
00:07:51.780
banks, insurance companies, hospitals,
00:07:54.060
schools, telecommunication companies,
00:07:56.520
religious organizations, and any
00:07:59.580
professionals such as lawyers, doctors, and
00:08:01.680
others are required to comply with the
00:08:03.419
provisions of this law,
00:08:04.860
which is very similar to HIPAA:
00:08:07.020
there's information that identifies a
00:08:09.960
person, and you have to protect it.
00:08:12.840
This brings up another case, and again
00:08:16.560
this is an interesting case because I
00:08:19.440
was affected; 93.4 million
00:08:22.860
Mexicans were affected, pretty much.
00:08:25.280
"Personal
00:08:27.660
information of almost 94 million Mexicans
00:08:30.180
exposed on Amazon."
00:08:31.979
What happened? We have an electoral entity
00:08:36.060
that has a registry of pretty much
00:08:38.580
all adults in Mexico,
00:08:40.620
and for some reason, which I don't
00:08:43.320
understand, they provide a copy of that
00:08:45.660
database to the political parties, which
00:08:47.880
is around 10 political parties.
00:08:50.360
And one of these parties uploaded
00:08:53.399
a copy to Amazon without protection: a
00:08:56.820
MongoDB instance of about
00:08:59.060
132 gigabytes.
00:09:01.260
So how did it happen?
00:09:03.899
As I mentioned, someone uploaded
00:09:05.519
something that shouldn't have been uploaded.
00:09:08.899
#oopsie
00:09:11.459
So the first lesson: don't give
00:09:12.899
production copies to everyone.
00:09:14.880
And that could pretty much be the end of
00:09:16.860
the talk, because that's the safest thing to
00:09:20.040
do: don't let your data go out
00:09:22.440
of your production servers. But of
00:09:24.480
course, you didn't come here to hear that.
00:09:27.240
But what if we could provide only
00:09:30.500
what's needed?
00:09:33.899
There's a general term that we
00:09:38.279
can use, or that you may know, which is
00:09:41.040
data anonymization. If we think about
00:09:44.600
the reasons why someone
00:09:47.760
from the organization requires access to
00:09:50.339
production data,
00:09:51.839
most of the time we realize that
00:09:54.420
they don't need the whole
00:09:57.120
dataset; they just need a subset of
00:09:58.680
the information.
00:10:00.080
Maybe they need everything, or
00:10:03.120
just specific parts, or they need to
00:10:06.839
do some research on the data. Maybe a
00:10:10.440
data scientist needs to get access to
00:10:12.360
your datasets to do some calculations. So,
00:10:15.120
I don't know, it's up to your
00:10:17.040
organization.
00:10:18.120
But we can, and
00:10:21.080
we must, do some data anonymization
00:10:24.060
before giving a copy of production data,
00:10:27.000
in case we decide that we have to give
00:10:29.760
one. And there's a tool that I've been using
00:10:31.620
recently, which is this one: PostgreSQL
00:10:34.140
Anonymizer. What is it? It's an
00:10:37.320
extension to mask or replace personally
00:10:40.560
identifiable information or any
00:10:43.440
commercially sensitive data.
00:10:45.380
From the name, you can guess that this
00:10:47.519
works only for Postgres,
00:10:50.220
which is most probably what you're using
00:10:53.160
in your Rails application.
00:10:54.600
There's a repo with a demo, so you
00:10:58.380
don't have to follow everything in this
00:11:01.380
section; you can take a look at
00:11:03.180
that repo, and I'm going to share the
00:11:04.680
link on Twitter and Slack.
00:11:07.140
So if you miss anything, you can go to
00:11:09.600
the repo and take a look at what's going
00:11:11.339
on.
00:11:13.500
For this case,
00:11:15.180
I have a sample application, and as you
00:11:18.240
can see, it's a
00:11:19.560
vanilla Rails application
00:11:21.899
that has a table called users, with
00:11:27.360
an ID, a first name, a
00:11:30.959
last name, street line one, street
00:11:33.420
line two, a zip code, an email, and salary
00:11:37.079
in cents. Those are just made-up numbers;
00:11:39.120
it doesn't make sense that someone earns
00:11:41.700
200 cents.
00:11:44.360
But it's just a sample of the kind
00:11:47.040
of data that we work with regularly. So
00:11:50.839
again, let's say someone needs a copy of
00:11:54.120
this data. Maybe they want to,
00:11:56.880
I don't know, they want to
00:11:59.940
take a look at the structure. A lot of
00:12:01.320
times they want to maybe do some
00:12:03.779
calculations with the salaries. Maybe they
00:12:05.760
need to calculate bonuses, or they
00:12:09.660
need to compare the finances of the
00:12:12.600
company. Maybe they need to, I don't know,
00:12:15.180
there are a lot of things that you can do
00:12:16.560
with data. I'm not a data scientist, but
00:12:19.500
I'm sure that they have a lot of good
00:12:22.140
use cases for this information.
00:12:25.860
So what's the first thing? Again, don't
00:12:27.720
worry, you can take a look at this in the
00:12:29.579
example.
00:12:30.720
It's a Postgres extension that's
00:12:33.839
pretty straightforward to install. You
00:12:35.700
clone the extension; it's on GitLab,
00:12:38.040
open source, of course.
00:12:39.740
Then make extension and make install,
00:12:43.160
and then we have to enable the extension
00:12:46.500
in our database.
00:12:48.420
So, yeah, pretty much we have to run some
00:12:50.880
SQL commands to alter the database. In this
00:12:54.959
case I'm working in development, so I
00:12:57.600
don't really have to worry about
00:13:00.120
messing something up.
00:13:01.519
Be careful when
00:13:04.079
you're doing this in production, and
00:13:06.120
we're gonna see an example.
00:13:07.920
So pretty much: ALTER DATABASE, where we
00:13:10.380
are preloading the extension, and then
00:13:12.480
CREATE EXTENSION IF NOT EXISTS anon.
00:13:16.500
So, okay, now we have an extension in
00:13:18.839
our database. What can we do with it?
00:13:21.420
One of the things that I have
00:13:24.300
found most useful in this
00:13:26.880
tool is static masking. And what does
00:13:30.720
that term mean?
00:13:31.980
Sometimes the best way to
00:13:35.160
deal with or transform the original data
00:13:37.800
set is pretty much to destroy the local
00:13:40.740
copy that you have. What it means:
00:13:43.980
someone gets a copy of
00:13:46.560
the database, installs it on another server,
00:13:49.260
maybe locally, and then transforms that
00:13:52.079
data so it's not recognizable.
00:13:55.440
And this works very well for local
00:13:57.779
copies of the data. Again, it's a copy; if
00:14:00.120
we destroy anything,
00:14:01.860
we don't have to worry, the real data is
00:14:03.720
in production. The advantage of using
00:14:07.260
static masking is that if our
00:14:09.720
computer is stolen,
00:14:13.260
whether it's encrypted or not, the
00:14:15.540
data is anonymized, so we have one thing
00:14:18.240
less to worry about. And again, as a note:
00:14:21.000
don't run this in production, because you
00:14:23.639
are going to actually destroy the data.
00:14:25.980
And what are the strategies that we can
00:14:28.019
use for static masking? This tool allows
00:14:31.500
three:
00:14:33.540
applying masking rules,
00:14:36.360
shuffling columns,
00:14:39.240
and adding noise to numerical
00:14:42.839
or date values. So let's see an
00:14:46.920
example of each one. Applying masking rules:
00:14:50.699
we have the extension set up in our
00:14:52.980
database, so let's connect to
00:14:56.459
our database; in this case this is the
00:14:58.699
database name, for example.
00:15:01.320
We initialize the extension with anon
00:15:04.440
dot init, and then we can define some
00:15:06.660
rules.
00:15:07.680
When I showed this example to a
00:15:11.639
coworker, they mentioned, "Oh, that
00:15:14.639
looks like Faker, but for your
00:15:17.040
database," and that's true.
00:15:21.540
This is how you define rules in PostgreSQL
00:15:23.940
Anonymizer. You've probably
00:15:26.760
noticed the SECURITY LABEL; we're going
00:15:28.620
to talk a little bit about that. SECURITY
00:15:31.139
LABEL FOR, the name of the extension, and then
00:15:33.959
we define: okay, on this column,
00:15:37.560
table users, column first_name, we
00:15:42.540
want to mask the information in this
00:15:45.180
column with a function. What is the
00:15:48.060
name of the function? anon.fake_first_name.
00:15:51.480
And this is how we define a rule
00:15:53.639
for that column.
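For anyone reproducing this without the demo repo, the sketch below is plain Python that only builds the SQL text of such rules, to make the shape of a SECURITY LABEL rule concrete. The column names (`first_name`, `last_name`, `zip_code`, `email`) are assumptions about the sample app, and the `anon.*` function names follow the ones mentioned in the talk; check both against your schema and the PostgreSQL Anonymizer documentation for your installed version before running the output through psql.

```python
# Sketch: generate PostgreSQL Anonymizer masking rules as SQL strings.
# Column names and the exact anon.* function names are assumptions; adjust them
# to your schema and to the extension version you installed.


def masking_rule(table: str, column: str, function_call: str) -> str:
    """Return a SECURITY LABEL statement that masks `table.column` with a function."""
    # Identifiers are interpolated as-is, so only use trusted, hard-coded names.
    return (
        f"SECURITY LABEL FOR anon ON COLUMN {table}.{column} "
        f"IS 'MASKED WITH FUNCTION {function_call}';"
    )


USER_RULES = {
    "first_name": "anon.fake_first_name()",
    "last_name": "anon.fake_last_name()",
    "zip_code": "anon.random_zip()",
    "email": "anon.partial_email(email)",
}

statements = [masking_rule("users", col, fn) for col, fn in USER_RULES.items()]
# Static masking rewrites the stored rows, so only run this on a disposable copy.
statements.append("SELECT anon.anonymize_database();")

print("\n".join(statements))
```

Building the statements as strings keeps the rules reviewable in version control; the trade-off is that identifiers are interpolated without quoting, so this only suits hard-coded, trusted names.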
00:15:55.440
And again, this SECURITY LABEL
00:15:58.320
is very interesting. Even if you're using
00:16:00.420
Postgres, you probably haven't
00:16:03.420
heard about this.
00:16:05.699
This
00:16:06.480
framework is a security framework,
00:16:09.060
and it's inside of Postgres. Pretty much,
00:16:11.279
what it lets you do is
00:16:14.160
achieve very fine-grained
00:16:16.680
security control over your data.
00:16:20.639
For example, as a result of using this,
00:16:23.160
you can make it so some users
00:16:26.459
can only see obfuscated data.
00:16:29.399
In this case I'm pretty much just
00:16:32.040
adding more security labels. For
00:16:36.240
this example I'm doing pretty much the
00:16:38.220
same for last name:
00:16:39.779
on the column last_name,
00:16:41.820
I'm just using a fake last name.
00:16:45.019
For street line one, maybe I don't
00:16:48.120
care about the street, so I'm just
00:16:50.160
using a hardcoded value, CONFIDENTIAL,
00:16:53.880
and the same for street line two. For
00:16:56.579
this example I only care about the
00:16:58.980
numbers; I don't really care about
00:17:01.740
which person has which salary, or
00:17:03.839
where they live.
00:17:06.000
And pretty much the same
00:17:08.160
for zip code: I don't really need to
00:17:09.900
provide the real zip code of the person,
00:17:11.640
so I can use a random zip code.
00:17:15.419
And finally, I would like to
00:17:18.319
anonymize, or scramble a little bit, the
00:17:21.660
email, because again, with the email, if
00:17:24.179
you have one person's email
00:17:26.819
and you put that into any search
00:17:30.240
engine, you can find interesting
00:17:31.980
information about the person. So I also
00:17:34.260
want to anonymize that, and this
00:17:36.419
extension provides the function partial_email.
00:17:38.280
Again, I provide just the name
00:17:40.620
of the column that I want to
00:17:43.380
anonymize. And then at the
00:17:46.320
end I just run anon.anonymize_database().
00:17:48.539
Remember, this will destroy the real data;
00:17:52.200
it's not making a copy of your database.
00:17:54.000
It's up to you to make a copy. So this
00:17:56.700
will pretty much destroy, or rather
00:17:59.400
mutate, the database.
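To make the effect of these rules concrete, here is a plain-Python simulation of what static masking does to a single `users` row. It is not the extension's implementation. The fake-name pools, the column names and the exact `partial_email` format (keep two characters, hide the rest including the domain) are hypothetical stand-ins chosen to mirror the behavior described in the talk.

```python
import random

# Hypothetical fake-data pools standing in for the extension's built-in datasets.
FAKE_FIRST_NAMES = ["Ana", "Luis", "Maria", "Jorge"]
FAKE_LAST_NAMES = ["Garcia", "Lopez", "Martinez", "Hernandez"]


def partial_email(email: str) -> str:
    """Keep the first two characters of the local part and hide everything else."""
    local_part = email.split("@", 1)[0]
    return f"{local_part[:2]}******@******"


def mask_user(row: dict, rng: random.Random) -> dict:
    """Apply the masking rules from the talk to one users row; return a masked copy."""
    masked = dict(row)  # the real extension overwrites rows in place; we copy instead
    masked["first_name"] = rng.choice(FAKE_FIRST_NAMES)
    masked["last_name"] = rng.choice(FAKE_LAST_NAMES)
    masked["street_line_1"] = "CONFIDENTIAL"
    masked["street_line_2"] = "CONFIDENTIAL"
    masked["zip_code"] = f"{rng.randint(0, 99999):05d}"  # random 5-digit zip
    masked["email"] = partial_email(row["email"])
    return masked  # salary_cents is left untouched on purpose


original = {
    "id": 1, "first_name": "Jane", "last_name": "Doe",
    "street_line_1": "123 Example St", "street_line_2": "Apt 4",
    "zip_code": "44100", "email": "jane.doe@example.com", "salary_cents": 20000,
}
masked = mask_user(original, random.Random(42))
print(masked["email"])  # ja******@******
```

Unlike the real extension, which overwrites the rows stored in the database, this sketch returns a masked copy and leaves its input untouched.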
00:18:01.980
What is the result?
00:18:03.780
This is pretty much what it looks like
00:18:05.460
once I run this command. If you look at
00:18:08.580
first name and last name, they're
00:18:10.380
completely different. At the top is the
00:18:12.660
original dataset and at the bottom is the
00:18:15.240
anonymized one. We have street line
00:18:17.280
one and two as CONFIDENTIAL, we have random
00:18:20.340
zip codes, and the email is
00:18:22.740
pretty cool because it's partial. We
00:18:25.740
know that it starts
00:18:27.900
with
00:18:28.799
"fe" and then there's an @, but if you
00:18:31.980
compare it with the real emails,
00:18:34.440
it's not even the same length,
00:18:38.039
and the domain is hidden.
00:18:41.700
If we want to
00:18:44.220
customize how we're anonymizing, we can
00:18:46.620
do that, but with the default function, this is
00:18:48.660
what it does, which is good enough for me.
00:18:51.419
The salary: I didn't do anything with
00:18:53.400
the salary for now, because I care about
00:18:55.140
these numbers. I just don't want to
00:18:58.380
associate a person with a salary. So if
00:19:01.740
you look at this dataset, you have no idea
00:19:03.780
who these people are; they don't exist, to
00:19:06.360
be honest.
00:19:07.400
The salary is untouched, so I can
00:19:10.380
make calculations with the salary
00:19:11.700
without disclosing which salary belongs
00:19:15.900
to which person.
00:19:18.179
I can also shuffle columns. This is also
00:19:21.419
interesting because, again, when we are
00:19:23.340
doing calculations,
00:19:25.080
sometimes we don't care about where
00:19:28.140
those numbers come from, but we need to
00:19:29.940
use the real numbers.
00:19:32.039
For shuffling columns, again we have a
00:19:33.900
function, shuffle_column. We provide the name
00:19:37.559
of the table, the name of the column, and
00:19:40.740
a primary key, so that
00:19:43.980
the function knows how it's going to
00:19:47.100
relate everything.
00:19:48.600
What is the result of this?
00:19:50.700
It's pretty similar. In this case I
00:19:52.559
didn't
00:19:53.480
anonymize the data, but if you take a
00:19:57.299
look at the salary_cents column,
00:19:59.520
it's not the original people who have
00:20:02.580
those salaries. So I don't affect the
00:20:04.860
calculations: if I sum the
00:20:09.179
salaries, I'm gonna get exactly the
00:20:11.700
same result. If I compute a median
00:20:14.700
or any other aggregate, I get the same
00:20:17.039
results. But what is the advantage? The
00:20:19.919
real salary is not associated with the
00:20:21.900
real person. This might be one of the
00:20:24.000
cases that I want to solve.
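The property that makes shuffling useful for calculations is that it only permutes a column's values across rows. Here is a minimal Python sketch of that idea, assuming each row is a dict with a unique primary key; the rows and the `shuffle_column` helper are hypothetical illustrations, not the extension's SQL function.

```python
import random
import statistics


def shuffle_column(rows: list[dict], column: str, primary_key: str,
                   rng: random.Random) -> list[dict]:
    """Permute the values of `column` across rows, keyed by a unique primary key.

    Every other field stays on its original row, so aggregates over `column`
    are unchanged while the row-to-value link is broken.
    """
    keys = [row[primary_key] for row in rows]
    values = [row[column] for row in rows]
    rng.shuffle(values)
    new_value_for = dict(zip(keys, values))  # primary key -> reassigned value
    return [{**row, column: new_value_for[row[primary_key]]} for row in rows]


users = [
    {"id": 1, "first_name": "Ana", "salary_cents": 200},
    {"id": 2, "first_name": "Luis", "salary_cents": 350},
    {"id": 3, "first_name": "Maria", "salary_cents": 500},
]
shuffled = shuffle_column(users, "salary_cents", "id", random.Random(7))
print(sum(r["salary_cents"] for r in shuffled))                 # 1050
print(statistics.median([r["salary_cents"] for r in shuffled]))  # 350
```

Because the values are only permuted, sum, mean and median stay exactly the same, while the link between a person and a salary is lost. With very few rows a shuffle can leave some values where they were, so it protects more as the table grows.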
00:20:26.580
Another interesting function: adding
00:20:29.160
noise to a column. What is the concept of
00:20:31.799
noise in this case?
00:20:33.660
Again, I'm going to add some noise to the
00:20:37.020
table users, in the column salary_cents,
00:20:39.960
and there's a
00:20:42.020
ratio, 0.1. What it means: this function
00:20:46.620
is going to change the values of that
00:20:48.360
column
00:20:49.559
randomly, plus or minus 10 percent.
00:20:55.500
Again, this is the
00:20:57.620
result. If you compare, we
00:21:01.799
have the same order; we didn't
00:21:03.720
change the order,
00:21:04.980
but the number is not exactly the same,
00:21:07.020
so we're adding some noise to the
00:21:09.900
data. I know that this is sometimes
00:21:12.299
useful for people who train models:
00:21:15.179
okay, I want to train a model with
00:21:17.100
noise; you can use that instead of using
00:21:19.500
the real data. And even though you
00:21:22.500
don't know, or don't disclose, the
00:21:24.360
real salary, you know that this number is
00:21:27.299
within plus or minus 10 percent of it.
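Conceptually, a noise ratio of 0.1 means each value is multiplied by a random factor between 0.9 and 1.1. Here is a plain-Python sketch of that idea, assuming float values. It is not the extension's code, and how the extension rounds integer columns such as `salary_cents` is not modeled here.

```python
import random


def add_noise(values: list[float], ratio: float, rng: random.Random) -> list[float]:
    """Scale each value by a random factor drawn from [1 - ratio, 1 + ratio].

    Row order is preserved; only the magnitudes move, by at most +/- ratio.
    """
    return [value * rng.uniform(1 - ratio, 1 + ratio) for value in values]


salaries = [200.0, 350.0, 500.0]
noisy = add_noise(salaries, 0.1, random.Random(3))
for real, fuzzed in zip(salaries, noisy):
    print(f"{real:.0f} -> {fuzzed:.1f}")  # each stays within 10% of the real value
```

Unlike shuffling, noise does not keep aggregates exact: a sum over noisy values only approximates the real total, which is the price for hiding each individual value.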
00:21:31.260
That's pretty much the static
00:21:33.659
masking part, and it's what I've found most
00:21:36.059
useful so far, but it's not the only
00:21:38.220
thing the tool can do. We also have
00:21:40.440
dynamic masking. This is very interesting
00:21:42.240
because it can play together with Rails
00:21:45.299
in a very interesting way. What is
00:21:47.880
dynamic masking? I can hide some
00:21:50.400
data from a role, a database role, by
00:21:53.159
declaring that role as MASKED,
00:21:55.559
and then I can define rules, so when that
00:21:58.440
database role connects to the
00:22:00.900
database, it's gonna get the
00:22:03.659
masked version of the data and not the
00:22:06.000
real one.
00:22:07.440
How does it work? It's pretty much the
00:22:09.480
same: we start dynamic masking.
00:22:12.539
In this case we're creating a role and a
00:22:15.000
security label for that role, restricted,
00:22:17.400
as MASKED.
00:22:18.559
And again, if we open psql and
00:22:22.679
log in with this role, we are
00:22:25.620
gonna see the data with the rules that
00:22:27.659
we have defined, but if we use another
00:22:29.460
user, we're going to get the real data.
00:22:31.919
We can do some very interesting things with
00:22:34.799
Rails, now that we have multiple
00:22:36.360
database connections: we can make one
00:22:37.860
connection with the real user and one
00:22:39.600
with a restricted one.
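The key difference from static masking is that nothing stored is ever modified; what you see depends on who is asking. The toy Python model below illustrates only that idea. The role names and masking functions are hypothetical, and in PostgreSQL Anonymizer the masking happens inside the database when the masked role runs a query, not in application code.

```python
# Toy model of dynamic masking: stored rows never change; what a caller sees
# depends on whether its role is labeled as masked. Role names are hypothetical.
MASKED_ROLES = {"restricted"}

MASKING_RULES = {
    "first_name": lambda value: "CONFIDENTIAL",
    "email": lambda value: value.split("@", 1)[0][:2] + "******@******",
}


def read_users(rows: list[dict], role: str) -> list[dict]:
    """Return rows as seen by `role`: real values, or masked ones for masked roles."""
    if role not in MASKED_ROLES:
        return [dict(row) for row in rows]
    return [
        {col: MASKING_RULES[col](val) if col in MASKING_RULES else val
         for col, val in row.items()}
        for row in rows
    ]


table = [{"id": 1, "first_name": "Jane", "email": "jane.doe@example.com"}]
print(read_users(table, "app_user"))    # real data
print(read_users(table, "restricted"))  # masked data
```

In a Rails app, the equivalent would be a second database connection whose credentials belong to the masked role, next to the normal connection, as described above.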
00:22:41.460
And also, another tool that is very
00:22:43.320
interesting: anonymized database dumps.
00:22:46.520
It's pretty much what you think it
00:22:49.500
is: it's a wrapper for pg_dump,
00:22:53.280
and this wrapper is designed to
00:22:55.500
export the masked data. So instead of
00:22:57.900
getting the data and then processing it, you
00:23:00.419
can export the data directly with the
00:23:03.419
rules that you have defined. And
00:23:06.000
this is pretty much the same interface
00:23:07.740
as the pg_dump command that we know, but
00:23:10.140
it's called pg_dump_anon.
00:23:12.000
And we can provide the host, the user,
00:23:14.400
and where we want this file to be
00:23:16.620
saved.
00:23:18.659
What else can we do? Data generalization.
00:23:21.179
That's very interesting because
00:23:22.380
sometimes...
00:23:24.780
Let's take a look at this example. This
00:23:26.520
is some fake medical data.
00:23:29.120
We have the Social Security number, which we know
00:23:32.340
we shouldn't disclose,
00:23:34.640
and we have a first name. We have a zip
00:23:37.080
code, and if we remember correctly,
00:23:40.580
we have to take care of data related to
00:23:43.860
addresses when the area is smaller than a
00:23:48.059
state. I mean, we can say, oh, this person
00:23:50.580
is from this country or this state, but for
00:23:52.919
anything smaller we have to take care of that.
00:23:55.200
A zip code is smaller than a state.
00:23:58.440
The birth date: okay, we saw that dates
00:24:02.460
are something that we have to protect.
00:24:04.020
And we have the disease. Again, this is
00:24:06.179
very confidential data. We can use a
00:24:08.960
materialized view combined with this
00:24:11.460
tool, so we can create a materialized
00:24:14.100
view that's anonymized. Pretty much,
00:24:16.860
this is a quick description of how
00:24:18.840
that would work: we have
00:24:21.659
anon.generalize_int4range. Okay, this is the
00:24:25.679
column that we want to generalize,
00:24:28.620
and this is the range we want. And it's
00:24:31.500
pretty much the same if you take a look
00:24:33.240
at the birth date:
00:24:35.880
we can say, okay, we want to generalize
00:24:38.280
this in the range of a decade. So how does
00:24:42.840
the final data look?
00:24:45.419
We no longer have the Social Security
00:24:47.340
number; the first name is redacted;
00:24:50.640
the zip code: oh, it's a zip code between
00:24:52.919
this one and this one, so we have a range
00:24:55.140
of a thousand.
00:24:56.280
Birth date: again, we have a range of a decade.
00:24:59.360
Sometimes that's good enough; maybe you
00:25:01.500
want to make a smaller
00:25:03.480
range, or a larger one.
00:25:06.960
Again, we have the diseases, but we have
00:25:08.940
no idea which person has which
00:25:12.240
disease.
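Generalization replaces an exact value with the range that contains it. Here is a minimal Python sketch of the same idea, using half-open tuples as ranges. It is not the extension's `generalize_int4range` function, and it assumes zip codes are stored as integers and that the patient columns are named as shown; both are hypothetical.

```python
from datetime import date


def generalize_int(value: int, step: int) -> tuple[int, int]:
    """Replace an integer with the half-open range [lower, lower + step) containing it."""
    lower = (value // step) * step
    return (lower, lower + step)


def generalize_to_decade(value: date) -> tuple[date, date]:
    """Replace a date with the half-open decade range that contains it."""
    start = (value.year // 10) * 10
    return (date(start, 1, 1), date(start + 10, 1, 1))


def generalize_patient(row: dict) -> dict:
    """Build the generalized view of one patient; the SSN is simply not copied."""
    return {
        "first_name": "REDACTED",
        "zipcode": generalize_int(row["zipcode"], 1000),
        "birth": generalize_to_decade(row["birth"]),
        "disease": row["disease"],
    }


patient = {"ssn": "123-45-6789", "first_name": "Jane", "zipcode": 47678,
           "birth": date(1978, 3, 12), "disease": "Flu"}
print(generalize_patient(patient))
```

Storing zip codes as integers drops leading zeros, so a real schema would more likely keep them as text and generalize a numeric prefix instead.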
00:25:15.960
As wrapping thoughts: take a look at the
00:25:18.360
tool. I'm really enjoying using
00:25:20.700
this tool.
00:25:23.940
It's on GitLab; that's a GitLab logo,
00:25:26.279
a fancy one.
00:25:28.919
Just to close, what are the takeaways
00:25:31.200
for this talk?
00:25:34.500
First, understand the reasons why someone
00:25:37.740
needs data before saying yes or no. And
00:25:40.679
that's the reason for the title of the
00:25:43.020
talk; it's from a meme that goes like:
00:25:45.720
"The orcs are
00:25:47.460
coming! I'm gonna close the door. But
00:25:49.919
the queen is also coming! I'm gonna
00:25:51.659
open the door. And then what? The queen is
00:25:53.760
with the orcs! I'm gonna open the door
00:25:55.740
a little." So it's like, sometimes the
00:25:59.340
decision is not just yes or no; you have
00:26:02.100
to open the door a little. And it's something that I've learned
00:26:03.779
as a consultant: sometimes your client
00:26:07.679
knows what they want,
00:26:09.840
and they ask for what they think is
00:26:12.840
gonna give them the answer, but sometimes
00:26:15.360
that's not true; what they are asking for is
00:26:17.159
not what they need. So it's our duty as
00:26:19.679
consultants to ask the right
00:26:21.720
questions to know exactly what
00:26:24.779
they want. In the cases that I showed,
00:26:27.779
maybe they don't want the whole
00:26:29.580
database; they just want to do something,
00:26:31.440
but they think they need the whole
00:26:34.620
thing. But it's up to us to discover that it's
00:26:36.960
not the case, or maybe it is. At that
00:26:39.179
point we need to decide whether we proceed
00:26:41.400
or not.
00:26:43.320
Second: if justified, provide only what's
00:26:45.600
needed, without risking your users'
00:26:47.340
information. That's what we saw with the two
00:26:51.320
data breaches, the one in the States
00:26:55.320
and the one in Mexico, and you can find a
00:26:56.940
lot of cases all around the world; I'm
00:26:58.679
sure those aren't the only two that you can
00:27:01.140
find online.
00:27:02.520
And the last one: again, this talk is
00:27:05.940
about a very specific tool for Postgres,
00:27:08.940
but regardless of the tool, be careful
00:27:11.400
with the data you have.
00:27:13.020
Once it's out of the server, it's hard to
00:27:16.080
protect; you don't have control over how
00:27:17.760
people take care of the data you give
00:27:19.380
them. It's pretty much like when you upload a
00:27:21.600
picture to the internet:
00:27:23.100
you don't have control over it. So it's
00:27:25.380
pretty much the same with data: once you
00:27:26.820
give a copy, you don't know if
00:27:28.980
they are making more
00:27:30.960
copies, you don't know if they are
00:27:32.640
uploading it to a server and that copy is
00:27:36.659
not protected, which is what
00:27:38.159
happened in the
00:27:41.400
second case.
00:27:42.539
So be careful with the data; that's the
00:27:45.059
main lesson. Thank you.