Open the gate a little: strategies to protect and share data

by Fernando Perales

In this talk titled "Open the gate a little: strategies to protect and share data," Fernando Perales explores the challenges of granting access to production data while maintaining security and compliance with regulations like HIPAA. He emphasizes the importance of protecting personally identifiable information, especially in industries like healthcare, and shares strategies for safely sharing data when necessary. Key points discussed include:

  • Understanding Data Restrictions: The need to comprehend the specific reasons why someone requests access to production data to ensure only necessary information is shared.
  • Health Regulations: An overview of HIPAA and its implications, emphasizing its role in protecting health information.
  • Case Studies: Perales highlights significant incidents in which sensitive data was exposed, such as an unencrypted stolen laptop that cost its owner more than one million dollars.
  • Anonymization Techniques: He suggests using data anonymization to share subsets of information securely, introducing PostgreSQL Anonymizer, an extension for PostgreSQL that masks sensitive data.
  • Static Masking and Dynamic Masking: Perales explains the concepts of static and dynamic masking, demonstrating how to change or hide sensitive data based on user roles or needs.
  • Generalization: He illustrates how data can be generalized to protect individuals while still allowing for necessary analysis or research.
  • Key Takeaways: It is crucial to understand what data is actually needed before granting access and ensure that only minimum necessary data is shared to reduce risk.

Ultimately, Perales emphasizes the need for careful consideration when handling sensitive data, underlining that data that has left the production servers is hard to protect. Participants are reminded that being cautious with data distribution is paramount to maintaining privacy and compliance.

Can you name a more terrifying set of three words in software development than "HIPAA violation fines"? I bet you can't.

We know we must protect access to our information at all costs, but sometimes we need to provide access to our production data for legitimate reasons, and this brings us a dilemma: how to do it while minimizing the risk of data leakage.

In this talk I'll share some strategies that can give you some guidance on when to close the door, when to open the door, and when to open the door to your information a little.

RailsConf 2022

00:00:12.660 let's start
00:00:14.120 a little bit more about me my name is
00:00:17.880 Fernando, or Fer, because I know we are
00:00:20.939 all about productivity so
00:00:23.520 that name is three times shorter so
00:00:25.980 that's good my last name is Perales
00:00:29.640 which I came to the realization one
00:00:32.340 month ago that it's
00:00:34.140 Spanish for pear trees Pluto and I don't
00:00:39.120 like Earth
00:00:41.820 I'm coming from Guadalajara, Mexico, it's
00:00:43.739 not very far from here, there's a
00:00:45.540 four-hour flight, it's a nice place and
00:00:49.440 I've spent pretty much eight years
00:00:51.500 programming Rails, mostly consulting
00:00:55.260 of those eight years five months were
00:00:58.440 spent working at a startup, just five months
00:01:01.079 I didn't like the startup life
00:01:05.540 I'm part of the Boost team and I also
00:01:09.180 host the Ruby MX community
00:01:11.159 probably you saw something in the
00:01:14.520 schedule regarding a meetup slash
00:01:17.280 community, so we recorded that yesterday
00:01:19.740 it was really cool to meet more
00:01:21.780 people who happen to speak Spanish
00:01:24.540 and this is my fifth RailsConf, first as a
00:01:28.020 speaker so it's really important for me
00:01:31.200 and the picture is not really a picture
00:01:33.600 it's an illustration by Sarah that's
00:01:36.420 your Instagram like your space that's a
00:01:39.540 nice illustration
00:01:41.880 uh so yeah let's do some warm-up
00:01:44.220 questions
00:01:45.420 raise your hand
00:01:47.579 if
00:01:49.740 you have access to a production
00:01:52.020 server or database
00:01:54.140 that's interesting
00:01:58.619 raise your hand if you would feel more
00:02:01.079 comfortable not having access to that
00:02:03.360 production server, yeah it's a big
00:02:07.380 responsibility to have the keys of
00:02:09.720 the kingdom, you are responsible
00:02:12.300 again raise your hand if you are
00:02:14.340 comfortable with the security
00:02:15.480 measures your organization takes
00:02:18.900 then you say like okay I can go to sleep
00:02:21.180 every night no problem I know if there's
00:02:23.879 an issue someone's going to take care of
00:02:25.980 it okay not a lot of hands that was
00:02:28.379 expected that's good
00:02:30.959 um regardless of your answer this might
00:02:33.599 not be the talk for you
00:02:35.400 there are more capable people who can
00:02:37.620 help you or your organization to prevent
00:02:40.440 undesired access to your data servers
00:02:42.360 from outsiders also known as hackers
00:02:45.900 there are consulting companies that
00:02:48.000 make a living and are very good at
00:02:49.680 letting you know what you can improve
00:02:51.420 go and hire one, be ready for the worst, don't
00:02:55.680 assume that because you're a small
00:02:57.480 company you are not a target of interest
00:02:59.280 for hackers, however
00:03:03.000 raise your hand if
00:03:05.160 you have a copy of production data
00:03:08.099 on your machine
00:03:10.379 okay it's interesting, more or less half of
00:03:13.080 the people
00:03:14.819 raise your hand if someone from your
00:03:17.040 organization has asked you for a copy of
00:03:19.800 production data
00:03:22.560 okay interesting
00:03:25.200 raise your hand if you have provided a
00:03:27.780 copy of production data to someone in
00:03:30.060 your organization
00:03:32.400 no judgment so feel free to raise your
00:03:34.620 hands
00:03:35.900 last question, raise your hand if you
00:03:39.239 are concerned about copies of production
00:03:41.280 data being in someone's hands
00:03:44.099 yeah so if you answered yes to at least
00:03:47.159 one this is a talk for you
00:03:49.860 the inspiration of this talk some cases
00:03:53.299 I think the reason that you decided to
00:03:56.519 attend this talk
00:03:57.959 is this thing
00:04:01.140 what is this the health insurance
00:04:03.180 portability and accountability Act
00:04:05.879 it's a United States
00:04:09.140 federal statute signed into law in 1996
00:04:13.400 it pretty much modernizes the flow of
00:04:15.900 healthcare information, it stipulates how
00:04:18.299 personally identifiable information
00:04:21.260 maintained by healthcare and healthcare
00:04:23.759 insurance industries should be protected
00:04:25.620 from fraud and theft, in general it
00:04:29.699 prohibits healthcare providers and
00:04:31.380 businesses from disclosing protected
00:04:34.380 information to anyone other than a
00:04:36.419 patient and the patient's authorized
00:04:39.060 people
00:04:40.340 and this term is also related to
00:04:44.340 that HIPAA term, PHI, protected health
00:04:49.020 information, what is that
00:04:51.139 also known as personal health
00:04:53.160 information, it's the demographic
00:04:55.620 information, medical histories, test results
00:04:58.259 laboratory results
00:05:00.000 mental health conditions, insurance
00:05:02.400 information and other data that
00:05:04.979 healthcare professionals collect to
00:05:07.139 identify an individual and determine
00:05:09.479 appropriate care
00:05:12.060 what is considered protected
00:05:14.400 information
00:05:16.199 pretty much all of this: name, address, any
00:05:19.860 date, and not just birth date, maybe the date
00:05:23.520 that you were admitted into a hospital
00:05:27.300 maybe the time you were
00:05:31.440 included in a new insurance, phone number
00:05:34.620 fax number
00:05:36.180 I don't know if fax is still a thing but
00:05:38.460 probably, email address, social security
00:05:40.680 number, pretty much all of it
00:05:43.620 um but also the last point is also
00:05:46.320 important any other unique identifying
00:05:49.080 characteristics
00:05:50.280 that one is more tricky because maybe a
00:05:53.520 person has a very uncommon tattoo in
00:05:57.240 a very specific part of the body
00:05:59.160 that makes that person easy to identify
00:06:01.800 so we also have to protect that kind of
00:06:04.080 data
00:06:04.979 and one of the cases that
00:06:08.100 caught my attention is this one
00:06:10.380 Lifespan: unencrypted stolen laptop costs
00:06:13.259 Lifespan more than one million in fees
00:06:16.100 pretty much the issue was an employee's
00:06:19.320 computer went missing with protected
00:06:21.180 health information of about 20 000
00:06:23.400 records and last time I checked in my
00:06:26.160 wallet I didn't have one million
00:06:28.319 dollars to spare on a fee
00:06:31.020 and there's a keyword here
00:06:33.620 unencrypted, so you might think okay my
00:06:36.720 machines are encrypted so we
00:06:38.940 shouldn't have an issue right
00:06:41.100 somewhat, because if you have
00:06:44.720 protected information and you cannot
00:06:47.460 document and prove that the device was
00:06:49.560 encrypted you also need to follow the
00:06:52.800 requirements for a HIPAA
00:06:55.860 breach
00:06:57.120 and then you might think I don't have to
00:06:59.520 worry I don't have any health
00:07:00.840 information in my hand maybe I work for
00:07:03.180 fintech or maybe I do any other kind of
00:07:05.460 industry
00:07:06.600 am I safe if my app is not health
00:07:08.759 related
00:07:10.039 well
00:07:12.000 one of the nice things of consulting
00:07:13.919 which is what I've been doing for the
00:07:15.960 last eight years is that you might work
00:07:18.000 with clients from outside the states
00:07:20.840 this means you have to worry
00:07:23.220 about local legislation so let's go back
00:07:26.280 to my country Mexico
00:07:28.139 we have this very long name, I'm not
00:07:30.900 gonna say what it means in Spanish but
00:07:32.940 in English it is the Federal Law on
00:07:34.740 Protection of Personal Data Held by
00:07:37.380 Individuals and it was approved in 2010
00:07:41.819 I think
00:07:43.020 and it aims to regulate the right
00:07:45.780 to informational
00:07:47.699 self-determination
00:07:50.099 what it means is that companies such as
00:07:51.780 banks, insurance companies, hospitals
00:07:54.060 schools, telecommunication companies
00:07:56.520 religious organizations and any
00:07:59.580 professionals such as lawyers, doctors and
00:08:01.680 others are required to comply with the
00:08:03.419 provisions of this law
00:08:04.860 which is very similar to HIPAA, like
00:08:07.020 there's information that identifies a
00:08:09.960 person and you have to protect that
00:08:12.840 this brings another case and again
00:08:16.560 this is an interesting case because I
00:08:19.440 was affected and 93.4 million
00:08:22.860 Mexicans were affected, pretty much
00:08:25.280 what happened: personal
00:08:27.660 information of almost 94 million Mexicans
00:08:30.180 exposed on Amazon
00:08:31.979 what happened, we have a government entity
00:08:36.060 that has a registry of pretty much
00:08:38.580 all adults in Mexico
00:08:40.620 and for some reason which I don't
00:08:43.320 understand they provide a copy of that
00:08:45.660 database to the political parties, there
00:08:47.880 are around 10 political parties
00:08:50.360 and one of these parties uploaded
00:08:53.399 a copy to Amazon without protection, a
00:08:56.820 MongoDB instance of about
00:08:59.060 132 gigabytes
00:09:01.260 so how did it happen
00:09:03.899 as I mentioned, someone uploaded
00:09:05.519 something that they shouldn't have uploaded
00:09:08.899 oopsie
00:09:11.459 so the first lesson, don't give
00:09:12.899 production copies to everyone
00:09:14.880 and that could be pretty much the end of
00:09:16.860 the talk, that's the safest thing to
00:09:20.040 do, don't let your data go out
00:09:22.440 from your production servers, but of
00:09:24.480 course you didn't come to hear that
00:09:27.240 but what if we can provide only
00:09:30.500 what's needed
00:09:33.899 and there's a general term that we
00:09:38.279 can use, that is well known, which is
00:09:41.040 anonymization of data, if we think about
00:09:44.600 the reasons why someone
00:09:47.760 from your organization requires access to
00:09:50.339 production data
00:09:51.839 we can most of the time realize that
00:09:54.420 they don't want the whole
00:09:57.120 data, they just need a subset of
00:09:58.680 information
00:10:00.080 maybe they need everything or
00:10:03.120 just specific parts, or they need to
00:10:06.839 do some research on the data, maybe a
00:10:10.440 data scientist needs to get access to
00:10:12.360 your data sets to do some calculations
00:10:15.120 I don't know, it's up to your
00:10:17.040 organization
00:10:18.120 but we can, and
00:10:21.080 we must, do some data anonymization
00:10:24.060 before giving a copy of production data
00:10:27.000 in case we decide that we have to give one
00:10:29.760 and there's a tool that I've been using
00:10:31.620 recently which is this one, PostgreSQL
00:10:34.140 Anonymizer, what is this, it's an
00:10:37.320 extension to mask or replace personally
00:10:40.560 identifiable information or any
00:10:43.440 commercially sensitive data
00:10:45.380 by the name you can guess that this
00:10:47.519 works only for Postgres
00:10:50.220 which is most probably what you're using
00:10:53.160 in your Rails application
00:10:54.600 so there's a repo with a demo so you
00:10:58.380 don't have to follow everything in this
00:11:01.380 section, you can take a look at
00:11:03.180 that repo and I'm going to share the
00:11:04.680 link on Twitter and Slack
00:11:07.140 so if you miss anything you can go to
00:11:09.600 the repo and take a look at what's going
00:11:11.339 on
00:11:13.500 for this case
00:11:15.180 I have a sample application and as you
00:11:18.240 can see it's a
00:11:19.560 vanilla rails application
00:11:21.899 that has a table called users with
00:11:27.360 an id, a first name, a
00:11:30.959 last name, a street line one, a street
00:11:33.420 line two, a zip code, an email, salary
00:11:37.079 in cents, that's just made up numbers
00:11:39.120 it doesn't make sense that someone earns
00:11:41.700 200 cents
00:11:44.360 but it's just a sample of the kind
00:11:47.040 of data that we work with regularly so
00:11:50.839 again let's say someone needs a copy of
00:11:54.120 this data maybe they want to do
00:11:56.880 I don't know just like uh they want to
00:11:59.940 take a look at the structure a lot of
00:12:01.320 ways they want to maybe do some
00:12:03.779 calculation with the salaries maybe they
00:12:05.760 want they need to calculate bonuses they
00:12:09.660 need to compare the finances of the
00:12:12.600 company maybe they need to I don't know
00:12:15.180 there's a lot of things that you can do
00:12:16.560 with data I'm not a data scientist but
00:12:19.500 I'm sure that they have a lot of good
00:12:22.140 use cases to to get this information
00:12:25.860 so what's the first thing, again don't
00:12:27.720 worry, you can take a look at the
00:12:29.579 example
00:12:30.720 so it's a Postgres extension that's
00:12:33.839 pretty straightforward to install, you
00:12:35.700 clone the extension, it's hosted on GitLab
00:12:38.040 open source of course
00:12:39.740 make extension, make install
00:12:43.160 and then we have to enable the extension
00:12:46.500 here in our database
00:12:48.420 so yeah pretty much we have to do some
00:12:50.880 SQL commands to alter database in this
00:12:54.959 case I'm working in development so I
00:12:57.600 don't really have to to worry about
00:13:00.120 uh
00:13:01.519 messing something up be careful when
00:13:04.079 you're doing this in production and
00:13:06.120 we're gonna see an example
00:13:07.920 so pretty much alter database and we
00:13:10.380 are preloading the extension and then
00:13:12.480 we create extension anon, if not exists
00:13:16.500 so okay now we have an extension in
00:13:18.839 our database, what can we do with it
00:13:21.420 one of the things that
00:13:24.300 I have found most useful in this
00:13:26.880 tool is static masking, and what is
00:13:30.720 the term
00:13:31.980 sometimes the best way
00:13:35.160 to deal with or transform the original data
00:13:37.800 set is pretty much to destroy the local
00:13:40.740 copy that you have, so what it means
00:13:43.980 someone is getting a copy of
00:13:46.560 the database, installing it on another server
00:13:49.260 maybe locally, and then transforming that
00:13:52.079 data so it's not recognizable
00:13:55.440 and this works very well for local
00:13:57.779 copies of the data, again it's a copy, if
00:14:00.120 we destroy anything
00:14:01.860 we don't have to worry, the real data is
00:14:03.720 in production, the advantage of using
00:14:07.260 static masking is that
00:14:09.720 if our computer is stolen
00:14:13.260 whether it's encrypted or not the
00:14:15.540 data is anonymized so we have something
00:14:18.240 less to worry about, and again as a note
00:14:21.000 don't run this in production because you
00:14:23.639 are going to actually destroy the data
00:14:25.980 and what are the strategies that we can
00:14:28.019 use for static masking, this tool allows
00:14:31.500 three
00:14:33.540 which are applying masking rules
00:14:36.360 we can do some shuffling of columns
00:14:39.240 and we can add noise to numerical
00:14:42.839 or date values, so let's see an
00:14:46.920 example of each one, applying masking rules
00:14:50.699 we have an extension set up in our
00:14:52.980 database so let's connect to
00:14:56.459 our database, in this case this is the
00:14:58.699 database name for example
00:15:01.320 we initialize the extension with
00:15:04.440 anon init and then we can define some
00:15:06.660 rules
00:15:07.680 when I showed this example to a
00:15:11.639 co-worker they mentioned oh that
00:15:14.639 looks like Faker but for your
00:15:17.040 database, and that's true
00:15:21.540 this is how you define rules in PostgreSQL
00:15:23.940 Anonymizer, probably you
00:15:26.760 have seen the security label, we're going
00:15:28.620 to talk a little bit about that, security
00:15:31.139 label for the name of the extension and then
00:15:33.959 we define okay on this column
00:15:37.560 table users, column first name, we
00:15:42.540 want to mask the information in this
00:15:45.180 column with a function, what is the
00:15:48.060 name of the function, anon fake first name
00:15:51.480 and this is how we are defining a rule
00:15:53.639 for that column
00:15:55.440 and again this security label
00:15:58.320 is very interesting, if you're using
00:16:00.420 Postgres you probably
00:16:03.420 haven't heard about this
00:16:05.699 um
00:16:06.480 this framework, it's a security framework
00:16:09.060 and it's inside of Postgres, pretty much
00:16:11.279 what it lets you do is you can
00:16:14.160 achieve very fine-grained
00:16:16.680 security control on your data
00:16:20.639 for example as a result of using this
00:16:23.160 you can make it so that some users
00:16:26.459 can only see obfuscated data
00:16:29.399 in this case I'm just pretty much
00:16:32.040 defining more security labels for
00:16:36.240 this example, I'm doing pretty much the
00:16:38.220 same for last name
00:16:39.779 in the column last name
00:16:41.820 just using a fake last name
00:16:45.019 for street line one maybe I don't
00:16:48.120 care about the street so I'm just
00:16:50.160 setting a hard-coded value, confidential
00:16:53.880 and the same for street line two, for
00:16:56.579 this example I only care about the
00:16:58.980 numbers and I don't really care about
00:17:01.740 what the salary of each person is and
00:17:03.839 where they live, pretty much the same for
00:17:06.000 street line two, and pretty much the same
00:17:08.160 for zip code, I don't really need to
00:17:09.900 provide the real zip code of the person
00:17:11.640 so I can use a random zip code
00:17:15.419 and finally I would like to
00:17:18.319 anonymize or scramble the
00:17:21.660 email a little bit because again, if
00:17:24.179 you have one person's email
00:17:26.819 and you put that into any search
00:17:30.240 engine you can find interesting
00:17:31.980 information about the person so I also
00:17:34.260 want to anonymize that and this
00:17:36.419 extension provides this function, partial
00:17:38.280 email, and again I provide just the name
00:17:40.620 of the column that I want
00:17:43.380 to anonymize, and then at the
00:17:46.320 end I just run anonymize database
00:17:48.539 remember this will destroy the real data
00:17:52.200 it's not making a copy of your database
00:17:54.000 it's up to you to make a copy, so this
00:17:56.700 will pretty much
00:17:59.400 mutate the database
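The masking rules described above (fake names, a hard-coded value for the street lines, a random zip code, and an in-place rewrite of a local copy) can be sketched in plain Python. This is only an illustration of the idea, not the extension's implementation or API: the name pools, the `MASKING_RULES` table, the `anonymize_rows` helper, and the sample rows are all hypothetical.

```python
import random

# Hypothetical pools of fake values; a real tool ships much larger ones.
FAKE_FIRST_NAMES = ["Alex", "Sam", "Robin", "Jamie"]
FAKE_LAST_NAMES = ["Stone", "Rivers", "Woods", "Field"]

# One masking rule per sensitive column: column name -> function that takes
# a random generator and returns the replacement value.
MASKING_RULES = {
    "first_name": lambda rng: rng.choice(FAKE_FIRST_NAMES),
    "last_name": lambda rng: rng.choice(FAKE_LAST_NAMES),
    "street_line_1": lambda rng: "CONFIDENTIAL",
    "street_line_2": lambda rng: "CONFIDENTIAL",
    "zip_code": lambda rng: f"{rng.randrange(100000):05d}",
}


def anonymize_rows(rows, rules, rng):
    """Overwrite masked columns in place; columns without a rule are kept."""
    for row in rows:
        for column, rule in rules.items():
            row[column] = rule(rng)
    return rows


# A made-up local copy of the users table (never the production rows).
local_copy = [
    {"id": 1, "first_name": "Ada", "last_name": "Byron",
     "street_line_1": "1 Main St", "street_line_2": "Apt 2",
     "zip_code": "44100", "salary_cents": 250000},
    {"id": 2, "first_name": "Luis", "last_name": "Mora",
     "street_line_1": "9 Elm Ave", "street_line_2": "",
     "zip_code": "45050", "salary_cents": 310000},
]

anonymize_rows(local_copy, MASKING_RULES, random.Random(42))
```

As in the talk, the original values are gone once the rules have run, so this only makes sense on a copy; the salary column has no rule and keeps its real numbers.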
00:18:01.980 what is the result
00:18:03.780 this is pretty much how it looks
00:18:05.460 once I run this command, if you look
00:18:08.580 first name and last name are
00:18:10.380 completely different, at the top is the
00:18:12.660 original data set and the bottom is the
00:18:15.240 anonymized one, we have street line
00:18:17.280 one and two confidential, we have random
00:18:20.340 zip codes, and the email, it's
00:18:22.740 pretty cool because it's partial
00:18:25.740 we know that it starts
00:18:27.900 with
00:18:28.799 f e and then there's an at, but if you
00:18:31.980 compare with the real emails
00:18:34.440 it's not even the same length
00:18:38.039 and the domain is hidden, we
00:18:41.700 can, if we want to
00:18:44.220 customize how we're anonymizing, we can
00:18:46.620 do that, but by default this is
00:18:48.660 what it does which is good enough for me
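The partial email behavior described here (the first two characters survive, the result has a fixed length, and the domain is hidden) can be sketched as follows. The exact output format is an assumption based on that description, and `partial_email` below is a hypothetical Python helper, not the extension's function.

```python
def partial_email(email: str) -> str:
    """Keep two characters of the local part and of the domain name,
    replace the rest with a fixed-width mask, and keep the top-level domain."""
    local, _, domain = email.partition("@")
    name, _, tld = domain.rpartition(".")
    return f"{local[:2]}******@{name[:2]}******.{tld}"


masked_long = partial_email("fernando@example.com")
masked_short = partial_email("fer@example.com")
```

Because the mask has a fixed width, addresses of different lengths produce outputs of the same length, so the result does not leak how long the real address was.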
00:18:51.419 the salary, I didn't do anything with
00:18:53.400 the salary for now because I care about
00:18:55.140 these numbers, I just don't want to
00:18:58.380 associate a person with a salary, so if
00:19:01.740 you see this data set you have no idea
00:19:03.780 who these people are, they don't exist to
00:19:06.360 be honest
00:19:07.400 the salary is unchanged so
00:19:10.380 I can make calculations with the salary
00:19:11.700 without disclosing which salary belongs
00:19:15.900 to which person
00:19:18.179 I can also shuffle columns, this is also
00:19:21.419 interesting because again when we are
00:19:23.340 doing calculations
00:19:25.080 we sometimes don't care about where
00:19:28.140 those numbers come from but we need to
00:19:29.940 use the real numbers
00:19:32.039 for shuffling columns again we have a
00:19:33.900 function, shuffle column, we provide the name
00:19:37.559 of the table, the name of the column and
00:19:40.740 we provide a primary key so
00:19:43.980 the function can know how it's going to
00:19:47.100 relate everything
00:19:48.600 what is the result of this
00:19:50.700 it's pretty much similar, in this case I
00:19:52.559 didn't
00:19:53.480 anonymize the data but if you take a
00:19:57.299 look at the salary cents column
00:19:59.520 it's not the original people who have
00:20:02.580 the salaries, so I don't affect the
00:20:04.860 calculations, if I sum
00:20:09.179 the salaries I'm gonna get exactly the
00:20:11.700 same result, if I compute the median
00:20:14.700 or any other operation I get the same
00:20:17.039 results, but what is the advantage, the
00:20:19.919 real salary is not associated with the
00:20:21.900 real person, this might be one of the
00:20:24.000 cases that I want to solve
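Why shuffling keeps aggregates intact can be seen in a few lines of Python. This is a conceptual sketch with made-up rows and a hypothetical `shuffle_column` helper; the extension does the equivalent inside the database.

```python
import random
import statistics


def shuffle_column(rows, column, rng):
    """Redistribute the values of one column across rows, in place.
    The multiset of values is unchanged; only who holds which value moves."""
    values = [row[column] for row in rows]
    rng.shuffle(values)
    for row, value in zip(rows, values):
        row[column] = value
    return rows


# Made-up sample rows.
users = [
    {"id": 1, "first_name": "Ada", "salary_cents": 250000},
    {"id": 2, "first_name": "Luis", "salary_cents": 310000},
    {"id": 3, "first_name": "Mei", "salary_cents": 190000},
    {"id": 4, "first_name": "Omar", "salary_cents": 420000},
]

before = [user["salary_cents"] for user in users]
shuffle_column(users, "salary_cents", random.Random(7))
after = [user["salary_cents"] for user in users]
```

Sum, median, and any other statistic computed over the whole column are identical before and after, which is the property the talk relies on. Note that a shuffle can leave some rows with their original value, especially on small tables, so it weakens the link between person and salary rather than guaranteeing it is broken for every row.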
00:20:26.580 another interesting function, adding
00:20:29.160 noise to a column, what is the concept of
00:20:31.799 noise in this case
00:20:33.660 again I'm going to add some noise to the
00:20:37.020 table users in the column salary cents
00:20:39.960 and there's a
00:20:42.020 ratio, and what it means, this function
00:20:46.620 is going to change the values of that
00:20:48.360 column
00:20:49.559 randomly plus or minus 10 percent
00:20:55.500 again this is
00:20:57.620 the result, if you compare we
00:21:01.799 have the same order, we're not going to
00:21:03.720 change the order
00:21:04.980 but the number is not exactly the same
00:21:07.020 so we're adding some noise to the
00:21:09.900 data, I know that this is sometimes
00:21:12.299 useful for people who train data models
00:21:15.179 that say okay I want to train a model with
00:21:17.100 noise, you can use that instead of using
00:21:19.500 the real data, but you know, even though you
00:21:22.500 don't know or you don't disclose the
00:21:24.360 real salary you know that this number is
00:21:27.299 within plus or minus 10 percent
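A minimal sketch of the noise idea, assuming each value is multiplied by a random factor drawn uniformly from the range 1 minus ratio to 1 plus ratio. The uniform distribution and the `add_noise` helper are assumptions made for illustration; the talk only states that values change randomly by plus or minus 10 percent.

```python
import random


def add_noise(rows, column, ratio, rng):
    """Scale each value by a random factor in [1 - ratio, 1 + ratio], in place."""
    for row in rows:
        factor = 1 + rng.uniform(-ratio, ratio)
        row[column] = round(row[column] * factor)
    return rows


# Made-up sample rows.
users = [
    {"id": 1, "salary_cents": 250000},
    {"id": 2, "salary_cents": 310000},
    {"id": 3, "salary_cents": 190000},
]

original = [user["salary_cents"] for user in users]
add_noise(users, "salary_cents", 0.10, random.Random(3))
noisy = [user["salary_cents"] for user in users]
```

Rows stay attached to the same person and only the number moves, so anyone who sees a noisy value still knows the real one lies within roughly 10 percent of it. That is the trade-off mentioned in the talk.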
00:21:31.260 this is pretty much the static part
00:21:33.659 and it's what I've found most
00:21:36.059 useful so far but it's not the only
00:21:38.220 thing that the tool can do, we also have
00:21:40.440 dynamic masking, this is very interesting
00:21:42.240 because this can play together with
00:21:45.299 Rails in a very interesting way, what is
00:21:47.880 dynamic masking, I can hide some
00:21:50.400 data from a database role by
00:21:53.159 declaring that role as masked
00:21:55.559 and then I can define rules so when that
00:21:58.440 database role connects to the
00:22:00.900 database it's gonna get
00:22:03.659 the masked version of the data and not the
00:22:06.000 real one
00:22:07.440 how does it work, it's pretty much the
00:22:09.480 same, we start dynamic masking
00:22:12.539 in this case we're creating a role and a
00:22:15.000 security label for that role, restricted
00:22:17.400 is masked
00:22:18.559 and again if we open psql and we
00:22:22.679 log in with this role we are
00:22:25.620 gonna see the data with the rules that
00:22:27.659 we have defined but if we use another
00:22:29.460 user we're going to get the real data
00:22:31.919 we can do some very interesting things with
00:22:34.799 Rails now that we have multiple
00:22:36.360 database connections, we can make one
00:22:37.860 connection with the real user and one
00:22:39.600 with a restricted one
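The difference from static masking is that the stored data is never modified; masking happens at read time and depends on who is asking. The sketch below models that in plain Python with a hypothetical `select_users` function, a `MASKED_ROLES` set, and made-up role names. It is not how the extension is implemented.

```python
# Roles declared as masked (hypothetical names).
MASKED_ROLES = {"restricted"}

# Per-column masks applied only when a masked role reads the data.
COLUMN_MASKS = {
    "first_name": lambda value: "REDACTED",
    "email": lambda value: "CONFIDENTIAL",
}


def select_users(rows, role):
    """Return copies of the rows; masked roles get masked values,
    every other role gets the real values. Stored rows are untouched."""
    if role not in MASKED_ROLES:
        return [dict(row) for row in rows]
    return [
        {column: COLUMN_MASKS[column](value) if column in COLUMN_MASKS else value
         for column, value in row.items()}
        for row in rows
    ]


# Made-up stored data.
users = [
    {"id": 1, "first_name": "Ada", "email": "ada@example.com", "salary_cents": 250000},
]

as_restricted = select_users(users, "restricted")
as_owner = select_users(users, "app_owner")
```

In a Rails application with multiple database connections, the same idea would map to one connection that authenticates as the regular role and another that authenticates as the masked role.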
00:22:41.460 and also another tool that's very
00:22:43.320 interesting, anonymous database dumps
00:22:46.520 it's pretty much what you think it
00:22:49.500 is, it's a wrapper for pg_dump
00:22:53.280 and this wrapper is designed to
00:22:55.500 export the masked data, so instead of
00:22:57.900 getting the data and then processing it you
00:23:00.419 can export the data directly with
00:23:03.419 the rules that you have defined, and
00:23:06.000 this is pretty much the same interface
00:23:07.740 as the pg_dump command that we know but
00:23:10.140 it's called pg_dump_anon
00:23:12.000 and we can provide the same: the host, the user
00:23:14.400 and where we want this file to be
00:23:16.620 saved
00:23:18.659 what else can we do, data generalization
00:23:21.179 that's very interesting because
00:23:22.380 sometimes
00:23:24.780 let's take a look at this example, this
00:23:26.520 is fake medical data
00:23:29.120 we have the social security number which I know
00:23:32.340 we shouldn't disclose
00:23:34.640 and we have a first name, we have a zip
00:23:37.080 code which if we remember correctly
00:23:40.580 we have to take care of data related to
00:23:43.860 address when the data is smaller than a
00:23:48.059 state, I mean we can say oh this person
00:23:50.580 is from this country or this state, anything
00:23:52.919 smaller we have to take care of, and a
00:23:55.200 zip code is smaller than a state
00:23:58.440 the birth date, okay we saw that dates
00:24:02.460 are something that we have to protect
00:24:04.020 and we have the disease, again this is
00:24:06.179 very confidential data, we can use
00:24:08.960 a materialized view combined with this
00:24:11.460 tool so we can create a materialized
00:24:14.100 view that is anonymized, pretty much
00:24:16.860 this is a description of how
00:24:18.840 that would work, we have a generalize
00:24:21.659 range function, okay this is the
00:24:25.679 column that we want to generalize
00:24:28.620 and this is the range we want, and
00:24:31.500 pretty much the same if you take a look
00:24:33.240 at the birth date
00:24:35.880 we can say okay we want to generalize
00:24:38.280 this in the range of a decade, so how
00:24:42.840 does the final data look
00:24:45.419 we don't have the social security
00:24:47.340 number, the first name is redacted
00:24:50.640 the zip code, oh it's a zip code between
00:24:52.919 this one and this one so we have a range
00:24:55.140 of a thousand
00:24:56.280 birth date, again we have a range of a decade
00:24:59.360 sometimes it's good enough, maybe
00:25:01.500 you want to make a smaller
00:25:03.480 range or a larger one
00:25:06.960 again we have the disease but we have
00:25:08.940 no idea which person has that
00:25:12.240 disease
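Generalization replaces an exact value with the range that contains it. The sketch below builds a generalized view of made-up patient rows in plain Python: the social security number is dropped, the first name is redacted, the zip code becomes a range of a thousand, and the birth date becomes a decade. The function names, the half-open ranges, and the sample data are assumptions for illustration only.

```python
from datetime import date


def generalize_int(value, step):
    """Return the half-open range [lower, upper) of width `step` containing value."""
    lower = (value // step) * step
    return (lower, lower + step)


def generalize_decade(day):
    """Return the half-open range of dates covering the decade that contains `day`."""
    start_year = (day.year // 10) * 10
    return (date(start_year, 1, 1), date(start_year + 10, 1, 1))


def generalized_view(patients):
    """Build a shareable view: no SSN, redacted name, ranges instead of exact values."""
    return [
        {
            "first_name": "REDACTED",
            "zipcode": generalize_int(patient["zipcode"], 1000),
            "birth": generalize_decade(patient["birth"]),
            "disease": patient["disease"],
        }
        for patient in patients
    ]


# Made-up medical rows.
patients = [
    {"ssn": "000-00-0001", "first_name": "Ada", "zipcode": 47677,
     "birth": date(1964, 7, 2), "disease": "Flu"},
    {"ssn": "000-00-0002", "first_name": "Luis", "zipcode": 47905,
     "birth": date(1979, 11, 23), "disease": "Asthma"},
]

view = generalized_view(patients)
```

The width of each range is the knob mentioned in the talk: a wider range protects individuals better but makes the data less useful for analysis.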
00:25:15.960 as wrap-up thoughts, take a look at the
00:25:18.360 tool, I'm really enjoying using
00:25:20.700 this tool
00:25:22.080 um
00:25:23.940 it's on GitLab, that's the GitLab logo
00:25:26.279 it's a fancy one
00:25:28.919 um just to close what are the takeaways
00:25:31.200 for this talk
00:25:33.480 um
00:25:34.500 first understand the reasons why someone
00:25:37.740 needs data before saying yes or no, and
00:25:40.679 that's the reason for the title of the
00:25:43.020 talk, it's from a meme that says like
00:25:45.720 I don't know, the orcs are
00:25:47.460 coming, oh I'm gonna close the door, but
00:25:49.919 the queen is also coming, oh I'm gonna
00:25:51.659 open the door, and then what, the queen is
00:25:53.760 with the orcs, oh I'm gonna open the
00:25:55.740 door a little, so it's like sometimes
00:25:59.340 the decision is not just yes or no
00:26:02.100 and it's something that I've learned
00:26:03.779 as a consultant, sometimes your client
00:26:07.679 knows what they want
00:26:09.840 and they ask for what they think is
00:26:12.840 gonna give the answer but sometimes
00:26:15.360 that's not true, what they are asking for is
00:26:17.159 not what they need, so it's our duty as
00:26:19.679 consultants to ask the
00:26:21.720 right questions to know exactly what
00:26:24.779 they want, in the cases that I showed
00:26:27.779 maybe they don't want the whole
00:26:29.580 database, they just want to do something
00:26:31.440 but they think they need the whole thing
00:26:34.620 but it's up to us to discover that it's
00:26:36.960 not the case, or maybe it is, at that
00:26:39.179 point we need to know if we proceed
00:26:41.400 or not
00:26:43.320 if justified provide only what's
00:26:45.600 needed without risking your users'
00:26:47.340 information, it's what we saw with the two
00:26:51.320 data breaches, the one in the States
00:26:55.320 and the one in Mexico, and you can find a
00:26:56.940 lot of cases all around the world, I'm
00:26:58.679 sure those are not the only two that you can
00:27:01.140 find online
00:27:02.520 and the last one is again, this talk is
00:27:05.940 about a very specific tool for Postgres
00:27:08.940 but regardless of the tool be careful
00:27:11.400 with the data you have
00:27:13.020 once out of the server it's hard to
00:27:16.080 protect, you don't have control over how
00:27:17.760 people take care of the data you give
00:27:19.380 to them, it's pretty much like when you upload a
00:27:21.600 picture to the internet
00:27:23.100 you don't have control over that, so it's
00:27:25.380 pretty much the same with data, once you
00:27:26.820 give a copy you don't know if
00:27:28.980 they are making more
00:27:30.960 copies, you don't know if they are
00:27:32.640 uploading to a server and that copy is
00:27:36.659 not protected, which is what
00:27:38.159 happened in the second
00:27:41.400 case
00:27:42.539 so be careful with the data, that's
00:27:45.059 the main lesson, thank you