How to migrate to Active Storage without losing your mind

by Colleen Schnettler

The video titled "How to migrate to Active Storage without losing your mind," presented by Colleen Schnettler at RailsConf 2019, explores the challenges and solutions involved in migrating a production application from Paperclip to Active Storage in Ruby on Rails. Colleen shares her journey of migrating a Rails application utilizing Amazon S3 for storage and provides insights into the inner workings of Active Storage.

Key Points Discussed:
  • Introduction to Active Storage:

    Active Storage allows Rails applications to easily attach files to Active Record objects and store these files in cloud-based storage providers. It's important to migrate because Active Storage is now the default solution for handling file uploads in Rails, and Paperclip has been deprecated.

  • Migration Steps:

    1. Install Active Storage and configure:
      Colleen explains the installation process, configuring cloud storage, and creating necessary tables in the database.
    2. Moving Data:
      The process of migrating data from the existing user table (where Paperclip stores attachments) to the new Active Storage tables (attachments and blobs) is critical.
    3. Writing a Rake Task:
      She discusses writing a Rake task to facilitate this migration, emphasizing understanding the structure of the existing data and mapping it correctly to the new tables.
  • Common Pitfalls:

    Colleen elaborates on potential issues, particularly focusing on the correct handling of keys and checksums when moving files to Amazon S3. She highlights how data relationships defined by Paperclip need careful consideration to avoid errors during migration.

  • Testing the Migration:

    After running the Rake task, it’s essential to verify the migration's success by checking that the correct number of records exists in the new tables and optionally peeking into the database to ensure data integrity.

  • Variant Processing:

    Active Storage's support for image variants lets you pick image sizes on the fly, although she notes that its image processing (MiniMagick at the time) lacked some of the capabilities of the tool she migrated from.

  • Conclusion and Best Practices:

    Colleen concludes with a summary of her steps and stresses the necessity of validating each phase of the migration. She points out that while Active Storage may seem straightforward to use, it requires a solid understanding of different configurations and careful handling of existing data to ensure a successful transition.

By the end of this talk, attendees are equipped with knowledge on how to approach their own migrations confidently, avoiding common pitfalls and ensuring a smooth transition to Active Storage.

Main Takeaways:

- Active Storage is a robust solution for handling file attachments in modern Rails applications.

- Migration requires careful planning, awareness of database structures, and writing scripts for data mapping.
- Ensuring data integrity during the migration process is essential for a successful transition.

RailsConf 2019 - How to migrate to Active Storage without losing your mind by Colleen Schnettler
Active storage works seamlessly in new rails applications - but how many of us only work on new applications? Migrating to Active Storage can be a daunting task on a production application. This talk will explain active storage, why you might want to use it, how it modifies your database, and the benefits and drawbacks of migrating your existing application. I’ll walk you through my painful journey migrating an existing application. You will leave this talk with a better understanding of the inner workings of active storage and with the confidence to tackle your own migration. This talk is appropriate for all levels of skill and no prior experience or knowledge of active storage is required.

RailsConf 2019

00:00:21.260 hi everyone
00:00:22.920 how about this amazing RailsConf 2019
00:00:26.789 I hope you all have had
00:00:34.110 an amazing week like I have and I really
00:00:36.629 appreciate you coming to my talk cuz I
00:00:38.550 know you're probably tired my name is
00:00:40.980 Colleen and I run a Ruby on Rails
00:00:43.170 consulting business I'm here today to
00:00:46.050 teach you a little bit about my
00:00:47.940 adventures or misadventures as they
00:00:51.449 were migrating a production application
00:00:54.149 from shrine to active storage using
00:00:56.489 Amazon s3 storage I actually used shrine
00:00:59.760 when I did this for my client but for
00:01:02.250 the purposes of this talk I'm gonna use
00:01:03.750 paperclip because the five people that
00:01:06.509 responded to my Twitter poll said they
00:01:08.160 used paperclip more than shrine but
00:01:11.940 first I'd like to start with a little
00:01:13.530 story so how did I get here I was
00:01:17.729 contacted by a cool new startup looking
00:01:20.789 for a rails developer to do just that
00:01:22.649 migrate their solution from shrine to
00:01:24.330 active storage I was excited to work
00:01:26.550 with this company and this was the first
00:01:28.950 time I was going to get to use active
00:01:30.929 storage and I was very excited to use
00:01:32.789 active storage I had actually attended
00:01:35.729 the active storage talk I believe it was
00:01:37.860 railsconf last year so I was feeling
00:01:40.229 quite confident in my ability to migrate
00:01:42.929 this application to active storage so
00:01:45.660 for those of you who are not yet on
00:01:47.520 rails 5.2 let's start with what is
00:01:51.360 active storage so active storage is an
00:01:55.979 easy way to attach files to active
00:01:59.009 record objects and store those files in
00:02:01.530 cloud-based storage have you ever needed
00:02:04.530 to add an avatar to a user or maybe a
00:02:08.610 resume to an applicant active storage
00:02:11.310 helps you take care of all of those file
00:02:14.129 attachment needs well that's great
00:02:17.069 Colleen but paperclip is working fine
00:02:19.470 for me why should I go through the
00:02:21.540 trouble of switching well that's a good
00:02:23.610 question why should you migrate to
00:02:26.819 active storage well the first and
00:02:30.209 possibly most important reason
00:02:32.310 is because active storage is now the
00:02:34.560 built-in solution for handling file
00:02:37.200 uploads to cloud storage in rails it
00:02:40.100 supports Amazon Google and Microsoft and
00:02:44.270 this one's fun there's no additional
00:02:47.280 migrations needed maybe if you remember
00:02:49.680 with paperclip every time you add a new
00:02:52.170 file you have to write a new migration
00:02:54.980 active storage is different it doesn't
00:02:56.910 work that way and if I still haven't
00:02:59.610 convinced you paperclip is deprecated so
00:03:02.400 you're out of luck so I accepted the
00:03:06.209 contract and the first thing I did was I
00:03:08.790 went and looked at the active storage
00:03:10.590 docs so in my experience the
00:03:14.430 documentation for rails is usually
00:03:16.380 excellent and active storage appeared to
00:03:19.019 be no different step one install active
00:03:22.110 storage step 2 configure cloud storage
00:03:26.120 step 3 add an attachment to a model and
00:03:30.750 step 4 let the magic of rails
00:03:33.650 extrapolate away all of the heavy
00:03:35.610 lifting for you and it just works well
00:03:39.180 has anyone tried to migrate an
00:03:41.459 application to active storage following
00:03:43.260 these steps if you have tried you might
00:03:47.100 know that implementing active storage in
00:03:49.980 a new application you can follow the
00:03:52.290 steps and it is relatively easy but
00:03:56.540 migrating to active storage can be quite
00:03:59.220 challenging why is that
00:04:03.650 well active storage is fundamentally
00:04:08.040 different from paperclip paperclip works
00:04:11.730 by attaching file data to the user table
00:04:15.120 so for example here we have an avatar on
00:04:18.900 a user so if we added an avatar to our
00:04:22.019 user using paperclip it's going to
00:04:24.180 change the users table
00:04:26.220 it adds these four columns to your users
00:04:29.400 table I didn't include the whole table
00:04:31.110 here so you could actually see what
00:04:32.729 paperclip does active storage is just
00:04:35.490 different active storage creates two new
00:04:39.300 tables the active storage attachments
00:04:42.270 table and the active storage blobs table
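As a rough sketch of the difference described here: Paperclip adds four columns per attachment to the owning table, while Active Storage keeps everything in two shared tables. The column lists below are assumed to match the default Rails 5.2 install migration, so check them against your own schema; the helper name `paperclip_columns_for` is hypothetical.

```ruby
# The four columns Paperclip adds to the users table for an avatar.
PAPERCLIP_AVATAR_COLUMNS = %w[
  avatar_file_name
  avatar_content_type
  avatar_file_size
  avatar_updated_at
].freeze

# The two shared Active Storage tables (assumed Rails 5.2 column layout).
ACTIVE_STORAGE_TABLES = {
  "active_storage_blobs" =>
    %w[id key filename content_type metadata byte_size checksum created_at],
  "active_storage_attachments" =>
    %w[id name record_type record_id blob_id created_at]
}.freeze

# Hypothetical helper: the columns a new Paperclip attachment would require.
def paperclip_columns_for(attachment_name)
  %w[file_name content_type file_size updated_at].map do |suffix|
    "#{attachment_name}_#{suffix}"
  end
end

# A second attachment (say, a resume) means four more columns and a new
# migration with Paperclip; the Active Storage tables stay as they are.
puts paperclip_columns_for("resume").inspect
puts ACTIVE_STORAGE_TABLES.keys.inspect
```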
00:04:45.810 so if we revisit our steps I'm gonna say
00:04:50.610 that step 3 add an attachment to a model
00:04:53.490 well active storage is not going to be
00:04:55.200 able to access the data since there's
00:04:57.720 currently nothing in your active storage
00:04:59.730 tables so we can't do step 3 yet but we
00:05:04.980 can do step 1 and step 2 so step 1 is
00:05:12.169 install active storage create the tables
00:05:16.669 and then you need to configure your
00:05:20.820 cloud storage so the way this is set up
00:05:23.880 right here is we have an Amazon
00:05:26.130 which is going to be our production
00:05:27.630 storage and Amazon Dev which is our dev
00:05:30.810 storage I created this little contrived
00:05:32.940 example for this talk so you can see I
00:05:35.220 came up with a very clever bucket name
00:05:36.840 there a really fun bucket for Colleen
00:05:38.640 which was unique
00:05:39.660 so go me but when we did this on our
00:05:41.970 production application this is how we
00:05:43.889 had it set up as well and it's
00:05:46.229 really going to depend on your setup but
00:05:48.330 I would highly recommend testing this on
00:05:50.070 a dev bucket on your cloud storage
00:05:52.590 provider and after you configure it in
00:05:58.560 storage.yml you then have to configure
00:06:00.570 it on a per environment basis so what
00:06:03.720 I'm showing you here is development and
00:06:05.640 as I said you're gonna
00:06:07.289 configure it to use Amazon dev and
00:06:09.900 production would be using Amazon so
00:06:13.530 okay great so that took like one minute
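The two-service setup described here might look roughly like the sketch below. The YAML is a hypothetical `config/storage.yml` (bucket names, region and credentials are placeholders, not values from the talk), embedded in a small Ruby script so the per-environment choice can be checked without a Rails app. In a real app the choice is made in each environment file with `config.active_storage.service`.

```ruby
require "yaml"

# Hypothetical contents of config/storage.yml: one production service and
# one dev service, each pointing at its own S3 bucket.
STORAGE_YML = <<~YAML
  amazon:
    service: S3
    access_key_id: PLACEHOLDER_KEY_ID
    secret_access_key: PLACEHOLDER_SECRET
    region: us-east-1
    bucket: my-app-production
  amazon_dev:
    service: S3
    access_key_id: PLACEHOLDER_KEY_ID
    secret_access_key: PLACEHOLDER_SECRET
    region: us-east-1
    bucket: my-app-dev
YAML

# In Rails, each environment file selects one entry, for example:
#   config/environments/development.rb -> config.active_storage.service = :amazon_dev
#   config/environments/production.rb  -> config.active_storage.service = :amazon
# This hash only mirrors that choice for the purposes of the sketch.
SERVICE_FOR_ENV = {
  "development" => "amazon_dev",
  "production" => "amazon"
}.freeze

# Look up the storage settings a given environment would use.
def storage_config_for(env)
  services = YAML.safe_load(STORAGE_YML)
  services.fetch(SERVICE_FOR_ENV.fetch(env))
end

puts storage_config_for("development")["bucket"]
```

Keeping the dev service on a separate bucket is what makes it safe to rehearse the migration before touching production files.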
00:06:16.700 so at this point you already have active
00:06:20.370 storage installed and now your active
00:06:23.370 storage tables exist in your database so
00:06:27.510 let's talk about step three I have
00:06:29.700 changed step three to say move Avatar
00:06:33.120 data from the user table to the active
00:06:36.810 storage tables well how do we move data
00:06:42.390 from one table to another in our
00:06:45.060 database a rake task so we are gonna
00:06:50.910 write a rake task together and let's
00:06:53.760 talk about this rake task we're gonna be
00:06:56.400 moving a good amount of data and
00:06:58.500 it's not a
00:06:59.160 one-to-one because we have one user
00:07:00.900 table and two active storage tables so
00:07:03.030 we're also going to be mapping some data
00:07:04.680 so the only way to make this work is to
00:07:07.470 understand what we are doing I don't
00:07:09.510 really think there's a copy and paste
00:07:10.800 solution for this particular problem so
00:07:13.710 let's talk a little more about what we
00:07:16.470 are trying to do so we are moving this
00:07:21.990 data from the users table which I'm
00:07:26.790 going to show you again to the active
00:07:30.870 storage attachments and active storage
00:07:32.340 blobs and we're technically copying it
00:07:34.830 over there for now but so I don't know
00:07:38.370 about you but I find reaching into my
00:07:41.460 database with SQL to change records
00:07:43.830 on a production application to be a
00:07:45.900 little bit scary plus I was told I
00:07:49.440 wasn't gonna have to write SQL this
00:07:53.850 is from last year so this is recent but
00:07:57.000 alas it seems to be the case here so
00:08:00.560 Before we jump into what the rake task
00:08:03.630 is going to be let's talk about the
00:08:06.570 active storage tables because as I said
00:08:08.760 you really need to understand what
00:08:09.960 you're doing here so the first table I
00:08:13.710 want to talk about is the active storage
00:08:16.169 attachments table we're gonna start with
00:08:19.860 a name which is the name of your
00:08:22.050 attachment in this case avatar then you
00:08:25.530 have your polymorphic Association
00:08:27.600 columns user and the user ID and then
00:08:31.350 you have your blob ID okay so that's
00:08:33.840 Table one now Table two is the blob
00:08:35.940 table so if we look at the blobs table
00:08:39.570 the key is the location of your current
00:08:43.380 file in Amazon s3 storage and then you
00:08:47.490 have your file name your content type
00:08:51.740 byte size I don't know why I skipped that
00:08:53.760 one
00:08:53.940 and your checksum alright so how do
00:08:57.450 these tables relate to one another so
00:09:00.510 I'm going to do one table at a time so
00:09:02.370 on your left is the users table and on
00:09:06.420 your right is the active storage
00:09:07.710 attachments table so user becomes our
00:09:10.140 record type the ID
00:09:12.480 becomes our record ID and the name
00:09:16.970 becomes just avatar okay so now I have
00:09:22.709 the users table on the left and
00:09:25.440 the blobs table on the right and we have
00:09:28.350 avatar file name that's from our user
00:09:30.750 table is going to go to our blob as the
00:09:33.029 file name avatar content type is going
00:09:36.089 to go to the content type and file size
00:09:39.089 is gonna go to byte size so let's get
00:09:44.490 started on that rake task so the good
00:09:51.000 people of thoughtbot put together the
00:09:53.250 skeleton of a task that's an excellent
00:09:56.459 starting place they
00:09:59.070 actually use a migration but as I mentioned
00:10:01.709 I would recommend using a rake task so
00:10:04.290 if we look at this we
00:10:08.279 get our blob ID and then these two
00:10:10.910 statements are just defining our insert
00:10:14.160 statements so this is actually all
00:10:16.110 pretty cut and paste for you after that
00:10:21.630 what's happening here is we're looping
00:10:24.839 through all of the models and pulling
00:10:27.240 out the attachment names the important
00:10:30.690 thing to realize here is this code
00:10:33.149 that's used to pull out the attachment
00:10:35.819 name is specific to paperclip because
00:10:38.069 that's how paperclip names the files on
00:10:41.449 your user table right so that's what
00:10:44.339 we're looking at right there avatar
00:10:45.630 underscore file underscore name so if we
00:10:49.139 go this is the same slide the same piece
00:10:50.490 of code so if you look at this that is
00:10:52.949 specific that's just pulling out your
00:10:55.139 avatar string and that is specific to
00:10:56.579 paperclip so as you are going through
00:10:57.980 depending on what gem you are migrating
00:11:00.690 from you have to be aware of this and
00:11:03.180 all this is doing is pulling out the
00:11:05.069 string avatar so once we get the
00:11:08.970 attachment names the next step is to
00:11:14.370 loop through the models and their
00:11:16.709 associated attachments
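The attachment-name step described above can be sketched in plain Ruby. It assumes Paperclip's `<name>_file_name` column convention mentioned in the talk; the method name `paperclip_attachment_names` is hypothetical, and in a Rails app the column list would come from something like `User.column_names`.

```ruby
# Pull attachment names out of a table's column names by looking for
# Paperclip's "<attachment>_file_name" columns.
def paperclip_attachment_names(column_names)
  column_names.map { |column| column[/\A(.+)_file_name\z/, 1] }.compact
end

# Hypothetical users table columns.
users_columns = %w[
  id email
  avatar_file_name avatar_content_type avatar_file_size avatar_updated_at
]

puts paperclip_attachment_names(users_columns).inspect
```

If you are migrating from a different gem, this is the part to rewrite, since the naming convention is specific to Paperclip.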
00:11:24.550 and as a side note what I wanted to
00:11:27.020 share if you only have one or two models
00:11:29.120 with attachments or one model with one
00:11:31.190 attachment you don't have to do all of
00:11:33.050 this you can just call out the model and
00:11:35.030 the attachment name instead of looping
00:11:37.310 through every single model looking for
00:11:39.170 attachments so the thing I wanted to
00:11:44.390 show you here is this the reason I want
00:11:49.400 to show you this is this this instance
00:11:51.230 which is just the instance of your user
00:11:53.930 so this is instance and our attachment is
00:11:56.300 avatar in our example so
00:12:00.110 user.avatar.path.blank? that statement is dependent
00:12:05.210 on the relationship paperclip creates
00:12:07.580 between user and avatar that is
00:12:10.460 important it's important because this is
00:12:14.660 going to take two deploys so why does
00:12:18.380 this process require two deploys well
00:12:21.260 the rake task we're building right now
00:12:23.440 needs that user avatar relationship
00:12:26.420 defined by paperclip I just showed you
00:12:28.070 in circle and it needs the active
00:12:30.950 storage tables because it needs a
00:12:32.720 place to move the data to
00:12:34.790 so it needs to put the data in
00:12:37.160 the active storage tables now active
00:12:40.970 storage needs data
00:12:44.870 in the active storage tables so you
00:12:47.120 can't run active storage without first
00:12:50.060 running the rake task and the rake task
00:12:52.940 is dependent on paperclip and we will
00:12:54.890 revisit this all right so let's go back
00:12:57.350 to our rake task
00:13:00.160 this right here is okay so this is just
00:13:04.370 calling our blob insert statement and
00:13:06.070 the thing I wanted to point out here are
00:13:09.640 the key and checksum methods and the
00:13:13.910 other ones are just you know use their
00:13:15.320 avatar file name content type file size
00:13:17.810 but I want to call out the key and
00:13:20.020 checksum for a few reasons you're gonna
00:13:24.650 have to write these methods yourself I
00:13:26.750 didn't actually include my solution
00:13:28.880 because your solution is going to be so
00:13:30.560 specific to your paperclip configuration
00:13:33.620 and your Amazon s3 configuration
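As a sketch only, key and checksum helpers along these lines might be a starting point. It assumes that the only difference between the path Paperclip reports and the S3 object key is a leading forward slash, and that the checksum Active Storage expects is a Base64-encoded MD5 digest of the file contents. The method names are hypothetical, and your own Paperclip and S3 configuration may need something different.

```ruby
require "digest/md5"
require "stringio"

# Hypothetical helper: turn the path Paperclip reports into the key that
# Active Storage should look up, by dropping any leading "/".
def blob_key_from(paperclip_path)
  paperclip_path.sub(%r{\A/+}, "")
end

# Base64-encoded MD5 of an IO's contents, read in chunks so a large file
# is never held in memory all at once.
def blob_checksum(io, chunk_size: 5 * 1024 * 1024)
  digest = Digest::MD5.new
  while (chunk = io.read(chunk_size))
    digest << chunk
  end
  digest.base64digest
end

puts blob_key_from("/users/avatars/1/original/me.png")
puts blob_checksum(StringIO.new("example file contents"))
```

In the real task the IO would be the downloaded or opened image rather than a `StringIO`.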
00:13:36.339 and okay so the key the key is where
00:13:41.209 active storage is gonna look for your
00:13:43.580 files as a funny or frustrating aside
00:13:47.990 depending on how you want to look at it
00:13:49.490 I was using paper clips so I assumed the
00:13:51.649 key would be user avatar path so that's
00:13:55.130 what I put in my rig tasks well maybe it
00:13:57.890 was the way I had my as three buckets
00:13:59.209 set up or my paperclip config that
00:14:01.520 actually returned a forward slash right
00:14:05.149 there and because of that forward slash
00:14:09.110 when active storage went to look for my
00:14:11.240 files could not find my files so
00:14:13.339 everyone knows that keys are hard so
00:14:15.380 that's that's a potential pitfall as
00:14:17.180 you're going through this process and
00:14:18.260 then checksum so when I did this on
00:14:22.190 production we had about 80,000 images so
00:14:25.010 it wasn't too many so I actually opened
00:14:27.470 each image and ran it through the md5
00:14:29.360 process I think some of the gems
00:14:31.220 actually provide the checksum for you so
00:14:34.370 that'll just be depending on what you're
00:14:35.839 migrating from alright so that is then
00:14:42.620 all of those records we need to write
00:14:44.750 the very last step is just writing to
00:14:47.029 your attachments table and that's your
00:14:50.329 attachment which we discussed is the
00:14:52.070 string avatar the model name which is
00:14:54.860 user and the instance ID okay excellent
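Putting the column mapping from the talk together, one users row turns into one blob row and one attachment row. The sketch below uses plain hashes instead of SQL inserts. The sample data, the method names, the `"{}"` metadata value and the reuse of `avatar_updated_at` as `created_at` are all assumptions made for illustration.

```ruby
# Build the active_storage_blobs attributes for one Paperclip row.
# key and checksum are passed in because they depend on your setup.
def blob_attributes(row, attachment:, key:, checksum:)
  {
    key: key,
    filename: row.fetch(:"#{attachment}_file_name"),
    content_type: row.fetch(:"#{attachment}_content_type"),
    metadata: "{}", # assumed placeholder for the metadata column
    byte_size: row.fetch(:"#{attachment}_file_size"),
    checksum: checksum,
    created_at: row.fetch(:"#{attachment}_updated_at")
  }
end

# Build the active_storage_attachments attributes that link the record
# (record_type + record_id) to the blob created above.
def attachment_attributes(row, attachment:, record_type:, blob_id:)
  {
    name: attachment,
    record_type: record_type,
    record_id: row.fetch(:id),
    blob_id: blob_id,
    created_at: row.fetch(:"#{attachment}_updated_at")
  }
end

# Hypothetical users row as Paperclip would have left it.
user_row = {
  id: 42,
  avatar_file_name: "me.png",
  avatar_content_type: "image/png",
  avatar_file_size: 12_345,
  avatar_updated_at: "2019-04-30 12:00:00"
}

blob = blob_attributes(user_row, attachment: "avatar",
                       key: "users/avatars/42/original/me.png",
                       checksum: "PLACEHOLDER_CHECKSUM")
link = attachment_attributes(user_row, attachment: "avatar",
                             record_type: "User", blob_id: 1)
puts blob.inspect
puts link.inspect
```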
00:14:59.720 so that is the whole rake task so the
00:15:05.120 next thing to do after you have run your
00:15:07.550 rake task is figure out if it worked so
00:15:11.720 the quickest way to figure out if it
00:15:14.029 worked is to actually see if you've
00:15:16.040 created the correct number of blob
00:15:18.020 records and attachment records if you're
00:15:20.540 feeling feisty you can go into your
00:15:21.950 database take one record from your user
00:15:25.160 table and see if it has transposed
00:15:26.839 correctly to your attachments tables and
00:15:29.120 your blobs tables but if you're not
00:15:32.209 that's fine we'll figure it out when we
00:15:35.570 get there alright I feel like I kind of
00:15:39.860 sped read through a lot of code there so
00:15:43.640 let's do a brief overview of what we
00:15:47.209 have done
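The count check mentioned just before this recap could be sketched like this. It assumes one blob and one attachment per user row that has an avatar; the method name is hypothetical. In a Rails console the two counts would come from `ActiveStorage::Blob.count` and `ActiveStorage::Attachment.count`.

```ruby
# True when the number of rows with an attachment matches both the blob
# count and the attachment count after the rake task has run.
def migration_counts_ok?(user_rows, blob_count:, attachment_count:, attachment: "avatar")
  expected = user_rows.count do |row|
    !row[:"#{attachment}_file_name"].to_s.empty?
  end
  expected == blob_count && expected == attachment_count
end

# Hypothetical data: two users with avatars, one without.
users = [
  { id: 1, avatar_file_name: "a.png" },
  { id: 2, avatar_file_name: nil },
  { id: 3, avatar_file_name: "c.png" }
]

puts migration_counts_ok?(users, blob_count: 2, attachment_count: 2)
```

Matching counts only show that records exist, not that keys and checksums are right, which is why spot-checking a record in the database is still worth doing.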
00:15:49.400 so we created the active storage tables
00:15:51.080 by installing active storage and running
00:15:53.300 the migrations we configured the active
00:15:56.510 storage cloud storage so that was
00:15:58.550 storage.yml and that was configuring on
00:16:01.070 a per environment basis so it's kind of
00:16:04.400 long we wrote the whole rake task to
00:16:06.290 create the user avatar records in the
00:16:08.600 attachments and blobs tables and
00:16:10.760 we sourced that data from the user table
00:16:13.700 or whatever table that currently has the
00:16:15.830 file attached to it
00:16:18.280 and we have hopefully confirmed that
00:16:23.360 records were created in the active
00:16:25.700 storage table so we don't actually know
00:16:31.520 if the records are right unless you took
00:16:33.260 the time to actually peek into your
00:16:34.940 database and look we don't know if
00:16:37.400 they're right but we know they exist
00:16:38.720 so that's good enough to move on to the
00:16:40.700 next step okay before you move on to the
00:16:46.220 next step I would highly recommend
00:16:48.160 checking out a new branch technically
00:16:51.800 you do not have to do this you can push
00:16:54.140 one branch up run your rake task and
00:16:56.930 then push the second branch up with
00:16:58.160 active storage but for testing I think
00:17:00.440 it's a lot easier to do a new branch
00:17:03.430 this was my preferred method as I
00:17:06.770 mentioned I got the key wrong the first
00:17:08.089 time so I had one branch with paperclip
00:17:10.310 and the rake task and another branch with
00:17:12.320 active storage run the rake task use
00:17:15.770 active storage if it doesn't work
00:17:18.080 you can blow out the active storage
00:17:20.150 records fix the rake task rewrite to the
00:17:23.209 tables and try again as I said here's
00:17:29.930 our deploy run the rake task then go to
00:17:33.890 your active storage models and views so
00:17:38.170 now I will show you that alright so now
00:17:44.270 we have installed active storage we have
00:17:47.810 data in our tables our active storage
00:17:49.880 tables so now we can actually preferably
00:17:53.870 on a new branch in my opinion now we can
00:17:57.050 actually change our code and our models
00:17:59.870 views controllers
00:18:00.690 and tests to use the active
00:18:04.289 storage functionality so the thing I
00:18:08.580 really want to show you this is you know
00:18:10.590 this is why it looks so easy in the
00:18:11.940 docs right because you just do
00:18:13.440 has_one_attached but it only works you know if
00:18:16.649 there's data so the reason I wanted to
00:18:19.950 show you this is I wanted to show you
00:18:21.960 the bottom here I wanted to show you
00:18:24.019 views if you look at the views you can
00:18:28.169 see if you're gonna be using multiple
00:18:29.970 sizes of images you use something called
00:18:32.940 variants and the cool thing about
00:18:36.479 variants is you can just pick your image
00:18:38.849 size kind of on the fly
00:18:40.649 you aren't hamstrung into specific sizes
00:18:43.499 that you've predefined so let's talk a
00:18:47.190 little bit more about variants because
00:18:48.840 if you're working with images as I was
00:18:50.789 they're very important so paperclip
00:18:55.229 I think pre-processes all
00:18:57.419 your image sizes so they're going to
00:18:58.619 give you your whatever they are large
00:19:00.479 thumb medium whatever sizes you're
00:19:02.820 working with so active storage is gonna do a
00:19:05.700 lazy transform on the original blob on
00:19:09.359 the fly hence the airplane and rails
00:19:15.119 does cache the variant so that the
00:19:16.889 processing is only gonna happen the
00:19:18.179 first time it's generated so here I was
00:19:23.099 in this process of working for this
00:19:26.099 client migrating this application and I
00:19:29.220 had a rake task I knew it was working I
00:19:31.679 had looked at my database active storage
00:19:33.720 could find my files and I ran it and
00:19:37.669 probably 30 percent we're a very
00:19:40.349 image-heavy website it's important to
00:19:42.450 note probably thirty percent of the
00:19:44.759 images were blurry that's how I felt
00:19:49.859 right then so why were 30 percent of our
00:19:55.049 images blurry
00:19:56.099 they were blurry because active storage
00:19:59.700 uses MiniMagick for image transformation
00:20:02.369 MiniMagick does not support the advanced
00:20:05.849 image processing that we had been using
00:20:07.830 with shrine and that I think is pretty
00:20:10.379 important and that was a really big pain
00:20:13.259 point for us
00:20:17.210 but fortunately there will be a happy
00:20:20.610 ending so we did this I want to say I
00:20:24.990 did this it was eight months to a year
00:20:26.460 ago and I feel like we're a little early
00:20:30.510 to the active storage party mainly
00:20:32.790 because of this image processing snafu
00:20:36.900 we had to deal with fortunately for us
00:20:42.110 rails 6 should be solving
00:20:45.480 this specific issue active storage on
00:20:47.760 rails 6 has deprecated MiniMagick and is
00:20:50.070 now using the image_processing gem so
00:20:52.830 fortunately that image processing I believe it was
00:20:54.780 like the resize_to_fill resize_to_fit
00:20:57.740 that did not work with MiniMagick all
00:21:02.100 right so we have already done what's up
00:21:08.310 there deploy with paperclip run the rake
00:21:10.650 task and the active storage tables and
00:21:13.340 the next step is to deploy with the
00:21:18.420 active storage models and views
00:21:21.410 implemented that I just showed you and
00:21:24.270 if that works then you have completed
00:21:28.200 well made good progress on your
00:21:30.420 migration to active storage so let's
00:21:34.110 revisit all of our steps alright so we
00:21:39.210 installed active storage configured the
00:21:42.420 cloud storage moved the avatar data from
00:21:45.720 the user table to the active storage
00:21:47.250 tables and now active storage can work
00:21:52.320 its magic and it should just work