A Survey of Surprisingly Difficult Things

by Alex Boster

In his presentation at RailsConf 2017, Alex Boster explores the intricate complexities surrounding seemingly simple real-world concepts that developers frequently encounter. He emphasizes that commonplace tasks like handling time, currency, and human names can reveal unexpected difficulties if not managed carefully. Boster outlines the following key points throughout his talk:

  • Time Management: Boster explains that while timestamps may seem straightforward, time zones complicate matters significantly. He highlights issues such as differences in daylight savings time and the many time zones that exist globally. Boster advises developers to store time values in UTC and use Rails’ time zone-aware methods to avoid confusion.

  • Date Handling: The speaker contrasts dates with times, stressing the importance of only using date formats when the specific time of day is not relevant. Mismanagement in this area can lead to errors in applications.

  • Human Names: Boster discusses the fallacies developers often have regarding names, such as the assumption that names are fixed and unchanging, or that all names can be represented using ASCII characters. He advises against over-validating names, advocating instead for a more flexible approach to accommodate diverse naming conventions.

  • Physical Addresses: Boster turns next to address modeling, noting how much address formats vary even within the United States. He cautions against relying strictly on conventional validation methods and suggests using services like those provided by the US Postal Service to standardize data where possible.

  • Financial Data: Currency management is another crucial topic. Boster recommends using integers for all monetary calculations instead of floats to avoid rounding issues and inaccuracies in financial representations, particularly when interfacing with JavaScript.

  • Email Validation: He also touches on the pitfalls of over-validating email addresses and the importance of accepting diverse formats that may not align with strict validation rules.

  • Internationalization: Though he does not delve deeply into this topic, Boster encourages developers to think about internationalization from the outset by keeping strings that would otherwise be hard-coded in locale files, which eases future translation.

  • Recurrence in Events: Lastly, Boster notes the complexities of modeling recurring events in software applications and the potential for error when handling such features.
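
The integer advice in the Financial Data point above can be sketched in plain Ruby. This is a minimal illustration, not code from the talk: `price_cents`, `tax_cents`, `format_dollars`, and the 8.25% rate are hypothetical names and values chosen for the example.

```ruby
# Sketch only: all names and amounts here are hypothetical.

# Floats cannot represent most decimal fractions exactly.
float_sum = 0.1 + 0.2 # not exactly 0.3

# Keep money as an integer count of the smallest unit (cents here).
price_cents = 1999 # $19.99
quantity = 3
subtotal_cents = price_cents * quantity

# Apply a tax rate given in basis points (825 = 8.25%) with integer
# arithmetic, rounding half up to the nearest cent exactly once.
def tax_cents(amount_cents, rate_bps)
  (amount_cents * rate_bps + 5_000) / 10_000
end

# Convert to a display string only at the last moment.
# Assumes a non-negative amount.
def format_dollars(cents)
  dollars, remainder = cents.divmod(100)
  format("$%d.%02d", dollars, remainder)
end

total_cents = subtotal_cents + tax_cents(subtotal_cents, 825)
puts format_dollars(total_cents)
```

Carrying the unit in the variable name (`_cents`) follows the talk's suggestion and makes a forgotten conversion easy to spot in review.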

In conclusion, Boster's main takeaways center on awareness of the global and cultural diversity that software has to accommodate. He urges developers to avoid assumptions based on their own experience, to adopt inclusive coding practices, and to take the time to understand these deceptively complex elements thoroughly.

RailsConf 2017: A Survey of Surprisingly Difficult Things by Alex Boster

Many seemingly simple "real-world" things end up being much more complicated than anticipated, especially if it's a developer's first time dealing with that particular thing. Classic examples include money and currency, time, addresses, human names, and so on. We will survey a number of these common areas and the state of best practices, or lack thereof, for handling them in Rails.

00:00:12.679 All right, let's get started. I can't see
00:00:18.300 very well out there, so shout if you feel the need to ask questions or interrupt.
00:00:25.099 How's the conference for everybody so far? Good? All right, excellent.
00:00:31.080 All right, this thing's not going to work, apparently. So I'm Alex Boster. I work
00:00:38.040 for AppFolio, which is a company in Santa Barbara. I work in their San Diego engineering
00:00:43.649 office. We're hiring, come talk to me. So I would have said one
00:00:51.480 particular project inspired this talk, but really it's also the culmination of, you know, the kind of experience you get
00:00:57.809 after a few years of doing web development. So this is a survey of surprisingly difficult things. Now, what
00:01:04.439 things do I mean? I mean commonplace things, things you know all about in your day-to-day life, so
00:01:11.670 they're easy to model, and you model them and they're great, and then it turns out
00:01:16.680 it's actually a lot harder than that. So, things where the obvious implementation
00:01:22.770 may very well cause problems. And in fact, you know, these are things like time
00:01:31.259 stamps, time zones, physical addresses, human names. Now, when you hear these
00:01:38.310 terms, what do you think? Does this sound easy? These are solved problems,
00:01:44.460 right? Easy, no problem, no complications. What am I not going to talk about? I'm
00:01:51.570 not going to talk about cache invalidation or distributed systems or other also surprisingly difficult things.
00:01:58.710 This is about real-world stuff. So
00:02:05.969 one of the things, you know, one of the reasons I wanted to give this talk was that developers
00:02:12.490 fall into these traps all the time. We spent months cleaning up old buggy code,
00:02:18.540 and then new bugs of the same type were introduced six months later by other developers. So maybe, you know, with a
00:02:26.800 laundry list of things to be aware of, to watch out for, maybe you won't fall into that trap. And even for very senior
00:02:33.880 developers, you know, it might help to just have the occasional reminder here. If you're not a senior developer and you
00:02:39.190 haven't dealt with this stuff much before, then hopefully this talk will
00:02:44.290 save you some time in the future. Another good thing is that if you follow these
00:02:49.750 best practices or similar things, then you're apt to be more inclusive. So
00:03:00.970 I know that when I start dealing with
00:03:06.190 these real-world things, you know, it causes me to want to drink. It might
00:03:14.920 just drive you to drink too. So let's start with time. So there are a bunch of
00:03:22.720 time and date classes available to you. The only one I really want to draw
00:03:28.960 attention to, this is mostly for reference later, but the one I want to draw attention to is the last line, where there's just no good cross-system standard for
00:03:37.780 duration. It's different in every database, it's different in Ruby than it is in the databases, so pay a little
00:03:44.710 attention to that and check out ActiveSupport::Duration. But what makes time actually hard to deal with is not this,
00:03:53.020 it's time zones. So again, isn't this a
00:03:58.900 solved problem? You can just have a sufficiently large integer to represent seconds or fractions of a second, and now
00:04:07.360 you have a time value and you're good, right? No problems, let's all go home.
00:04:13.840 Well, again, the problem is in time zones. So how many time zones do you think
00:04:19.450 there are, anyway? Thirty-something? Say 40? Well, you may be
00:04:32.080 right, you know, at a certain level. Certainly the time zone database, which
00:04:38.650 I'll talk about in a minute, defines 385 time zones and then has a further 176 internal
00:04:47.800 links, which are aliases, basically, to give them different names. So something
00:04:54.250 to remember: there are half-hour time zones, there are quarter-hour time zones,
00:05:01.770 you've got daylight savings time to take into account. A place may start observing
00:05:08.050 daylight savings time that previously didn't, like Arizona, which currently doesn't. A
00:05:13.169 place may change their schedule, like the entire United States did ten years ago
00:05:20.590 or so, and shifted when daylight savings time started. I don't know if this is
00:05:26.830 true currently, but certainly in the past we've seen examples of two-hour daylight savings time changes. This was called
00:05:33.159 double summer time in the UK. A place may change time zones entirely, they just
00:05:40.599 switch. So this time zone database I
00:05:49.300 spoke of is used in many Unix-like systems, it's used all over the place. With a small band of dedicated
00:05:55.689 developers, it tracks all geographic time zones since 1970. They define an area as,
00:06:04.000 you know, where any two places share the same time zone, share
00:06:09.729 the same time at the same time. So not
00:06:15.699 before, seriously; before 1970 they don't care as much, but
00:06:20.860 they do have historical data, and it does track daylight savings time
00:06:26.800 changes. So, an example: this is updated several times a year. If you think this stuff is static, no, it's updated several
00:06:35.380 times a year, and here's an example of some release notes. You can read through
00:06:43.930 that quickly: Mongolia no longer observes daylight savings time; this region moved from one to another year-round, with the
00:06:52.300 clocks starting at this particular time. There's the new area, which also affects
00:06:57.340 part of Antarctica. This change fixed many entries for historical time
00:07:03.640 for Madrid before 1979, and it noted that
00:07:09.790 Ecuador actually observed daylight savings time to a particular day. So the exact details aren't important,
00:07:17.350 but just know this stuff is really complicated, and thank goodness somebody's keeping track of it. And
00:07:23.800 that's an example of all the actual regions defined. That's, again, just unique
00:07:29.620 regions since 1970. Another little bit of trivia: you know how many time zones are in the
00:07:35.680 United States? According to this map, just the continental United States, I count at least six. So we use UTC as a way to
00:07:46.570 standardize things, you know, when things happen at the same instant in time regardless of what you actually call
00:07:53.830 that time in a particular place. Hopefully we know that that stands for
00:07:59.500 neither Coordinated Universal Time nor Temps Universel Coordonné, a name by
00:08:07.930 diplomats, no doubt. UTC is not a time zone, but every time zone has an offset
00:08:13.720 from UTC, and as a rule you should store your time values in UTC. So also, before I
00:08:23.560 proceed: what are the possible offsets
00:08:29.360 from UTC? UTC is kind of in the middle, and you can go forward from it and you can
00:08:34.969 go back. How far do they go? Anyone want to take a guess? That is
00:08:42.800 exactly what I thought. So in 1995 Kiribati
00:08:52.430 got tired of having their country in two different days, so they moved the
00:09:00.050 time zone of some of their outlying islands from minus 10 to plus 14. So
00:09:07.760 there are plus-14 time zones on that side. So keep in mind, without a time zone,
00:09:14.690 any time value you have without context could take place within a 26-hour
00:09:20.000 range, possibly in half-hour or quarter-hour increments. So how do you handle
00:09:27.470 this? Well, if you don't explicitly provide a time zone, a time you provide
00:09:34.610 could be interpreted using the operating system's default, using your database's
00:09:41.120 default, or using your application's default time zone, which should be
00:09:47.810 configured in your Rails app. OK, now I need a slightly bigger drink. So as we
00:10:01.760 said, keep your system and database time in UTC. Rails will store its datetimes
00:10:08.510 in UTC, and time zone-aware methods in Rails will use the application's default
00:10:15.079 if you don't override it by expressly providing one. So for
00:10:23.149 example, if you have users, be sure to store a time zone on the user model
00:10:28.699 and always use that in your views, if you care at all about when things occur for said users.
00:10:39.080 And just as an example here, you can see that you want to use Time.zone. Now, I'll
00:10:45.860 talk about this a little more in a minute, and ActiveSupport provides some really sophisticated stuff around it, so
00:10:53.450 don't just use bare Ruby Time; use the Rails classes for all this stuff.
00:10:59.829 So these time zone-aware methods: you can
00:11:05.209 see we're parsing two different times in different time zones, but they're actually the same time, so it all works.
00:11:10.670 That's cool. And here are some examples of methods you should use: hours.from_now,
00:11:20.149 days.ago, those are all good. Always do Time.zone.parse; don't do Time.parse. If you use Time.strptime,
00:11:30.230 in the middle here, always use in_time_zone at the end or you will be screwed.
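
The point about two differently written times being the same instant can be illustrated without Rails. The sketch below uses only Ruby's standard library (`Time.iso8601`), so it is a stand-in for the zone-aware Rails helpers named in the talk, not the code from the slide; the timestamps are made up for the example.

```ruby
require "time"

# The same instant written with two different UTC offsets
# (illustrative values, not taken from the talk's slides).
phoenix = Time.iso8601("2017-04-25T10:30:00-07:00")
london  = Time.iso8601("2017-04-25T18:30:00+01:00")

# Time objects compare as instants, not as wall-clock readings.
same_instant = (phoenix == london)

# Normalise to UTC before storing or handing the value to an API.
# getutc returns a new Time and leaves the receiver untouched.
stored = phoenix.getutc.iso8601

puts same_instant
puts stored
```

A string with no offset at all carries none of this information, which is why the talk insists on parsing through an explicit zone.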
00:11:38.050 Definitely prefer Time.current to other
00:11:45.740 methods for getting the current time, and the utc.iso8601 is for if you're
00:11:52.820 providing an API, excuse me, if you're providing something to an API. These
00:11:59.329 examples are all stolen shamelessly from a blog post that I note there on the bottom. So dates are simpler,
00:12:09.950 right? They don't have a time zone. How do
00:12:15.230 you know if you should be storing something in a date or a time? Ask
00:12:21.410 yourself: does it matter what time of day? This seems really basic, but people make this mistake all the time and just
00:12:27.800 convert a date to a time really naively. So
00:12:34.760 don't do this. So what are some examples of dates? Well, birthdays: a birthday occurs on a day; we don't generally observe the
00:12:40.640 actual minute of the day a person was born. All-day calendar events, maybe holidays
00:12:48.170 you might think of. So for example, you know, to take a Western example,
00:12:56.610 Christmas is on the 25th regardless of whether you're in Beijing or Toronto or wherever. So as I said,
00:13:05.910 don't store dates in datetimes; you will have problems. Be very leery of converting back and forth; you almost
00:13:14.040 never want to do that. For example,
00:13:19.110 the one case I can think of offhand is if you have a calendar that you've written and
00:13:24.600 somebody's editing an event, and it goes from being, say, an all-day event to a timed event,
00:13:29.730 then maybe. So let's see, where are we? So
00:13:40.190 this is fine. Where did that come from? I
00:13:49.950 don't know. So you want to use Date.current,
00:13:57.690 because I ran these two lines in the middle of the day, seconds from each
00:14:03.450 other, and that's what I got. What happened?
00:14:13.110 Why did it behave that way? Anyone? Sorry?
00:14:22.550 Yeah, basically, current takes into account your time zone; the other one is basically telling me that in London it's
00:14:28.970 the 24th. But I see Date.today all
00:14:34.890 over the place in code. So these are
00:14:40.590 really the only two date methods you should be using, and you absolutely have to avoid Time.now, Time.parse,
00:14:48.660 Time.strptime without the in_time_zone thing at the end, or Date.today.
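
Why "today" depends on a time zone can be shown in a few lines of plain Ruby. This is a hedged sketch, not the talk's demo: `date_at` is a hypothetical helper, the instant is invented, and fixed UTC offsets stand in for the named zones a Rails app would use.

```ruby
require "date"

# Hypothetical helper: the calendar date of an instant as seen
# from a fixed UTC offset such as "-12:00" or "+14:00".
def date_at(instant, utc_offset)
  instant.getlocal(utc_offset).to_date
end

# One instant in time (illustrative value).
instant = Time.utc(2017, 4, 25, 11, 0, 0)

# The same instant falls on three different calendar dates
# across the full range of offsets mentioned in the talk.
puts date_at(instant, "-12:00")
puts date_at(instant, "+00:00")
puts date_at(instant, "+14:00")
```

A method that takes the date from the server's clock answers this question for the server, not for the user, which is the failure the talk attributes to `Date.today`.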
00:14:54.930 Something that can help you with this is RuboCop. Who uses RuboCop? OK.
00:15:00.620 How many of you put that into your actual process formally? A few, good, cool. I gave
00:15:07.440 a lightning talk a couple of years ago on this and depressingly few people did that. So there are services like Hound
00:15:13.400 CI; there's probably others, Codacy, that I'm not thinking of or not familiar with, but you can make
00:15:19.680 this a blocker, so that you can't merge a PR unless it passes your style check. And amongst the many benefits of that is
00:15:27.680 RuboCop will catch some of these errors for you. Any more comments or
00:15:35.700 questions about dates and times? Let's
00:15:41.700 move on to human names. So many of you may have read, there were a couple of
00:15:47.640 blog posts that were related. One was called "Falsehoods Programmers Believe About Names", which I believe is
00:15:54.090 actually inspired by "Falsehoods Programmers Believe About Time", and to take a few
00:15:59.760 examples from this, we have things like... None of these statements are true. People
00:16:06.600 have exactly one canonical full name: nope. People have exactly one full name which
00:16:11.970 they go by: no. People have, at this point in time, exactly one canonical full name.
00:16:19.970 People have, at this point in time, one full name which they go by: no.
00:16:25.529 People's names do not change: that's not true. People's names change, but only at a
00:16:31.589 certain enumerated set of events: no. People's names are assigned at birth: not
00:16:37.829 true. People's names are written in ASCII: absolutely not true. I'm guessing in this
00:16:45.089 room there's a bunch of people whose names are not actually written in ASCII, although I'm guessing very few in just
00:16:51.240 emojis yet, but that'll come. People's names are written in a single
00:16:57.600 character set: that's not true. People's names are all mapped in Unicode code points: that's not true. Two
00:17:06.390 different systems containing data about the same person will use the same name for that person: that's hopefully pretty
00:17:11.429 obviously not true either. Terrible. So now I'm feeling a bit crabby. So really
00:17:20.220 the only thing you can do here with names is validate as little as possible. Just don't bother. Why are you trying,
00:17:26.640 right? Like, you know, yes, their cardholder name probably has to match when
00:17:32.909 you submit a credit card, but that's their problem. You know, if you can avoid
00:17:40.110 first name and last name, consider doing so; just use full name. No idea what's going
00:17:47.970 on. And, you know, maybe use given name and
00:17:53.460 family name to be a little bit less English-specific. Also, you know, store
00:18:01.799 things in Unicode. Remember, you can't guarantee real names are used, and don't
00:18:07.649 assume, just because you might have a US-based and a US-centric business, that
00:18:13.190 your users will be primarily English-speaking or even have ASCII
00:18:19.980 names. You know, these things are true in the US as well as overseas.
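
"Validate as little as possible" might look like the following in plain Ruby. This is a minimal sketch under my own assumptions: `acceptable_name?` is a hypothetical helper, and the sample names are placeholders rather than examples from the talk.

```ruby
# Hypothetical helper: the only requirement is that something
# other than whitespace was entered. No character classes, no
# length limits, no first/last split.
def acceptable_name?(name)
  !name.to_s.strip.empty?
end

# Placeholder names in several scripts and shapes.
examples = ["Alex Boster", "José Núñez", "山田太郎", "O'Brien-Smith", "Björk"]

examples.each do |name|
  puts "#{name}: #{acceptable_name?(name)}"
end
```

Ruby strings are UTF-8 by default, so storing the value as entered already satisfies the advice to keep names in Unicode.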
00:18:26.640 OK, physical addresses. So how do you model physical addresses? Right, yeah, hmm,
00:18:40.370 that's good. Well, there are a lot more variations on this than you might expect,
00:18:47.190 even in just the United States, with the United States Postal Service. So even in the US,
00:18:52.650 remember, there are rural routes that look like this, there are military addresses that look like that; it doesn't
00:19:01.380 quite fit the city and state paradigm. Remember, the US Postal Service services
00:19:07.530 Puerto Rico, which has a very different address structure, and you can actually
00:19:15.210 have a surprising number of lines in a valid address. I saw one example that was supposedly
00:19:20.250 valid that was 12 lines long. It was an international one, but still. And
00:19:27.679 basically, don't do this. So until I moved
00:19:33.240 recently, my address had a slash in it. It's actually pretty common in California to have one-halves in numbers.
00:19:40.410 It doesn't validate with Southwest, and it doesn't validate with quite a few legacy systems that you'll see out there.
00:19:46.230 I had to have a bunch of banks and things send things to apartment one-half. So you can standardize these
00:19:58.020 addresses via the US Postal Service; they'll convert them for you and give you some of the right abbreviations,
00:20:05.059 but remember that special characters are still allowed even after that. Note, for
00:20:10.980 example, cities can have apostrophes in them, addresses can have slashes or
00:20:17.760 dashes and so forth. Some things to know about US postal codes: don't use just zip
00:20:26.190 codes; in general, try to use postal code, which is the international version of
00:20:31.980 the word, and don't just make it five characters long, or ten characters long
00:20:38.190 if you wanted to do plus nine, because again you're excluding the ability to store addresses from other countries,
00:20:46.190 including Canada, which is close enough you might actually want to be able to ship to it. And also, as a bit of trivia,
00:20:54.419 remember that you can't even use zip codes to map to states. There is a database you can buy, or possibly
00:21:01.710 download for free, that will attempt to give you city information, but zip codes not only can cross city boundaries, they
00:21:09.899 can cross state boundaries, and here are the ones that currently cross state boundaries. That's all because zip codes
00:21:17.549 map to postal routes, not to geography. It just so happens that most postal routes
00:21:23.820 are geographically constrained. So let's
00:21:29.700 say you want to validate addresses. Again, my first instinct is to say, why
00:21:35.039 are you doing this? But OK, great, the US Postal Service has a database of them.
00:21:40.760 However, these are not always the same as physical addresses. There are entire
00:21:47.760 towns and communities that have no physical addresses in the US Postal Service database. For example, I'm in San
00:21:54.389 Diego. One of the very wealthy suburbs, not suburbs, it's actually a separate city,
00:22:01.019 one of the very wealthy cities in the center of San Diego County is Rancho Santa Fe. The Postal Service delivers
00:22:06.210 everything to their post office and that's it, because they didn't want ugly postal trucks driving around going to
00:22:12.480 people's houses. Yet UPS and FedEx will
00:22:18.269 deliver to people there. So if you're shipping something, maybe you
00:22:24.330 should let them enter their home address even though it's not going to validate. Oh geez.
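
One way to act on these address points is a model that stores free-form lines and an unconstrained postal code. The sketch below is an assumption of mine, not a structure shown in the talk: the `Address` struct, its field names, and the sample addresses are all hypothetical.

```ruby
# Hypothetical value object: any number of free-form lines, a postal
# code stored as an unconstrained string, and no US-only assumptions.
Address = Struct.new(:lines, :locality, :region, :postal_code,
                     :country_code, keyword_init: true) do
  # Validate as little as possible: at least one non-blank line
  # and a country. Slashes, apostrophes, and dashes are all fine.
  def plausible?
    has_line = Array(lines).any? { |line| !line.to_s.strip.empty? }
    has_line && !country_code.to_s.strip.empty?
  end
end

# Made-up sample addresses covering cases raised in the talk.
half_unit = Address.new(lines: ["123 1/2 Example St"], locality: "San Diego",
                        region: "CA", postal_code: "92101", country_code: "US")
canadian  = Address.new(lines: ["1 Example Ave"], locality: "Ottawa",
                        region: "ON", postal_code: "K1A 0B1", country_code: "CA")
military  = Address.new(lines: ["Unit 2050 Box 4190"], locality: "APO",
                        region: "AP", postal_code: "96278-2050", country_code: "US")

[half_unit, canadian, military].each { |a| puts a.plausible? }
```

Standardization through a postal service can then be layered on as an optional suggestion to the user, not a gate that rejects what they typed.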
00:22:36.020 Any comments or questions about fun stuff with addresses here?
00:22:41.810 Anyone? Phew. All right, money. Yay, we all
00:22:48.480 need to get paid, right? How do you model money in your Rails app's database
00:22:56.670 schemas? That's a given.
00:23:03.590 No, correct, not as a float. I've seen that, though. Any other possibilities? That's a
00:23:13.650 good one, and decimal. So I heard those two; those are sort of the other two
00:23:20.990 approaches that I'm familiar with. So you may use decimal values; that's what a migration doing that would look like.
00:23:29.390 There are some issues with that. It's not that it's invalid. Here we have a little
00:23:37.680 Ruby code that will render your product for your API. We'll send this out; because
00:23:45.720 it's a decimal, it'll look like that when rendered. And now in JavaScript we do
00:23:51.510 this. What's wrong with that? I'm sorry?
00:24:03.800 Well, maybe. Anything else? But incorrect.
00:24:10.970 Ding-ding-ding, he said floating point. Current product.price in JavaScript is now naively a float; that's how
00:24:18.170 the JSON, you know, works. So you can, well,
00:24:26.770 sorry. After that, that bug came up three different times. So you can get around
00:24:35.420 that if you want to always remember to use a decimal library on the JavaScript
00:24:42.050 side; there's no problem with that. But inevitably someone will forget to cast, to make a new decimal object, and
00:24:49.520 the bug will be introduced. Also, you'll get very strange rounding when, for
00:24:56.300 example, you're multiplying by, say, a tax rate or something like that that maybe has three significant digits, and then
00:25:04.250 suddenly you'll have IEEE rounding issues in your money. Not good. So I
00:25:10.040 prefer to just store cents. You use integers everywhere, you won't
00:25:17.570 have rounding errors. I recommend you keep, you know, "in cents" in the name as
00:25:24.140 part of it everywhere, and then only convert at the last minute, for display
00:25:29.780 purposes, to dollars or whatever your currency is. Do be aware some currencies
00:25:36.440 have mils instead of cents, so if that's important, just remember you're
00:25:41.660 multiplying by a thousand instead. And in this case usually it's obvious when
00:25:47.270 you've forgotten to convert, because it's there on the display; your totals would
00:25:52.790 be wrong. Easier to test. Email addresses:
00:25:57.830 those are easy, right?
00:26:05.300 Yeah. So this is now a valid email
00:26:10.410 address, just keep that in mind. This is
00:26:17.640 now a top-level domain. There are still places out there that try to validate
00:26:23.570 .com, .net, .edu. You can validate that, you know,
00:26:30.450 there's an @, there has to be an @ sign in an email address, and there has to be a dot somewhere in the domain name.
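
The two checks just named, an @ sign and a dot somewhere in the domain, fit in a few lines of plain Ruby. This is a sketch of that idea only: `plausible_email?` is a hypothetical helper, and the sample addresses are invented.

```ruby
# Hypothetical helper implementing only the two checks from the talk:
# the address contains an "@", and the part after the last "@"
# contains a dot. Everything else is left to a verification email.
def plausible_email?(address)
  _local, at, domain = address.to_s.rpartition("@")
  !at.empty? && domain.include?(".")
end

# Invented samples: plus-addressing and a long top-level domain pass;
# a missing "@" or a dotless domain does not.
puts plausible_email?("user+trial1@example.com")
puts plausible_email?("someone@example.photography")
puts plausible_email?("no-at-sign.example.com")
puts plausible_email?("user@localhost")
```

Deliberately absent is any list of allowed top-level domains or any restriction on the local part, since those are the over-validations the talk warns about.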
00:26:36.120 That's it. Otherwise you can do the whole verification email dance, right, which,
00:26:44.520 everybody, people know this. But again, people still try to over-validate, and it's kind of like, again, why are you
00:26:51.480 doing this? I'll also just mention as a sidebar that, for people who aren't aware
00:26:59.070 of this, most email systems, particularly Gmail, will allow you to add a plus after
00:27:05.460 your username. It will still get delivered to you, but now you can have an infinite number of email addresses
00:27:11.100 without creating new users. This is great for testing, or for creating a thousand free trials, or whatever you
00:27:17.370 needed to do. So, all right, internationalization. I'm not going to
00:27:23.100 talk much about internationalization because it's a huge topic; there are literally entire conferences on it. I do
00:27:32.580 have a suggestion, which is that, particularly if you have a greenfield application, you've just done rails new,
00:27:38.630 start putting your, you know, hard-coded strings in your config/locales just from
00:27:47.460 the beginning, even if you have no particular plans to go international or
00:27:54.510 support other languages. The cool result of doing that is that, one, changing copy is easier: you
00:28:01.920 don't have to search all over the place, it's all in one place. And furthermore, you can even turn over the keys to, like,
00:28:09.420 a product person or a non-dev and they can make copy changes themselves directly in the code instead
00:28:17.280 of handing off to you, and then you just review their change. Also, if you add a
00:28:23.940 locale to your user model and always use it, then again you won't have to backport
00:28:29.550 or fill this stuff in later, and you can just start, if you're in the US, with en-US or whatever makes sense for
00:28:36.720 you. And then there's payments and
00:28:43.620 credit cards, another huge topic. There's a lot to talk about. Read up on PCI compliance; again, there's an entire
00:28:51.570 industry around PCI compliance. I think, you know, what most people know about it,
00:28:58.200 and what you need to take home, is never store credit card information. But don't even send it from your client
00:29:05.010 browser to yourself. Use a service that will let you send it directly from the
00:29:11.160 client browser and then give you a webhook back. That way you're not even relying on the browser that, you know,
00:29:18.150 somebody's using on an airplane with a credit, you know, they're on a Chromebook
00:29:23.220 on an airplane, just imagine that scenario, and now you're relying on them getting the token and
00:29:29.040 forwarding that to you. No, just use webhooks here. And be sure to consider what
00:29:36.060 happens if the call to your provider, Stripe or whatever, times out, right? That
00:29:42.630 can happen, and depending on your architecture, if you don't use webhooks you have to actually escalate timeouts
00:29:48.570 to a human being to go and look and see what really happened, like, did the call go through or did it fail? Recurring
00:29:58.170 calendar events: that's easy, right? You just have, like, a day of the week and say,
00:30:05.130 you know, when it recurs or something like that, and you're fine, right? Now read this RFC, and consider that most
00:30:14.760 recurring events have no end date, so how are you going to model that? There's an infinite number of them.
00:30:21.720 The rules can get pretty complex; this is a fairly simple one, like every month on the second-to-last
00:30:27.210 Thursday. And individual instances of a recurring event can be edited or
00:30:33.630 moved or canceled separately from the actual recurring event. Yeah, now I need...
00:30:51.770 Check out The Garage in Seattle. I'd like to just point out that this particular
00:30:57.390 one has, as a garnish, a second Bloody Mary for you to get through while you're
00:31:07.170 working your way down to the Bloody Mary. And in case you think onion rings, chicken wings, a submarine sandwich or
00:31:16.110 12-inch pizza, french fries, and a lime are not enough, here's the back view:
00:31:22.070 there's a cheeseburger and onion, lime, lemon, and two grilled cheese sandwiches.
00:31:33.710 But you might need this if you start working on recurring events. I highly recommend it. So I talked a
00:31:43.860 little fast, ran a little early. In conclusion, the main takeaways: don't over-validate. People just have this
00:31:51.240 need to, like, check the values of these things when you can't, and I don't
00:31:58.320 understand it. Even US-only products will
00:32:04.920 have global-like problems. Be culturally aware. Remember that your experience
00:32:10.740 isn't universal and probably isn't typical, and just don't make assumptions.
00:32:16.470 And there are some references, if you get my slides later, for the blog posts I was
00:32:24.990 talking about you I can request for that class