Welcome to "Why We Can't Have Nice Things and Why It's All Our Fault," by me. So as Leah told you, I'm a co-founder of a company called Uphex. We do analytics for digital marketing agencies and I'd love to talk to you about that afterwards if you happen to represent an agency or be friends with powerful people who own agencies. You can read more about the company at uphex.com and that's my website and that's my Twitter handle on the bottom. I also want to give a shout out to the organizers, that's Leah, Josh, Jim, and the volunteers who are Kate, John, Jonathan, Matt, Sarah, Ryan, Rachel, and Emily, and if I forgot your name it's because you're not on the website so it's not my fault. And thanks very much to the sponsors. Without you guys this event wouldn't happen, and I want you guys to know that every time I took a bite of food at lunch I was thinking, wow, this tastes so much better because I didn't have to pay for it.
So this talk is fundamentally about assumptions: the assumptions that we make as people and how they translate into what winds up in our software. Most of the time, the assumptions we make serve us well. There's no problem. I think few people would disagree that eating a balanced diet is good for your health or that you should be careful around open flames. Now sometimes the assumptions don't turn out to be true. Like yesterday when I was walking to the party I thought it would be okay to cross in the middle of the street in front of a police officer, but it was not. But more specifically with reference to software, sometimes things that seem like they should be true and are perfectly reasonable statements don't turn out to be true in every possible case we could imagine.
So here's an example from Python. If you do reference equality comparisons, small numbers are actually cached by certain implementations of Python, so that, for example, all the numbers between negative five and 256 in CPython are given a singleton identity, but every time you make a new number that's outside of that range you're getting a new object. So reference equality doesn't work outside that range. That's a very surprising result if you didn't know that's what was going on under the covers. And for Ruby we have kind of similar things that might crop up. So here's one that's not necessarily specific to Ruby, but we'll use Ruby code to demonstrate it. You might think that for all x, x is equal to itself, but that's not true for float NaNs, or not-a-numbers. In fact that's how they're defined: they're not equal to themselves. They're the only thing that's not equal to itself.
Sometimes you might make assumptions about the way dates and times work, so you might think that this is a perfectly reasonable statement: the day after October 4th is always October 5th. Is it always true? Could it ever be an incorrect statement? We'll explore that later.
And sometimes we make assumptions about social or cultural situations that wind up in our software. Here's a database schema that I actually once had to update in a Massachusetts county court system. In 2003 the Massachusetts Supreme Court decided a case called Goodridge versus Department of Public Health, and that was the case that legalized same-sex marriage, for Massachusetts at least. So lots of assumptions that the original implementers of the system made turned out to be wrong, not because of anything that was wrong about the relationships between the objects at the time, but because of changes in how we assumed things would work.
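The NaN claim above can be checked directly. This is a minimal sketch using only Ruby's built-in Float::NAN constant; the variable names are made up for illustration.

```ruby
# Float::NAN is Ruby's built-in IEEE 754 "not a number" value.
nan = Float::NAN

same_object    = nan.equal?(nan) # identity check: it is the very same object
equal_by_value = (nan == nan)    # value equality: defined to be false for NaN
detected       = nan.nan?        # the dependable way to test for NaN

puts "same object?   #{same_object}"
puts "equal to self? #{equal_by_value}"
puts "nan?           #{detected}"
```

So even when both sides are literally the same object, the equality comparison says no, which is why code should ask nan? rather than compare against NaN.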
So fundamentally this talk is about people and the assumptions that people make and what happens when they go wrong and how we can do a better job.
So let's start with time. Time is hard. Why is it hard? Well, because people made it that way. Before we can talk about that, let's talk about what time is. Let's forget about your software constructions of what time is. Let's talk about what time is generally speaking. And to do that we have to go back to what some philosophers thought about time. This is an open question in philosophy, I would say, and it's one that has been debated for centuries. Isaac Newton thought that time was sort of like a container that existed independently of whether or not we as people existed. So it was a universal property, some kind of essence of the fabric of the world and the universe that we inhabit. Whereas Immanuel Kant thought sort of the opposite. Time was basically entirely a human construct, a way that we perceive the world around us. For purposes of this talk we'll treat time like it's an ordering of events. So just like space is what separates me from someone in the audience, time is what keeps two events from happening at the same moment. So we're doing pretty good so far, right? We're only a few slides into this and we've already resolved a few major philosophical questions, so I think we can give ourselves a pat on the back.
So everyone can just breach conference etiquette for a moment here and take a look at your watch or your smartphone and just note what time it is and yell out what time you think it is right now. Okay, so most people yelled an hour and a number, and I heard someone over here yell "business time," which I thought was funny. So most people said an hour and a minute. And we can get more specific than that, of course. We can be more granular about the moment of time that we're referring to.
So we could specify the minutes and then the seconds on top of that and the milliseconds and the microseconds and so on. And we can be as granular as we'd like to be about that moment in time. So when you think about a moment in time, what we're really talking about is not a dot so much as an interval on which a statement is true. When we say 2:30, what we mean is that there is a range of possible values on the time continuum for which the statement "it's 2:30" is true. So that's not a point but rather an interval of possible values. And if we get more specific with what we mean, all we're really doing is narrowing that interval, so we're not talking about different times necessarily. We're just focusing in on one smaller subset of that larger interval.
Now there's a problem with this, of course, which is that moments are ambiguous. If I told you that it's 2:30, there are many possible values of time for which that's true. It's not one specific continuous interval. So here are two different moments for which it would be correct to say that it's 2:30. And of course this pattern repeats on a daily basis. There are many, many moments, not just two, where it's also correct to say that it's 2:30. And if we get more specific, it doesn't really help the situation. If we just add on seconds and milliseconds and microseconds, that doesn't get us anywhere. Now the reason that I understand you when you yell at me that it's 2:30 is because I understand that we're talking about a window of possible values that includes just this afternoon, probably sometime between two and four. So I don't have to know the exact time, but I know the possible range of values you might be talking about. There's only one interval that falls in that context, so I can resolve the ambiguity. But we need specificity if we want to tell a computer that additional context, right?
We can't just say, it's 2:30 and I'm talking about sometime this afternoon. A computer's not going to know that. So we have to be more specific and tell the computer basically everything that would be necessary to resolve that ambiguity. We can't just say it's 2:30. We have to say it's 2:30 PM on a Monday, on this date, in this year, and so on.
So now that we have this idea of what times are, let's talk about how we can compare times, right? Because it's one thing to just tell someone what time it is, but usually you want to know a fact about time, like whether or not it's time for an appointment or if it's your birthday yet or something like that. So what kind of problems might we encounter when comparing times? The first problem is that we might not be using the same calendar. You can imagine that someone in San Francisco, California, is probably using the Gregorian calendar, January, February, March, etc., that we've all come to know and love. But someone in Beijing might not be using that. They might be using the Han calendar or a different civil calendar. Okay, so we can solve that problem pretty easily. Let's just get everyone on the same calendar, check. But if we do that, we're going to have to remember when we switched calendars so that all dates still make sense. So for example, before we were on the Gregorian calendar, we were on the Julian calendar. And when we switched calendars, we had to jump ahead: ten days if you switched in 1582, when the Gregorian calendar was introduced, and eleven days if you waited until the 1700s. When that calendar was introduced, four countries did it right away. Basically, the Catholic countries moved over right away. But a lot of other countries moved over at different times. The colonies of those Catholic countries didn't all move over at the same time, and it was important that you switched on that date. Because if you didn't, you needed to add a different number of days, depending on when you actually did switch, or otherwise your alignment would be wrong.
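The calendar switch is something Ruby's standard Date class already models, and it answers the earlier question about the day after October 4th. The sketch below uses the stdlib constants Date::ITALY (the default switch date, October 1582) and Date::ENGLAND (September 1752); the variable names are only for illustration.

```ruby
require "date"

# With the default reform date (Date::ITALY), the last Julian day was
# 1582-10-04 and the next calendar day was Gregorian 1582-10-15.
last_julian_day     = Date.new(1582, 10, 4)
day_after           = last_julian_day + 1
skipped_day_exists  = Date.valid_date?(1582, 10, 5)

# England and its colonies switched later (Date::ENGLAND), so they had
# to skip a different number of days: 1752-09-02 was followed by 1752-09-14.
english_last_day = Date.new(1752, 9, 2, Date::ENGLAND)
english_next_day = english_last_day + 1

puts "Day after #{last_julian_day}: #{day_after}"
puts "Is 1582-10-05 a valid date? #{skipped_day_exists}"
puts "Day after #{english_last_day} (England): #{english_next_day}"
```

The same "plus one day" operation lands on a different date depending on which switch you tell the library about, which is exactly the alignment problem described above.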
Next problem: your local time isn't the same as my local time, depending on where I am in the world. So take San Francisco, California. If it's 2:30 here, it's going to be dark in San Francisco in the Philippines. That's an actual city name in the Philippines. You'd be surprised how many cities are named San Francisco. So, we can solve that problem. What if we all adopt a local time that's offset with reference to some global time? And that's what UTC is. We can all establish a global time, and we'll just tell each other what our offsets are relative to that time. Then the same hour on the clock corresponds to roughly the same amount of daylight for each of us. And that's what Charles Dowd, a US seminary teacher, did when he proposed time zones to a bunch of railway operators. They liked the idea of time zones so much that the railroads adopted it across the US in 1883. So when you got off a train onto a platform, you would set your watch to a different time zone by whatever the clock was on the platform.
So, all right, great. Now we've got these time offsets that will minimize the amount of difference between our solar days. But, different problem. The time offset that we have at one point in the year isn't always the same as the time offset in a different part of the year. So for example, in the US, we have daylight saving time. In the summer, we're at UTC minus seven, but in the winter, we're at UTC minus eight in Pacific time. And in San Francisco in the Philippines, they have UTC plus eight year round. There is no daylight saving time. Okay, fine, we can fix that. We'll just tell each other what the offsets are at different times in the year. And that will solve that. But another problem: the offsets weren't the same for all historical values of time.
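Here is a small sketch of the "global time plus offset" idea using plain Ruby Time objects. The date is arbitrary, and the offsets are written in by hand as an assumption (UTC-7 for Pacific daylight time in summer, UTC+8 for the Philippines) rather than looked up from a time zone database.

```ruby
# 2:30 PM on a summer day in San Francisco, California (offset hand-coded).
california = Time.new(2013, 7, 1, 14, 30, 0, "-07:00")

# The same instant expressed in UTC and at the Philippines' offset.
utc         = california.getutc
philippines = california.getlocal("+08:00")

format = "%Y-%m-%d %H:%M"
puts "California:  #{california.strftime(format)}"
puts "UTC:         #{utc.strftime(format)}"
puts "Philippines: #{philippines.strftime(format)}"

# All three are the same moment; only the wall-clock reading differs.
same_instant = (california == utc) && (utc == philippines)
puts "Same instant? #{same_instant}"
```

The wall clocks disagree by fifteen hours, but comparing the Time values shows they all describe one moment.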
So if I want to go back and tell you what time an event occurred in 1970, I would need to go look up what the time offsets were in 1970 in that location. And even in the US, maybe you think that daylight saving time is pretty easy, but there have been a lot of changes to daylight saving time: when it started, when it ended and so on. And San Francisco in the Philippines didn't always observe the year-round offset. They had daylight saving time briefly to avoid an oil shortage. They wanted to conserve energy, so they adopted daylight saving time for about 12 years, and then they stopped doing it.
Another problem: some of the local times you want to talk about don't actually exist at all in my time zone. So here's 2:30 AM on the spring forward day of this year. This time doesn't exist, so there's no UTC time that maps to 2:30 AM. If you try to resolve it, you can't, because you go forward an hour, right? As soon as the clock ticks 2 AM on that Sunday, you jump ahead to 3 AM. So there are no moments in time for which that's a valid local time. And sometimes the conversions are ambiguous. If we look at 1:30 AM on the Sunday in the fall, when we go back an hour, there are two possible moments, right? Because you arrive at 1:01 AM the first time through. And then it's 1:59 AM, and then it's 2 AM, and you roll your clock back, and you have a second 1:01 AM. So there are two possible UTC values for that same local time. You have to be able to resolve that ambiguity somehow.
Okay, so maybe we can fix this by telling each other all of the possible offsets: when they happen, when they changed historically. Then we can store that somewhere and share it with each other, and we won't have this problem anymore. So that's what tzdata is. This is a time zone database that's managed by IANA, the Internet Assigned Numbers Authority, and it's their job to keep track of this database.
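The missing and doubled local times can be seen with nothing but Ruby's Time class. This is only a sketch under stated assumptions: the dates are the 2013 US changeovers, and the Pacific offsets are hand-coded instead of coming from tzdata, which is what a real program should use.

```ruby
pst = "-08:00" # Pacific standard time, hand-coded for this sketch
pdt = "-07:00" # Pacific daylight time, hand-coded for this sketch
clock = "%H:%M"

# Spring forward (2013-03-10): one minute apart in UTC, but the wall
# clock jumps from 01:59 to 03:00, so 02:30 never appears.
before_gap = Time.utc(2013, 3, 10, 9, 59).getlocal(pst)
after_gap  = Time.utc(2013, 3, 10, 10, 0).getlocal(pdt)
puts "#{before_gap.strftime(clock)} -> #{after_gap.strftime(clock)}"

# Fall back (2013-11-03): two UTC instants an hour apart that both
# read 01:30 on the wall clock.
first_pass  = Time.utc(2013, 11, 3, 8, 30).getlocal(pdt)
second_pass = Time.utc(2013, 11, 3, 9, 30).getlocal(pst)
puts "#{first_pass.strftime(clock)} and #{second_pass.strftime(clock)}"

seconds_apart = second_pass - first_pass
puts "Seconds between the two 01:30s: #{seconds_apart}"
```

A wall-clock reading alone cannot tell the two 01:30s apart; you need the offset, or the UTC value, to know which one you have.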
So we create one time zone for every distinct list of historical offsets we have to remember. Here's the Pacific Time zone set of rules. You can see here all the times that Pacific Time changed. That first line under the zone header near the bottom, that's when Pacific Time was actually established. So you can see that it started in 1883 with Charles Dowd. And sometimes time zones are way more complicated than you would think they are. Because remember, you need a new time zone every time you establish a new set of rules. So Indiana has lots of different rules within its cities. Anybody from Indiana, just out of curiosity? So you guys probably know that it's difficult when you go between two cities in Indiana. You might change time zones two or three times just driving through different areas. And each one of those is a separate time zone, just like Pacific Time or Central Time or whatever. And that's distinct from all the others because that specific city or locale has decided that they want to start daylight saving time earlier or later or not observe it at all or whatever.
So times in Ruby really center around two big classes, Time and Date. There's also DateTime, which is sort of funny. If you go look in the Ruby core docs, the first and only sentence of the documentation for DateTime is the single word "DateTime." So not very helpful. But Time and Date are the two that you probably should be most concerned about. If we're not assuming something like ActiveSupport, then these are the two plain old Ruby objects. You use Time to make offset-aware objects. So when you make a new Time, you're encoding the offset, by default your local time offset, into that time value. Dates, however, don't know about a time offset. And neither Dates nor Times know what time zone you're in.
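To make the point about Time carrying an offset concrete, here is a short sketch with hand-picked offsets. It relies only on core Time and the stdlib Date class.

```ruby
require "date"

# Two Time values built with different offsets that name the same instant.
noon_utc        = Time.new(2013, 1, 1, 12, 0, 0, "+00:00")
four_am_pacific = Time.new(2013, 1, 1, 4, 0, 0, "-08:00")

# Each Time remembers its own offset, in seconds east of UTC.
puts noon_utc.utc_offset        # 0
puts four_am_pacific.utc_offset # -28800

# Equality compares the instant, not the wall-clock reading.
same_instant = (noon_utc == four_am_pacific)
puts "Same instant? #{same_instant}"

# A Date is just a calendar day; it was built without any offset at all.
new_years_day = Date.new(2013, 1, 1)
puts new_years_day
```

Notice that an offset such as -08:00 is all the Time knows. Nothing here says "Pacific Time," so the object cannot tell you what the offset will be six months later.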
So that's what the tzinfo library does for you in Ruby. It uses the tzdata database. And what this will do is get you a time zone object, which you can then use to make conversions between different Time or Date instances. Here's a problem though. Ruby, unfortunately, does not have a native time zone aware concept of duration or period. So that means we can't do some things easily. For example, here's that daylight saving time changeover on November 1st, November 2nd, and November 3rd. If we try to measure what the difference between those dates is, we get a different answer for the distance between November 1st and November 2nd than we do between November 2nd and November 3rd. And that's because, as you can probably see, the offset changes from -04:00 to -05:00. So there was an extra hour in the November 1st to November 2nd change. So that distance changes.
ActiveSupport has an advance method that it adds to Time. And this will let you do the period stuff, sort of. So this can resolve the ambiguous time stuff we saw before. But the problem is that it may not always work the way you expect. For example, if I want to advance the date January 30th by two months, I might expect that the correct value would be March 30th. But if I do that two times in a row with one month each, I get a different answer, because the first month advances to February 28th. And the second month advances from February 28th to March 28th, which is a different value than March 30th. So advancing by one month twice is not the same as advancing by two months once.
So in conclusion, times are really, really hard to get right. We want to make sure that we always store and work in UTC. You should probably let your library handle everything. Don't invent anything from scratch. And be aware of the limitations of whatever particular time library you like to use.
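Both surprises can be reproduced without ActiveSupport. The sketch below swaps in the stdlib Date#>> operator (shift by months) in place of ActiveSupport's advance, and uses hand-coded -04:00 and -05:00 offsets around the 2013 US changeover as an assumption; neither is necessarily what the slides showed.

```ruby
require "date"

# Month arithmetic: one jump of two months vs. two jumps of one month.
jan30        = Date.new(2013, 1, 30)
two_at_once  = jan30 >> 2        # straight to the 30th of March
one_then_one = (jan30 >> 1) >> 1 # clamps to Feb 28 first, then moves on

puts "Two months at once:   #{two_at_once}"
puts "One month, then one:  #{one_then_one}"

# Day length: midnight to midnight across the fall-back changeover,
# with the offset changing from -04:00 to -05:00 (hand-coded).
midnight_before = Time.new(2013, 11, 3, 0, 0, 0, "-04:00")
midnight_after  = Time.new(2013, 11, 4, 0, 0, 0, "-05:00")
hours_in_day    = (midnight_after - midnight_before) / 3600

puts "Hours in that calendar day: #{hours_in_day}"
```

So "a month" and "a day" are not fixed lengths, which is why a period type has to know about the calendar and the time zone rather than just count seconds.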
And just a special plea: if anybody's seen Joda-Time in Java, that's really awesome and you should try to port that to Ruby, because that would make a lot of these time problems go away.
So, next problem: floating points. Why are these hard? I think if you look even cursorily on Stack Overflow, or just Google on the internet for problems that people are having with floating points, they'll insist that something is wrong with their computer or that something is broken about the language or a library they're using. Here's one guy complaining that JavaScript is... Well, JavaScript may be broken for other reasons, but it's doing the right thing, at least for floating points, here. So is it actually broken? And the answer, I think, is no. It's just not a very obvious mental model. We've made a lot of assumptions about how we thought people would use floats with the standards that were developed for floating points. And we've wound up with answers that aren't great. So we can see that 1 plus 2 equals 3. But with 0.1 plus 0.2, we get a result that looks like it's 0.3. When we look a little bit deeper, we notice that we don't wind up with exactly 0.3. There are some extra digits at the end. So it's not exactly representing the value that we thought it should be exactly representing. What's even weirder is that for some values, we do get an exact representation. So if I add 0.25 plus 0.25, it is truly representing 0.5 exactly. So why does that happen? How can it be that these are so inconsistent, or apparently inconsistent? I'm not going to have time to do too much of it today, but if you want to go play around with the internals of floating points, I just put up a quick little library last night that lets you open up the internals of Ruby and look at the bit strings that correspond to floating point numbers that you might generate.
So the first thing to understand here is something called the pigeonhole principle. Has anyone heard of the pigeonhole principle?
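The floating point sums mentioned above are easy to try in Ruby. This is a minimal sketch; the tolerance comparison at the end is one common workaround and is an addition for illustration, not something from the talk.

```ruby
sum = 0.1 + 0.2

puts sum                # prints the extra digits past 0.3
puts sum == 0.3         # the exact comparison fails
puts 0.25 + 0.25 == 0.5 # but this one is exact

# One common workaround: compare within a small tolerance instead of exactly.
close_enough = (sum - 0.3).abs < Float::EPSILON
puts close_enough
```

The first sum misses and the second one hits because 0.25 and 0.5 are powers of two, while 0.1, 0.2, and 0.3 are not.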
OK, great. So if you're a computer scientist, maybe you've heard this term. The pigeonhole principle is pretty simple. All it says is that if you have n objects and you have m places to put them, and you have more objects than places, then at least one place is going to end up with more than one object. So here's an illustration of this. If I have three pigeons and six slots to put the pigeons in, I can fit each pigeon into its own slot without a problem. But if I have more pigeons than slots, some of the pigeons are going to have to share a slot, right? So these two guys are going to have to share a slot.
So if I imagine the integers in a similar way, we can say, OK, the integers are basically like the pigeons. They're a set of different objects that we might want to store somewhere. And let's say that we allocate one byte worth of space to store integers. That would give us eight bits of space to play around with to store those integers. And of course, since bit is short for binary digit, each bit is one slot, and we can put a zero or a one in each of those slots. So we have two choices there, a zero or a one, and we have eight choices to make. That's two to the eight possible values, or 256 possible values. And these values can map to anything, right? We could say that they are 0 to 255, or negative 128 to 127, or we could pick some arbitrary numerical range. It doesn't matter. They don't even have to represent numbers, right? We could assign the values to colors, or cat names, or musical notes, or UTF-8 characters. And if we give ourselves more space, we can add more possible values. If we have a 32-bit place to store things, we can store about 4 billion possible values. So if we assign the integers to each slot, one natural way to do it is just to map each consecutive integer starting from 0 to a slot. But it can be arbitrary, like we said before.
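The one-byte example can be sketched in a few lines of Ruby. It uses the core pack and unpack1 methods to look at a single byte; the specific numbers (200, 1, 257) are arbitrary choices for illustration.

```ruby
# Eight binary choices give 2**8 distinct bit patterns.
slots = 2**8
puts slots

# The same eight bits can be read as different values depending on
# which mapping we agree on.
byte        = [200].pack("C")    # store 200 in one byte
bit_pattern = byte.unpack1("B8") # the raw bits
as_unsigned = byte.unpack1("C")  # read as 0..255
as_signed   = byte.unpack1("c")  # read as -128..127

puts "#{bit_pattern} -> unsigned #{as_unsigned}, signed #{as_signed}"

# Pigeonhole: with only 256 slots, 1 and 257 must share one
# if we keep just the low eight bits.
slot_for_1   = 1 & 0xFF
slot_for_257 = 257 & 0xFF
puts "1 and 257 share slot #{slot_for_1}" if slot_for_1 == slot_for_257
```

Nothing in the bits themselves says which reading is correct; the mapping is a convention that the reader and the writer have to share.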
We could decide that the first slot corresponds to 183, and then just randomly fill in the other 255 values however we wanted to. But it's easier if we stick with sequential values, so we'll use that.
So, floating points. We saw what integers look like. But what if I told you that there are numbers other than integers, right? Here's an example of this. Physicists often care about very large and very small numbers. And those large and small numbers can't exactly be represented by integers. So there's one notation that scientists like to use. It's called scientific notation. And this breaks up each of those values into three components. There's a sign that's either positive or negative. There's a fraction, which is a number between 0 and the base of the representation, and in this case we're representing things in base 10, so this is a number somewhere between 0 and 10. And then the exponent, which is the power of 10 that you're multiplying by. So you can say that the sign, which we'll call S here, is a number that's either 0 or 1, representing positive or negative values. 1 means negative, 0 means positive. Then F, which will be a value somewhere between 0 and the base that we're working in. And then an exponent, E, which will be somewhere in a range of possible exponents that we can store. So all floating point values can be represented by a tuple of S, E, and F. Those three values determine the choices you're making in terms of how you want to store those bits.
So IEEE 754 is a standard that describes how that's going to work, and the format in particular that we care about is called binary64. This is the one that Ruby uses. So every floating point value is allocated 64 bits, and they're split up in this way. What we call the fraction, the IEEE standard calls the mantissa. And the reason it's not the same as the fraction is that the mantissa is actually "one point" followed by the fraction.
There's an implicit one at the front. And remember, since this is a binary choice, since we're in base two, the fraction bits after that one will always be a value somewhere between 0 and 1. So that's what the fraction is. And the exponent: for reasons we won't get into, the value that's actually stored is the exponent you actually care about plus 1,023. So if the exponent is, say, 2, you will store the value 1,025.
So that brings up another problem, which is that we have a fixed amount of space to store this stuff, but how many values are there in between, let's say, negative 2.7 and 0.3? Well, there are an infinite number of possible numbers that we could store there. And what happens if we try to store an infinite number of pigeons in a finite number of slots? It's going to get really crowded. So some of the values that we'd like to represent cannot be exactly represented. We would like to represent all possible values with floating points, but what winds up happening is only a subset of those values can be represented exactly. And in fact, since there's an infinite number of possible values we'd like to represent, but only a finite number of values that we can represent, that means that virtually all values cannot be represented exactly. It's the exception to the rule that a value can be represented exactly with floating points, right? You have an infinite number of pigeons, and only a very small number of them will be able to fit into these slots without crowding. That's the crux of the floating point problem. We don't have time to do a demo, but I encourage you to go check out the library there, and I'll be happy to answer questions about that after the fact.
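The sign, exponent, and fraction layout can be inspected from plain Ruby. This is a rough sketch and not the library mentioned in the talk: decode_double is a hypothetical helper name, and it relies on the binary64 layout of 1 sign bit, 11 exponent bits, and 52 fraction bits with an exponent bias of 1023.

```ruby
# Split a Float into its binary64 fields (hypothetical helper).
def decode_double(x)
  bits = [x].pack("G").unpack1("B64") # 64 bits, most significant first
  stored = bits[1, 11].to_i(2)        # exponent as stored, bias included
  {
    sign: bits[0].to_i,               # 0 positive, 1 negative
    stored_exponent: stored,
    exponent: stored - 1023,          # remove the bias
    fraction_bits: bits[12, 52]       # bits after the implicit leading 1
  }
end

four  = decode_double(4.0) # 1.0 x 2**2: fraction is all zeros
tenth = decode_double(0.1) # a repeating bit pattern that gets cut off

puts four
puts tenth

# Float#to_r gives the exact value a Float really holds.
puts 0.5.to_r == Rational(1, 2)  # 0.5 fits in a slot exactly
puts 0.1.to_r == Rational(1, 10) # 0.1 does not
```

The fraction bits for 0.1 are the pattern 1001 repeating until the 52 bits run out, which is the crowding described above: the true value needs infinitely many bits, so the nearest available slot is used instead.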
So if you are tempted to write your own floating point library or something that translates stuff into strings and then back into decimals, please don't do that. Use BigDecimal if you'd like to represent values exactly. Ruby also has a Rational type that's very nice, so if you're representing fractions, you can write eight slash three with an r suffix, and that will give you a rational value. And never use floats for precise calculations. If you care about the exact value, you shouldn't be using a floating point.
So finally, names. Why are names hard? Because humans made them that way. Everyone should be reading this article after the talk. It's by a guy named Patrick McKenzie, a great article called "Falsehoods Programmers Believe About Names," and I'll put the slides up afterwards. So the fundamental problem is that names are a possibly empty set of strings, that is, someone may not have a name, that map to something: a person, a place, a location. So there may be many names for the same person, place, or thing, but we don't model things that way. We just assume that names fit into this very strict structure about how we think they should work. So for example, a lot of systems assume that someone in the West has a first and last name, and then an optional middle name, and then maybe a suffix and a title. That's true for very few people in the world. And many systems treat names differently depending on which part of the name they're talking about. They may require that your first name be no longer than 20 characters, or your last name be no longer than 35 characters, and the total string be no longer than 40 characters, or things like that. So this always leads to inevitable disaster, and that's formulated most precisely in something called the Scunthorpe problem.
So the Scunthorpe problem is named after a town in England, and this town's name contains an offensive word in characters two through five, but that town also contains about 100,000 people who live there, all of whom at some point or another get caught by overzealous filters. This woman named Linda Callahan couldn't sign up for a Yahoo email account because Callahan contains the string Allah, which at the time Yahoo was banning. This is after September 11th, and Yahoo was banning all accounts that contained that name. Why did they do that? Because they thought it was a good idea, even though it was clearly a terrible idea. You couldn't register this domain name for the first 10 years of the internet's life because ICANN, the canonical registrar at the time, forbade people from registering any of the seven dirty words that George Carlin came up with, and this contains a word that's offensive to some people in the first four characters. So you couldn't register it. I'm sorry, not ICANN, but InterNIC prohibited domains that had profane terms in them.
So you can probably guess, based on the previous ones, what's wrong with this name. This guy couldn't register. Colburn is how you pronounce that, at hotmail.com. And see if you can guess what happened here. It's not just his email address. Once he was able to register it, he also had trouble sending email because it kept getting caught by spam filters. And his title was software specialist. Can you guess why that's a problem? Well, it's because it contains the string Cialis, which is highly associated with spam. Google Plus banned people, real people, for having names that looked fake, like me. My name's John Feminella. I think that's a pretty weird name. Maybe it sounds fake to some people, I don't know. But there are people with way weirder names than me on Google Plus. If anyone here is from Google, I would like you to answer for Dr. Loki Sky Lizard.
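The filters in these stories all share one shape: look for a banned substring anywhere in the input. Here is a deliberately naive sketch of that approach, written to show how it misfires. The blocked strings are the two examples named above, and the lambda and variable names are made up for illustration.

```ruby
# A naive substring filter of the kind that causes the Scunthorpe problem.
blocked_substrings = ["allah", "cialis"]

naive_filter = lambda do |text|
  lowered = text.downcase
  blocked_substrings.any? { |banned| lowered.include?(banned) }
end

callahan_blocked   = naive_filter.call("Linda Callahan")
specialist_blocked = naive_filter.call("Software Specialist")
smith_blocked      = naive_filter.call("John Smith")

puts "Linda Callahan blocked?      #{callahan_blocked}"
puts "Software Specialist blocked? #{specialist_blocked}"
puts "John Smith blocked?          #{smith_blocked}"
```

Both perfectly innocent inputs get rejected because the filter has no idea where one word ends and another begins, and a real name is exactly the kind of input where that guess goes wrong.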
Or how about this University of Alabama football player, Ha Ha Clinton-Dix. Real football player, actually has stats, has played in games. And real names are awesome. We should be proud of these names. These are really sweet names. And I wish I had a name that was as cool as some of these people's. But the problem is that we make a lot of bad assumptions about how names work. One assumption that we make that isn't good is that names don't change. Imagine if someone gets married and changes their last name. Terrible assumption to make. Another assumption we might make is that legal names don't change without going through a court. But think about the case where someone enters witness protection. Or that people have a single canonical name. How many people have nicknames? Right? Or that people have a single canonical name at least for legal or financial purposes. So forget about the nickname thing. But what if you have credit reports that aren't in sync with each other? What if you got married and changed your last name? Now you'll have two different reports that match the same person.
Another problem is that some people think names are unique. Not true. 820,000 people have this Chinese name. 46,000 people are named John Smith. 120 people are named John Feminella. Another problem is that some people assume names will have capital letters in them. Not true. bell hooks, a prominent feminist. e e cummings, a prominent poet. Or that names won't contain numbers. Surely nobody's name would contain a numerical value. This is an actual New Zealand child's name: Number 16 Bus Shelter. Or that names will contain only alphanumeric characters, right? How about Jay-Z? There's a hyphen in both his stage name and his actual legal name. Okay, fine. But names are always Unicode characters, surely. No one could have a name that wasn't a Unicode character. How about Prince? And finally, what about the assumption that people have names?
Probably everyone in this room has a name, but is that required? Are people actually required to have names? The answer is no. You're not legally required to have a name. That might make your life really hard, but there's no actual requirement that someone have a name. So the conclusion I think that we should learn from this is that you shouldn't filter inputs that are the name of anything that's a real-world person, place, or thing. And you should probably just have a single unrestricted field for whatever the name value is. So thanks very much for having me, and I appreciate all of your awesome comments on Twitter from earlier today. If you have any questions, I'd love to take them afterwards. Thanks very much.