Hi. My name is Paul Ganssle. Today I'm going to talk about python-dateutil. It's a library for dealing with dates and times and that sort of thing. Everyone knows that's super fun to deal with; it's not a dry and boring subject at all, so strap yourselves in. It's going to be a mile-a-minute ride. This is something I got involved in maybe a year and a half ago, and I've been the primary maintainer since then. Surprisingly, a lot of libraries like this have very little attention paid to them, so if you want to do some really fun work, you can stop by dateutil or other libraries like it.

I'm going to format this as kind of an overview tutorial on how to use the library. The five basic modules are relativedelta, rrule, tz, the parser, and easter. relativedelta is for handling calendar offsets: it's about meaningful ways to reduce some of the ambiguity in saying how long it is between two dates. rrule is an implementation of the iCalendar spec for recurrence rules, so that's for when you want to generate a series of dates based on rules of increasing complexity. In the tz module, we provide a whole bunch of different tzinfo subclasses, so however you want to specify your time zone information, hopefully we'll be able to parse it. Then there's the parser, which is the sort of automagical function where you give it a string that looks like a date and it gives you a date. I'm not going to go too much into that, but I will try to point out some pitfalls. And then there's the easter module. I'm not going to talk about that beyond this, because I can explain what it does in about two seconds: it tells you when Easter is.

We'll start with relativedelta. As opposed to timedelta, which tells you how much time has elapsed between two events, a relativedelta gives you a fuzzier definition of what happens between two, say, calendar dates. I'll give some examples, but take "one month from now": what does that mean? 30 days? 30.14 days? Who knows? There are two ways to construct these.
One is just like a timedelta: you give it the raw inputs, all the components of the relativedelta. The other is as the difference between two dates. Just like you can subtract one datetime from another to get a timedelta, you can pass two datetimes to relativedelta to get the relative distance between them.

When constructing a relativedelta, there are two broad categories of arguments. First there are the absolute arguments. These are the singular ones, the ones that don't end in an s. They're basically equivalent to calling the replace method on a datetime: when you add or subtract your relativedelta to or from a datetime, the first thing it does is take each of these components of the datetime and replace it with the value specified on the relativedelta. For example, this is a relativedelta that, if you add it to or subtract it from a datetime, gives you the same date and time but in 2015; I just specified the year. This other one is one where I only specified the time components. If I add this to any given date, no matter what its time is, it's going to give me 12:15 on that date. A common thing, I guess.

The other type of argument is the relative arguments. These are the ones that are pluralized, and they're basically offsets from the datetime. One thing to note is that if you want something that's one month away, this is going to give you one calendar month: it tries to give you the same day of the month, but one month later. If that date doesn't exist, it clips to the end of the month. For example, if we start from January 29th, 2016 and go one year and one month forward, that would try to give us February 29th, 2017, which doesn't exist, so it falls back to February 28th. This is the kind of thing relativedelta is useful for, so that you don't have to write your own logic for handling leap years or the ragged ends of months.
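A minimal sketch of the two construction styles and the two kinds of arguments just described, using dateutil's relativedelta. The specific dates are arbitrary choices for illustration, not the ones from the talk's slides.

```python
from datetime import datetime
from dateutil.relativedelta import relativedelta

# Built from raw components, like a timedelta (relative, plural arguments)...
one_month = relativedelta(months=+1)
print(datetime(2016, 1, 31) + one_month)  # 2016-02-29 00:00:00 (2016 is a leap year)

# ...or as the calendar distance between two datetimes.
gap = relativedelta(datetime(2016, 9, 10), datetime(2015, 6, 1))
print(gap)  # relativedelta(years=+1, months=+3, days=+9)

dt = datetime(2016, 9, 10, 8, 30)

# Absolute (singular) arguments behave like dt.replace(...), whether added or subtracted.
in_2015 = relativedelta(year=2015)
print(dt + in_2015)  # 2015-09-10 08:30:00
print(dt - in_2015)  # 2015-09-10 08:30:00

quarter_past_noon = relativedelta(hour=12, minute=15)
print(dt + quarter_past_noon)  # 2016-09-10 12:15:00

# Relative offsets clip to the end of the month when the target day doesn't exist.
clipped = datetime(2016, 1, 29) + relativedelta(years=+1, months=+1)
print(clipped)  # 2017-02-28 00:00:00
```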
Also, the normal rules of addition, subtraction, and multiplication apply. If you subtract 15 days, that's going to give you the date 15 days earlier; it's not just always going to add.

A special type of relative argument (you could argue whether it's relative or absolute) is the weekday. These say that you want a date that matches a specific day of the week. This first one I constructed is something that, when you add it to a date, gives you the next Sunday, or the same day if it's already a Sunday. Here I just generated three. For the first one, the date is already a Sunday, so it gives you that Sunday. Add it to a date three days later and it gives you the 8th. Add it to a date three days after that and it's still not past the 8th, so it gives you the 8th again; those last two are in the same week. We can also give the weekdays an argument that specifies an additional offset. This one says: give me not next Wednesday, but the Wednesday after next. You can see it works essentially the same way.

The other thing that makes this particularly fancy is that you can combine all of these to get some very specific rules about what the next date is supposed to be. Here's some very simple logic that says: if I'm given a date, I want to jump to the beginning of the next month. That adds one month, which on its own would keep the same day of the month, and then it sets the absolute day argument to 1. So no matter what day you start on, it gives you the first of the next month. This next one adds one year, resets to January 1st, and then goes to the third Monday on or after that date. That gives you the third Monday in January, which is Martin Luther King Jr. Day. So this is an object that, if you add it to a datetime, gives you Martin Luther King Day of the next year.
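The weekday and combination examples might look roughly like this. The May 2016 dates and the September 2016 starting point are assumptions, since the slides aren't in the transcript. The multiplication and negative-weekday lines anticipate points made later (multiplying scales only the relative part, and the Q&A question about "the last Sunday of a month"); day=31 clipping to the month's end is one common idiom for that, not necessarily the one on the slides.

```python
from datetime import datetime
from dateutil.relativedelta import relativedelta, MO, WE, SU

start = datetime(2016, 9, 10)

# Ordinary arithmetic rules apply: subtracting 15 days goes back 15 days.
two_weeks_ago = start - relativedelta(days=15)
print(two_weeks_ago)  # 2016-08-26 00:00:00

# weekday=SU: the next Sunday, or the same day if it is already a Sunday.
next_sunday = relativedelta(weekday=SU)
sundays = [d + next_sunday for d in (datetime(2016, 5, 1),   # already a Sunday
                                     datetime(2016, 5, 4),
                                     datetime(2016, 5, 7))]
print([d.day for d in sundays])  # [1, 8, 8]

# WE(+2): the second Wednesday on or after the date (the Wednesday after next).
after_next_wed = datetime(2016, 5, 2) + relativedelta(weekday=WE(+2))
print(after_next_wed)  # 2016-05-11 00:00:00

# Relative months=+1, then absolute day=1: the first of the next month.
first_of_next_month = start + relativedelta(months=+1, day=1)
print(first_of_next_month)  # 2016-10-01 00:00:00

# One year forward, pinned to January 1, then the third Monday on or after it:
# Martin Luther King Jr. Day of the following year.
mlk_day = relativedelta(years=+1, month=1, day=1, weekday=MO(+3))
print(start + mlk_day)  # 2017-01-16 00:00:00

# Multiplying scales only the relative part (years); month, day and weekday stay put.
next_three = [(start + mlk_day * n).date() for n in (1, 2, 3)]
print(next_three)

# Negative weekday offsets count backwards: day=31 clips to the month's last day,
# then SU(-1) steps back to the last Sunday on or before it.
last_sunday = start + relativedelta(day=31, weekday=SU(-1))
print(last_sunday)  # 2016-09-25 00:00:00
```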
If I multiply that by a number, it's only going to multiply the relative components. Since the only relative component here is the years component, I can just do this and generate Martin Luther King Day for the next three years. That's probably not the most efficient way to generate Martin Luther King Day dates, though, because we have this thing called an rrule, which comes from an RFC called the iCalendar specification. The name is kind of confusing; I think it came out of Microsoft, and I don't know how the naming relates to Apple's products there. This is for generating recurrences with some potentially complex interval. For example, take Pat Morita, the actor from Happy Days who played the owner of the diner. If you want to know all the times his birthday fell on a Monday, you can do that. If you want the subset of those that happened during the Vietnam War, you find that he had two of them: 1965 and 1971. rrules were originally invented to generate Happy Days trivia, but it turns out that they're useful for other things.

The basic components of an rrule are, first, the start date, dtstart, which is the basis of the rule. The start date may or may not be one of the rule's occurrences; it may be excluded depending on what the other parameters are. Then there's freq, or frequency, which technically isn't the frequency. It's really more the units of the fundamental frequency: does the rule apply on a yearly basis, a monthly basis, or what? Then there's the interval, which is actually the frequency, but it's implicitly one. That's how many periods of the fundamental frequency you wait before you create a new recurrence. (I forgot to reorder this, but no matter.) Here's an example. The first rule is an hourly rule with interval one. It starts at 9 AM, and I ask for the first three occurrences.
I'll go into this counting a little bit later. It's going to give me 9, 10, 11. If I change the interval from 1 to 2, then starting at 9 it gives me 9, 11, 13 (13 being one o'clock). And if I start at a different time, say 10, it obviously goes 10, 11, 12.

That's fairly straightforward, but the really interesting parts of recurrence rules come about when you start using the by-rules. These modify the frequency in some way, and they have their own set of periods: bymonth, bymonthday, and so on. The way they work is that if the by-rule's period is coarser than or equal to the fundamental frequency, it constrains the rule. For the example here, I'm going to generate a rule that's daily, but with bymonth November and byweekday Tuesday. This filters out anything that's not a Tuesday in November. If I start it in 2015, the first date it gives me is in November: it gives me the four Tuesdays in November 2015, and then the first Tuesday of November 2016.

By-rules that are finer than the fundamental frequency are more of a, I don't know what you'd call it, a sideband or something: they increase the frequency. If I have a monthly rule and apply bymonthday with multiple days of the month, it gives me every occurrence of those days within each month. Here I'm saying that every month I want the 1st, 15th, and 30th. Note that I started on the 16th. Even though the dtstart is the 16th, it isn't an occurrence because it doesn't match the rule, so the first thing it gives me is the 30th, and then it moves into February. There's no 30th in February, and when it hits that, the RFC says you drop it: it skips dates that don't exist. And it just continues on this way, giving you the 1st, 15th, and 30th of all the rest of the months.
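A sketch of the rules walked through above with dateutil's rrule. The concrete start dates are assumptions, and so is the Vietnam War window used for the Pat Morita example (an assumed range of US ground involvement, 1965-03-08 to 1973-03-29); the birthday itself is June 28, 1932.

```python
from datetime import datetime
from dateutil.rrule import rrule, HOURLY, DAILY, MONTHLY, YEARLY, MO, TU

nine_am = datetime(2016, 9, 10, 9, 0)

# freq is the unit; interval (default 1) is how many units between occurrences.
every_hour = list(rrule(HOURLY, interval=1, dtstart=nine_am, count=3))
every_other_hour = list(rrule(HOURLY, interval=2, dtstart=nine_am, count=3))
from_ten = list(rrule(HOURLY, dtstart=nine_am.replace(hour=10), count=3))
print([d.hour for d in every_hour])        # [9, 10, 11]
print([d.hour for d in every_other_hour])  # [9, 11, 13]
print([d.hour for d in from_ten])          # [10, 11, 12]

# A by-rule coarser than freq filters: daily, but only Tuesdays in November.
nov_tuesdays = list(rrule(DAILY, dtstart=datetime(2015, 1, 1),
                          bymonth=11, byweekday=TU, count=5))
print([d.date() for d in nov_tuesdays])  # four in Nov 2015, then 2016-11-01

# A by-rule finer than freq expands: monthly, on the 1st, 15th and 30th.
# dtstart (the 16th) doesn't match, so it isn't an occurrence, and
# February 30th doesn't exist, so it is skipped.
twice_monthly = list(rrule(MONTHLY, dtstart=datetime(2016, 1, 16),
                           bymonthday=(1, 15, 30), count=6))
print([d.date() for d in twice_monthly])

# Pat Morita's birthdays that fell on a Monday, within the assumed window.
monday_birthdays = rrule(YEARLY, dtstart=datetime(1932, 6, 28),
                         bymonth=6, bymonthday=28, byweekday=MO)
vietnam = monday_birthdays.between(datetime(1965, 3, 8), datetime(1973, 3, 29))
print([d.year for d in vietnam])  # [1965, 1971]
```

Note that the rule itself is unbounded; between() only walks it lazily up to the second date.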
If you don't limit these rules in any way, they'll be generated infinitely, or until Python can no longer represent the dates; they're generated lazily, obviously. There are two ways to specify how to terminate the rule as part of the rule itself, and these are mutually exclusive just because the RFC says so; it's not logically necessary for them to be mutually exclusive. The first is count, which is pretty self-explanatory: if I give it count two, it gives me the first two occurrences. The second is until, which stops generating occurrences when we reach that date. That is inclusive: if the until date is itself an occurrence, it will be returned.

Even if you don't have limiting information in the rule itself, you can still retrieve subsets of it using the methods on the object. For example, this is the after method: you pass it a date and it gives you the next recurrence after that date. There's a corresponding before method, and there's also a between method where you give it two dates and it gives you all the occurrences between them. By default these are exclusive of the dates you pass in; there's an inc argument if you want them to be inclusive.

This example also uses byeaster. That's not part of the RFC spec; it's an extension. I don't know who added it, but I think it's exclusive to dateutil. Here I'm passing byeaster=0, and this tells me when Easter was in the '90s, because I asked for the dates between those two.

While these rules are powerful, they can't really express everything you'd want to express. When you want to get something really complicated, you use an rruleset, which lets you combine recurrence rules, either adding them or subtracting them, and you can also add and subtract individual dates. I'm going to work through an example where we generate a fictional bus schedule. Let's imagine that the bus runs from 6 to 11, coming on the 37 (37 minutes past the hour) every hour, except that in the evenings it comes every other hour.
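Before the bus schedule, here is a sketch of the termination options, the query methods, and byeaster as described above. The September 2016 dates are arbitrary; dateutil.easter.easter is used only to check the byeaster result.

```python
from datetime import datetime
from dateutil.easter import easter
from dateutil.rrule import rrule, DAILY, YEARLY

start = datetime(2016, 9, 1)

# count: stop after a fixed number of occurrences.
first_two = list(rrule(DAILY, dtstart=start, count=2))  # Sep 1 and Sep 2

# until: stop at a date; inclusive when that date is itself an occurrence.
through_third = list(rrule(DAILY, dtstart=start, until=datetime(2016, 9, 3)))

# An unbounded rule can still be queried lazily.
daily = rrule(DAILY, dtstart=start)
pivot = datetime(2016, 9, 10)
print(daily.after(pivot))   # 2016-09-11 00:00:00 (inc=True would allow pivot itself)
print(daily.before(pivot))  # 2016-09-09 00:00:00
print(daily.between(pivot, datetime(2016, 9, 13)))            # the 11th and 12th only
print(daily.between(pivot, datetime(2016, 9, 13), inc=True))  # the 10th through the 13th

# byeaster is a dateutil extension to the RFC: an offset in days from Easter Sunday.
easter_rule = rrule(YEARLY, dtstart=datetime(1990, 1, 1), byeaster=0)
easters_90s = easter_rule.between(datetime(1990, 1, 1), datetime(2000, 1, 1))
print([d.date() for d in easters_90s])
```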
First, I generate an rrule that gives me every hour on the 37, for just the weekdays, from 6 to 10. That's too many times: there are going to be two extras. So what I do is generate the same schedule, but instead of generating all of the times, I generate just the ones I want to drop. Then I add that to my rruleset as an exrule. When the exrule and an rrule produce the same time, they cancel out and you don't get a recurrence. Then, on the weekends, my fictional bus schedule is a little different: it runs from 8 to 8, and it comes on the 7. Basically the same thing, but now I've limited it to the weekends.

My fictional town, which has poor bus service, also has scheming politicians. On November 8th, which is election day, they've decided they don't want people who take the bus to be voting, so they arrange to have the schedule just taken off: the bus is out of service. If I want to reflect that in my code, I can generate all the times on that day and add them one by one as exdates, which say: if this is a recurrence, it doesn't occur. But they're not totally heartless. They're going to offer you one bus in the morning, at 4:32 AM, and one in the evening, at 7:49. The polls close at 8, so you're probably not going to be able to make it in time. These are also just one-off exceptions, but in the positive direction, so I add both as rdates. So now I have the schedule.
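The bus schedule can be sketched with an rruleset. The exact service hours, the minutes past the hour, and the helper function hourly are hypothetical stand-ins, since the transcript garbles the slide details; the structure (rrule, exrule, exdate, rdate) is the part that follows the talk.

```python
from datetime import datetime
from dateutil.rrule import rrule, rruleset, DAILY, MO, TU, WE, TH, FR, SA, SU

# Hypothetical week around election day 2016 (Monday Nov 7 to Sunday Nov 13).
WEEK_START = datetime(2016, 11, 7)
WEEK_END = datetime(2016, 11, 13, 23, 59)
WEEKDAYS = (MO, TU, WE, TH, FR)


def hourly(hours, minute, weekdays):
    """Hypothetical helper: one run per listed hour, `minute` past the hour,
    on the given days of the week, for the week above."""
    return rrule(DAILY, dtstart=WEEK_START, until=WEEK_END,
                 byweekday=weekdays, byhour=tuple(hours),
                 byminute=minute, bysecond=0)


schedule = rruleset()

# Weekdays: every hour on the :37 from 6:37 AM to 10:37 PM...
schedule.rrule(hourly(range(6, 23), 37, WEEKDAYS))
# ...minus two evening runs, so evening service is every other hour.
schedule.exrule(hourly((19, 21), 37, WEEKDAYS))

# Weekends: every hour on the :07 from 8:07 AM to 8:07 PM.
schedule.rrule(hourly(range(8, 21), 7, (SA, SU)))

# Election day: cancel every regular weekday run on November 8...
election_day = datetime(2016, 11, 8)
for run in hourly(range(6, 23), 37, WEEKDAYS).between(
        election_day, election_day.replace(hour=23, minute=59), inc=True):
    schedule.exdate(run)

# ...and add the two one-off runs back in.
schedule.rdate(datetime(2016, 11, 8, 4, 32))
schedule.rdate(datetime(2016, 11, 8, 19, 49))

for run in schedule.between(election_day, datetime(2016, 11, 9)):
    print(run)  # only the 4:32 AM and 7:49 PM runs remain on election day
```

Exclusions win over inclusions in an rruleset, which is why the exdates can simply be generated from the same weekday rule.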
I've hidden the part where I turn this into a pandas DataFrame and arrange it, but you can see this is the output organized as a bus schedule: we've got a set of rules that generates a bus schedule, and that's the week of election day, scheming schedule and all.

All right, so I'm going to move on to time zones. Time zones are obviously incredibly tricky, and I could probably spend an entire talk just telling you crazy stories about places that switched sides of the international date line and things like that. But I'm just going to give you a quick overview of the key features, so you have a bit of an understanding of what's going on here. The tz module provides a number of ways to construct time zones, because people have devised a huge number of ways to specify them. But the important ones are these. First there's tzutc. This is just UTC: no offset, no DST. This is the basic thing. If you're working with timestamps, your optimal situation is to convert your data to UTC and work in UTC; it's very simple. We also provide tzlocal, which is just a wrapper around Python's functions that let you query the operating system's time zone information. So here you can see I'm on Eastern time, and if I set my TZ environment variable to UTC, it gives me UTC; if I set it to Pacific time, it gives me Pacific time. And that's not just the offset for today: it will give you the offset for any day you specify. However, usually your best situation is if you know specifically what time zone you're in, in terms of the geographical and political location of the date you want to express.
And the best way to do that is to use the IANA zoneinfo files, also called the Olson database. If you have a *nix system, it ships with these automatically; Windows does not. dateutil does ship its own zoneinfo file, but I'd caution you against using it directly. Really, just use the normal dateutil time zone handling logic, and it will fall back to the bundled file if you don't have system data. But unless you update dateutil all the time, the bundled time zone file is probably going to get out of sync.

Anyway, these are very useful in the sense that they provide all the historical information about a given zone. So here's the New York zone. Today it's Eastern Daylight Time. In 1944 it was Eastern War Time, which was just permanent daylight saving time during World War II. And if I go all the way back to 1901, that was before UTC was invented, and this zone happens to have information for local mean solar time. So that's how times were expressed in New York in 1901. Obviously that's powerful, and if you can, you should use it.

As for how you get these: don't construct them directly from the files. Use gettz. gettz is kind of the equivalent of the automagical parser function, but for time zones: you pass it something that looks like a time zone and hopefully it gives you the right one. If you pass it nothing, it assumes you want local time, so it looks for the best thing that represents local time. If you pass it an Olson time zone name, it gives you that zone, pulling it either from your system files or from the dateutil fallback. And if you want to express a time zone as a TZ environment variable string, you can do that too; this one gives you Australian Eastern time, using another one of the tz subclasses that I didn't mention.
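A short sketch of the constructors mentioned above. What tzlocal() and a bare gettz() return depends on the machine running it, and the historical abbreviation printed for 1944 depends on which zoneinfo data is available, so those lines carry no expected output.

```python
from datetime import datetime, timedelta
from dateutil import tz

utc = tz.tzutc()        # fixed: no offset, no DST
local = tz.tzlocal()    # whatever the OS (or the TZ environment variable) says
also_local = tz.gettz() # gettz() with no argument also looks up local time
print(local, also_local)

# gettz() looks up IANA/Olson names in the system zoneinfo data, falling back to
# the copy bundled with dateutil when the system has none.
nyc = tz.gettz("America/New_York")

summer = datetime(2016, 7, 1, 12, 0, tzinfo=nyc)
winter = datetime(2016, 1, 15, 12, 0, tzinfo=nyc)
wartime = datetime(1944, 7, 1, 12, 0, tzinfo=nyc)

print(summer.tzname(), summer.utcoffset())  # EDT -1 day, 20:00:00 (i.e. UTC-4)
print(winter.tzname(), winter.utcoffset())
print(wartime.tzname())  # historical abbreviation, as recorded in the zone data

# Converting an instant from UTC into the zone.
print(datetime(2016, 7, 1, 12, 0, tzinfo=utc).astimezone(nyc))  # 2016-07-01 08:00:00-04:00
```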
One thing to note, though: I did a major overhaul of how the time zone back end works for the forthcoming 2.6.0 release, which will probably be coming out next week. That matters if there's any chance you'll have to represent something like an ambiguous time. For example, this is 1:30 AM on October 31st, 2004. There was a daylight saving time transition that night, so there were two 1:30s. In the old version of dateutil, there's no way to express the first one; you always get the second. That causes a discontinuity in some systems, because even if you start from UTC, as I'm doing here, it doesn't translate properly. The new version handles it. You can also just use pytz. They have a slightly different way of handling time zones, but they do handle this correctly.

The final thing I'm going to quickly go over is the parser. It doesn't require much explanation, because it's designed not to. You can either instantiate a parser object and call its parse method, or you can just use the default parse function. You give it something that looks like a date, and it does its best to turn that into a date. There's some fuzziness where you might get the wrong date, and there's basically no transparency about how the result was generated. You can reason about it in cases like these. Here it tries one reading, day-month-year, and if that doesn't work, it falls back: if it's not day-month-year, it must be month-day-year. And here it can't be month-day-year either, so it has to be year-month-day.

When should you use the parser? Only use it if you don't know the format of the date, or if you want to pull a date out of a string that isn't just a date: the fuzzy keyword lets you do that. Obviously, if this guy's name were April, that could be a problem.
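Two sketches for this stretch of the talk, assuming dateutil 2.6.0 or later (where tz.enfold and tz.datetime_ambiguous exist) and Python 3.6+ for the fold attribute. First, the ambiguous 1:30 AM on October 31, 2004 in New York; second, how the parser resolves all-numeric dates (by default month-first unless dayfirst=True) and what fuzzy=True does. The example strings are assumptions, not the slides.

```python
from datetime import datetime, timedelta
from dateutil import tz
from dateutil.parser import parse

nyc = tz.gettz("America/New_York")

# 1:30 AM on 2004-10-31 happened twice in New York (DST ended at 2:00 AM EDT).
wall = datetime(2004, 10, 31, 1, 30, tzinfo=nyc)
print(tz.datetime_ambiguous(wall))      # True
first = tz.enfold(wall, fold=0)         # the earlier 1:30, still on EDT
second = tz.enfold(wall, fold=1)        # the later 1:30, back on EST
print(first.tzname(), second.tzname())  # EDT EST

# Starting from UTC, instants an hour apart map to the two different 1:30s.
utc = tz.tzutc()
from_utc_a = datetime(2004, 10, 31, 5, 30, tzinfo=utc).astimezone(nyc)
from_utc_b = datetime(2004, 10, 31, 6, 30, tzinfo=utc).astimezone(nyc)

# Parser: impossible readings are ruled out, otherwise the default order applies.
us_style = parse("01/02/2016")                  # January 2nd (month first by default)
day_first = parse("01/02/2016", dayfirst=True)  # February 1st
forced = parse("13/02/2016")                    # 13 can't be a month: day-month-year
iso_like = parse("2016/02/13")                  # leading 4-digit year: year-month-day

# fuzzy=True skips tokens that don't look date-related.
born = parse("Pat Morita was born on June 28, 1932", fuzzy=True)
print(born)  # 1932-06-28 00:00:00
```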
The other time to use it is when time zones are involved, because the strptime mechanism, which is for parsing dates whose format you know, doesn't really handle time zones correctly. If you need to parse a date that has time zone information, dateutil will do that in many cases. Here's an example where it understands how to turn these formats into offsets. Be aware, though: this uses the POSIX convention, and POSIX is completely backwards. So CST-8 means 8 hours ahead of UTC, and UTC-4 means 4 hours ahead of UTC. It's bizarre. But a bare -0400 does mean UTC minus 4. It's very confusing, but that is the ambiguity inherent in parsing dates.

The only other thing I'll say about time zones is for when you have a limited set of time zones that it could be. For example, CST and IST are overloaded abbreviations, but a lot of the time when you're printing things you use the local tzname. So if you know your data is only coming from India and China, you can tell the parser: if it's IST, give me this; if it's CST, give me that. Otherwise, it doesn't know what it is.

All right, that's the end of the talk. I just want to give my plug in case you want to help out: come on GitHub and dive into the issues. We could probably use help reformatting the documentation; I've been pretty terrible about that. Also, the parser doesn't have enough transparency, and I think it could really be improved, so I'm going to be soliciting feedback on it. If you use the parser or want to use it, get in touch, and I'm going to try to come up with a sort of mini-RFC about how I'm going to change it. And finally, I'm a big believer in the web of trust, especially with software. So if you want to cross-sign the key that I use to sign the library when I release it, this is my PGP key, and I have it on little slips.
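A sketch of the time zone parsing behaviour described above. The IST and CST offsets in known_zones are assumptions (fixed UTC+5:30 for India and UTC+8 for China); any mapping from abbreviation to tzinfo is passed the same way through the tzinfos argument.

```python
from datetime import timedelta
from dateutil import tz
from dateutil.parser import parse

# A bare numeric offset means what it says: -0400 is four hours behind UTC.
bare = parse("2016-09-10 10:00 -0400")

# After a zone name, the sign follows the POSIX TZ convention and is inverted:
# "UTC-4" comes out four hours *ahead* of UTC.
posix_style = parse("2016-09-10 10:00 UTC-4")

# IST and CST are ambiguous abbreviations; tzinfos says which zones you mean.
known_zones = {
    "IST": tz.tzoffset("IST", 5 * 3600 + 30 * 60),
    "CST": tz.tzoffset("CST", 8 * 3600),
}
india = parse("2016-09-10 10:00 IST", tzinfos=known_zones)
china = parse("2016-09-10 10:00 CST", tzinfos=known_zones)

for label, value in [("bare", bare), ("posix", posix_style),
                     ("india", india), ("china", china)]:
    print(label, value.utcoffset())
```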
I can show you my ID if you want to cross-sign, or you can show me yours. You can also just sign it; it doesn't have to be cross-signed. And if you don't understand any of that, feel free to ask me and I'll explain. So, any questions? Go ahead.

Sorry. In your singular args, do you handle negatives? So can you say something like the last Thursday in a month?

Yeah, those can be negative, and I think that's how they're interpreted. But those are more relative; they're like a hybrid, relative slash absolute. So use it cautiously, and you can play with it. What I think you'd really do is get the first of the next month and then go back one Sunday from that to get the last Sunday of the month. There are examples of that in the documentation.

Okay. And then can you also do, say... I'm thinking of some financial products where it's the last Friday within 30 days.

Well, you could generate an rrule and then restrict it to within 30 days, or... I'm not sure.

Oh, I get it. Use between, and then take the last Friday from that.

Yeah. I'd have to think about exactly how you'd want to do that. Worst case scenario, you generate the last two and then find out which ones are within the 30 days.

Okay. Thank you.

I just have two quick questions. First, since time zone data is always changing, is there a canonical source of time zone data out there? And second, how does the parser work internally? Is it a bunch of regular expressions, or how does it work?

Okay. The first one: that's the IANA time zone database I mentioned.
Usually your operating system ships with it, and when it gets updated, which happens about 15 or 16 times a year, your operating system is responsible for that. That's why I say to use gettz instead of passing a specific file; otherwise you have to manage those files yourself. I also maintain a mirror of the database, which I ship with every new release, but I don't do a release every time there's a new time zone update. So you can pull the time zone data from my mirror and use it directly, but ideally you have your own time zone data.

And the second question was how the parser works internally. At a high level, it's a legitimate parser in the sense that it tokenizes and lexes the string, and then it just looks for stuff that looks like dates. But I think the real issue with it is that it's very opaque and it's not modular. That's what I'm looking for comments on. I think dateparser, which is another library, just wrote their own parser that does something very similar, where they let you specify more like a list of possible date formats. They have a slightly different aim. But anyway, I can talk at length about that, so we can talk more about it later.

You had a question.

Have you seen the EX time library from DevMod?

No. What is EX time?

EX time. Go take a look at it for many of the parsing and higher-level APIs that you've been asking about.

Okay, I'll do that. Thanks.

And are there some kinds of weird edge cases that come up?

Well, there are a lot of things around time zones that are very complicated and counterintuitive. For example, in Samoa there was no December 30, 2011.
It went directly from midnight at the end of December 29th to December 31st, because they moved across the date line, from UTC-10 to UTC+14. Plus 14 is obviously a ridiculous time zone offset. There are so many things you could mention. Morocco, for example: usually you think a zone has either zero daylight saving transitions or two. Morocco has four, but only when Ramadan occurs in the summer, and only if their legislature passes a new bill each time. This year they have four, because during Ramadan you're not allowed to eat until after sundown, and daylight saving time artificially prolongs sundown in terms of civil time. So during Ramadan they push it back, because otherwise everyone goes through an even longer day hungry. They say, well, we don't want that during Ramadan, so they switch daylight saving time back and then forward again. But Ramadan drifts around the year, so eventually there will be different overlaps.

But the big gotchas with time zones are about ambiguous dates, and pytz handles those much better, for now. If you're using pytz, though, doing arithmetic on time-zone-aware datetimes is not a winning proposition. I think dateutil handles that better than pytz, but there's an argument to be made that it might be a fool's errand, because it's kind of dangerous. You're really supposed to do your operations, like additions or generating rrules, on naive datetimes or in UTC, and then apply your time zones at the end.

So you mentioned in the talk that if that person's name in the parser example were April, it could potentially...

Yeah, I think I saw this on Stack Overflow. Basically, what it does is try to find anything that's date-related.
So when you use fuzzy parsing, it asks of each piece: is this date-related or not? If it is, it applies the normal rules; if it's not, it throws it away. So if this were April Morita instead of Pat Morita, it would see April and say, okay, this is in April. Then it would keep going and find June, and it'd be like, well, I already know the month is April, so let's forget this June nonsense; that's probably someone's name. And then it would give you the rest of the date, so it would probably parse as April 28th. And if the date were July 31st, you'd get an error, because April 31st isn't real. The improvements to the parser will hopefully help with that, but there's not much that can be done when it's really fuzzy like that.

Yeah, because I was just thinking, if you're feeding something into this parser function automatically, it could be troublesome to pre-filter it to say, oh, if the person's name is...

The real issue is that the parser gets zero context. So the more context you can give it, the better: if you can pull out the sections of the date you already know and pre-parse them, or put them in a known format, and only parse the rest if you really need to, that really helps. For example, you probably all got a ticket that says "Admit One PyGotham", and on it was printed 072016, right? Now, we happen to know that it's July of 2016, so in context we know that's just referring to the month of July in 2016. However, dateutil doesn't know anything like that. It'll see 072016 and say, oh, that must be July 20th, 2016. It's not a huge error, but it illustrates that context is actually kind of important, and when you're trying to be automagical without context, it's tough.

Is that all the questions? All right.
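The ticket example, sketched: parse has no context and reads the six digits as month, day, and two-digit year, while datetime.strptime with a known layout reads them as intended. The "%m%Y" layout (two-digit month, then four-digit year) is the assumption that supplies the missing context.

```python
from datetime import datetime
from dateutil.parser import parse

printed = "072016"

# No context: six digits are split into month, day, and a two-digit year.
guess = parse(printed)
print(guess)  # 2016-07-20 00:00:00

# Known layout: say so explicitly instead of guessing.
known = datetime.strptime(printed, "%m%Y")
print(known)  # 2016-07-01 00:00:00
```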
Thanks for listening to the talk.