 All right, excellent. Well, welcome to Data Stories. Hopefully, we'll have some interesting dialogue and see what's going to happen. So first off, my name's Jeremy Heingartner. That's enough, yeah. So yay, yay, yay, yay. I wasn't expecting to pause. So how many people do a fair bit of data analysis on a daily, weekly basis? Raise your hands. I got bright lights, so let's see. That looks to be about a third to a half. Of that data analysis, how many people are examining other people's data? And by this side, I kind of sort of categorize most of us into two groups. You got the people that are consultants and the people that have products. Are there any others? Any other of these? OK, so who looks at other people's data? OK, a little bit. And how many people examine their own data? OK, that's actually the majority. What kind of data? How you feel about things. How you feel about things. All right, what other kind of data do you have that you examine? Logs. Logs. Traffic history. Traffic history. Scientific data. Ooh, so you're creating scientific data. And consuming. I don't want to chat with you. That sounds interesting. So yeah, so your own data, web logs, data transactions, performance metrics. How many people are basically doing of their own data metric analysis? Roughly. OK. Anyone looking at public data? Two. All right, what do you guys work in on? Ben. Real estate listings. Real estate listings? But we shouldn't say. Say again? But we shouldn't say. Oh, you probably shouldn't say. OK, David? Yeah, real estate listings. Real estate listings. Well, yeah, you guys should chat. So there's a lot of interesting public data out there. And it's starting to come around in the past few years. Is it Hans Gosling in Sweden? Or Grosling? I forget the last one. Rosling, yes. OK. Who has a wonderful site that he did a whole TED talk about UN statistics of health data and world health. It's an amazing talk. 
And there's actually an hour-long program that he puts together called The Joy of Stats, which is pretty fun. So most of that data that he got is all from the UN statistics website, data.gov.uk. Anybody looked at it? Yeah, data.gov before it disappears. It's about to lose its funding. Yeah, I just saw this thing. I don't know if it's correct, the Scottish Home Survey. Is anyone familiar with this? I found it when I was looking for data sets online. Apparently, it has something to do with people that, like every few months, there's a survey that goes around and people track different things. But it's available for public consumption if you sign up to get it. So it might be really interesting to look at. The Guardian. Who's familiar with the Guardian's blog? Data blog. All right, yes. That is probably the best if you want to call it data journalism. I think they're coining the term data journalism these days. And I've highlighted IMDB, Internet Movie Database. Yeah, that's my sample data set that I'm going to play with today. So anyone working with all three types of data? OK, Ben. Ben is the only one. Well, besides me. I work at Collective Intellect, and we both deal with private customer data, our internal data, and public data. And for our private customer data, we'll take data feeds of their chat logs, their email lists, or different things like that and analyze them for mostly content type things. Sentiment analysis, theme generation, that kind of stuff. A lot of statistical language analysis. Internal data. This is where I'm similar to most of you guys. We look at metrics of our internal systems and see what we can have. And we also have public data because we take data feeds of blogs, boards, tweets, all that kind of good stuff so we can analyze and see what everyone is talking about and what just happened. So first thing, when you analyze data, is what? Clean it? Well, that's the second one. That's it. First one is get it. Get the data. 
So we've got all these public locations. You've got your logs. You've got all that kind of good stuff. Yes, and we have everyone knows about cleaning data. How many people have, in their data analysis, how much time do you think you spend doing your data cleaning? Yeah, it is. Yes, the majority of your time. Majority of your time is spent cleaning the data. Because hell is other people's data formats. Yes, including your old formats from last year. So yes, one of the interesting, one of the things I'm trying to get better at is actually using some very common data formats. How many people are familiar with Avro? Avro. All right, we got one down here, somewhat. It's an Apache project that deals with basically a common data file format. There's writers and readers in every single language that I've ever written in. And so it's a really good format, especially if you're going to do Hadoop MapReduce type things too. But its goal would essentially be to try to replace the CSV. So that would be a really good thing. Who's had really interesting data cleaning problems? Yeah? Yeah, I want to hear about them. We should chat later. Probably not right now. But I'm interested in hearing about interesting data cleaning problems. And the hell and joy you've had doing so. So the first thing is the internet movie database. Who's ever downloaded this thing? Yeah? So it's a whole bunch of GZ text files. Yes, each with its own format. Yes, you'll have one GZ text file, which is the list of the titles. Different format than the one that's the list of the release dates with the same titles. The titles might not be the same in each different file. Each one of these files may have this chunk of text, like a read me at the beginning of it, or it's at the end. You don't know which one. So yeah, it's real fun. So when dealing with text, this is probably one of the biggest one is encoding differences. So who's done encoding changes? All right, isn't it great fun? 
I'm actually not sure how the Internet Movie Database handles this, because everything I can find says that all the files are ISO 8859. But it also has Japanese film titles, which doesn't seem to really work. So I'm not sure how they're doing that. So yeah, ISO 8859 to UTF-8 conversion is clunky and horrible. In any type of encoding conversion, you're going to lose something. Dates, yes, dates. Can't we all use the ISO 8601 standard? Isn't that great? Would be nice if we all did. And then there's just kind of crazy generic inconsistencies. So there's one file which is the color of the movie. Is it color or is it black and white? There's two choices. You go through the file, there's actually four. You've got black and white written three different ways, and color. Yeah, that's not fun. So that's one of these little special-case situations where you're going to need a special-case conversion to normalize. I normalized on the first one. It really didn't matter, but that was the one that was the most constant. Country inconsistencies. There's multiple different places that countries show up in the Internet Movie Database. There's actually a countries file, which doesn't say what the country means, but I'm assuming that it's the country of origin of the movie. There's also running times of the movie. So apparently movies and TV shows have different running times in different countries. Yes, yes, it's great fun. And then there's also release dates. Release dates are different in different countries. Now the fun part is, you'd think, hey, if I'm gonna put release dates in countries, I might use something like, I don't know, the two-letter code or the three-letter code that's an international standard. No. So most of the time I'm actually ignoring the countries, but you've got countries in three different locations and all three of them are different. So it's great. So why are we doing this?
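The encoding conversion and the color cleanup described above can be sketched roughly as follows. This is only an illustration, not the Ruby scripts used for the talk: the raw spellings in `CANONICAL_COLOR` are hypothetical (the talk only says there were four variants for two real values), and the file encoding is assumed to be ISO 8859-1.

```python
import re

# Hypothetical raw spellings reduced to letters only; the real variants
# in the IMDb color file are not given in the talk.
CANONICAL_COLOR = {
    "color": "Color",
    "colour": "Color",
    "blackandwhite": "Black and White",
    "blackwhite": "Black and White",
    "bw": "Black and White",
}


def decode_latin1(raw_bytes):
    """Decode bytes assumed to be ISO 8859-1 into a Python (Unicode) string."""
    return raw_bytes.decode("iso-8859-1")


def normalize_color(raw_value):
    """Map a raw color label to one canonical value, or None if unknown."""
    key = re.sub(r"[^a-z]", "", raw_value.lower())
    return CANONICAL_COLOR.get(key)


if __name__ == "__main__":
    print(decode_latin1(b"Am\xe9lie").encode("utf-8"))
    for label in ["Black and White", "black & white", "B&W", "Color"]:
        print(label, "->", normalize_color(label))
```

Returning `None` for an unrecognized label, rather than guessing, makes the next surprise in the file show up at load time instead of in a graph.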
So who is doing data analysis because they have to? And who's doing it because they want to? Oh, see, here we go. Some fun folks. I do it because I wanna learn something new. It may not be useful, but it's something new. How many people have read Super Crunchers? We got one. So Super Crunchers is an older book. I think it came out around the first Freakonomics or the second Freakonomics, or before. And in the whole foreword, he talks about someone analyzing data to predict the future auction price of wine. So this guy had a magazine, journal, something like that, where he would recommend what wines to buy. And he was predicting wines with a huge degree of accuracy, basically the year they were released, and for wine that wasn't supposed to be drunk for 20 years. So he's saying, hey, in 20 years, this wine's gonna be worth this much, or it should probably go at auction for this much. And people are like, oh, that's just wrong. That's just wrong. No, no, no, no, you have to go by taste. The guy didn't taste wine at all. I mean, he liked wine, but he wasn't a nose. Is oenophile the appropriate term? What he did was he had a ton of weather data that he'd gathered from every single winery and chateau all over the place in Europe, because all of these vintners had collected their weather data for a very long time. So he correlated the weather data for the grapes with the price of the wines 20 years later and was able to basically predict the prices of wines. And then, within a five-year period, he predicted something like two wines that were 100-year wines, like the best wines in 100 years. And everyone's like, no, that can't happen. That's just wrong. But it was true. And it sort of turned out that way because there was a greater area that had the right temperature and weather for growing the appropriate kind of grapes for whatever those wines were.
And it was also a lead-in to showing why Argentina and Australia were starting to get better and better wines was partly because the temperature was getting just a degree or two higher. And so it was getting drier, a little bit more sandy soil, all that kind of good stuff. So the wines were getting better. So it was really interesting. But that's a data story. He did some analysis and there was some sort of, there's a story behind it that was really interesting. Outliers? Who's read Outliers? Okay, in this one, Malcolm Gladwell talks a little bit about the 10,000 hours of becoming great at something. And one of the interesting ones in here is he goes in, the story is about going into a hockey team and finding out that it's everyone's birthday in January. It's like, well, that's kind of wrong because birthdays aren't that collected. But looking at the data, you could see, oh, well, all these people have birthdays at the basically the first quarter of the year, January most, a little bit of February, some March, goes on down and then the ones that have birthdays at the end of the year are few and far between. And it turns out that this happens to do with your ages when you start playing hockey the very, very, very first time. So if you are at the appropriate age in January, you'll actually be bigger, stronger, faster than the seven year old when you're eight. And that one year, and then after that, every single year, you'll be a little bit better because you're better, you'll get more training and so forth and so on. The fact that you were born in January is a greater predictor of success in hockey than pretty much anything else, or unless your parent was a hockey player, that's another one. Yeah, the highest statistical correlation between if what's the best way to be a professional baseball player is be the child of a professional baseball player. Yeah, that's pretty much what it is. Freakonomics, probably the most famous of all these. 
I'm not gonna talk about a story from that one. Super Freakonomics, another one. It's got more and more data. And The Science of Fear. Has anyone read this one? Now this one is basically about fear culture, and how fear is the basic instinct that makes it easiest to manipulate us. And an interesting data story in The Science of Fear is that there's a statistician, I think in Germany, who after 9/11 collected all of the highway death data from the United States. And then he did a correlation between an increase in deaths on the highways in the United States and a decrease in flying. So he was able to show that right after 9/11, and actually for almost exactly one year, more people died on the highways in the United States, because they chose to drive instead of fly. And that number of deaths is more than the number of people that died in the airplanes. So that was a really kind of, oh my gosh, data story. There's a statistical thing there, but there's a bigger story behind it. The story behind that one is a choice that people made. On a lighter note, who knows OkCupid? Okay, this is a really cool website. Yes, it's a free dating website. But it's got all sorts of interesting statistics, and they have the best data stories about people. One of the more recent ones: there's a website called Stuff White People Like. Well, OkCupid wrote in and said, well, we'll actually tell you. Because we know everyone's race and what they talk about and what they say they like. So they drew out basically word balloons of, hey, this is what white guys like. This is what white girls like. This is what African-American men like. It's amazing. It's absolutely hilarious. There's all sorts of stuff.
The time that a photo is shot versus the time of day that photos are shot and that correlation to first emails that people send you, that kind of stuff. It's kind of crazy. But they have a lot of data and it's really, really interesting. And The Guardian, which of course has data stories all the time. So all of those stories were to say, hey, let's find something interesting in the internet movie database because we've got these interesting data stories. We're not gonna try to get as good as that, but we're going to just kind of have fun and see what we can come up with. So the internet movie database, this is the pieces that I pulled down. We have the title. I think it's the year it was made. There's a year column in the main titles, but it doesn't say what that year column is for. So talk about your data cleanliness. And also, most of the titles have year and parentheses and there's a year column. Sometimes those don't agree. So I'm not exactly sure what's going on there. We have an entire Actors and Actresses files separate. The release dates by country. The color, black and white color. Running times by country, country of origin, language, production companies, which I didn't actually do anything now with, but we might be able to play with it a little bit. Genre. So what's the genres of the movie and directors. So there's actually a bunch more files, but these are the only ones I pulled down to put into the database. So now this is the tools that I used for this. The only tools. Okay, so let's start with... So one of the things when I start playing around with some sort of data is first clean it, put it in database, see what we kind of have, and then let's just do the simplest thing possible. So the simplest thing possible for me in this case was... Oh, I already have that open. We're just going to dump the movies. Oh, that's movie types. Let's see. Dump movies. Oh, by year. There we are. 
So for this, I wrote a couple little libraries, or built on a couple. So the first thing we're gonna do is we're just gonna look at the movies. And movies, in this case, is actually movies, television shows, mini-series, all sorts of different things. It's the Internet Movie Database, but it includes television and made-for-TV movies and direct-to-video movies and all sorts of stuff. There's even a type for mini-series, but I didn't find a single mini-series in it. So we're just going to look at everything by movie. And then I cheated a little bit, so there we are. That's what I wanted. How many people use R? Okay, the rest of you should raise your hands next year. So R is an amazing stats tool, and probably the best package in it is a thing called ggplot2. I'm gonna move this over here for now, so where did it show up? Ah, it's on the other screen. Okay, so there we have movies by year. Oh, that looks a little wrong. I think we have some more data quality issues here. Anybody know if a movie was made in the year 500? Yeah, so you think we should do a little cleaning of the data? I mean, we've got some here in year zero, it looks like. 1500, yeah, and 01, year one. So this is the first thing that's gonna happen when you look at data. You're like, okay, I think I parsed it and put it in the database. It all looks good. Oh, crap, you know. So luckily I've walked through this once already. So what we're gonna do is, I'll just show you right here on the screen, everybody see this? I did a little Wikipedia research. Turns out what people talk about as the first actual motion picture is that little spinning horse thing that everyone did. That was in 1878. So we'll use that as a lower bound. There are probably no movies before that. And I decided to make it less than or equal to 2010 because 2011 is not done yet. So everything we look at is gonna have a downward curve for this year.
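As a minimal sketch of the year bounds just described (1878 as the lower bound, 2010 as the last complete year), assuming each record has already been reduced to a hypothetical `(kind, year)` pair; the talk itself did this in the database query rather than in Python:

```python
from collections import Counter

FIRST_FILM_YEAR = 1878  # lower bound chosen in the talk
LAST_FULL_YEAR = 2010   # 2011 was still incomplete when the talk was given


def plausible_year(year):
    """True if the year falls inside the bounds used for the graphs."""
    return FIRST_FILM_YEAR <= year <= LAST_FULL_YEAR


def count_by_kind_and_year(rows):
    """Count titles per (kind, year), silently dropping implausible years."""
    return Counter((kind, year) for kind, year in rows if plausible_year(year))


if __name__ == "__main__":
    sample = [("movie", 500), ("movie", 1913), ("movie", 1913), ("tv", 2011)]
    print(count_by_kind_and_year(sample))
```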
So I decided to just go through last year. So what we have here then, get a data set. I'm actually skipping over a bunch of the stuff in R. But this is basically R saying, I'm going to read in a CSV. It's got a header on it. And the columns are a factor, which you can just kind of think of as a dimensional, non-string, enumerated-type thing, and then integers. And then we're going to graph it. And there we go. We have movies by year. Now, interesting thing here, you can see you've got direct-to-video, movies, TV movies, TV shows, video games. This is ggplot2. So if you're using R and doing any graphing, immediately install ggplot2 and then you're happy. That was one line to draw this, which was very, very cool. So yeah, we had some dirty data, and now it starts at 1878. So that's interesting. Okay, that's kind of cool. We can see, well, actually an interesting thing is this right here. Movies, and what is this? 1910, 1912, 1913. It's a huge spike. That's kind of cool. And then it falls off. And it's not until you get over here that the volume is even that high again. But we can see that there's a lot more movies, and direct-to-video actually had a spike maybe a few years ago and it's falling. This down here is video games, and then you've just got a continual increase in TV shows, and TV movies seem to be doing pretty well. So I think that's kind of interesting. But how about if we look at movies by year by color. So is it black and white or is it color? So that's what's running this time, if anyone's looking at the query. There we go. Have some graphing. And for this one, I chose a different thing. If you're looking at R, this is facet grid. I basically told it, I want you to graph color versus black and white and put each different type of movie on a different graph. So this is what we have. Well, that's kind of interesting. So we still have this spike. The spike is definitely in movies.
There's probably some dirty data over here. A TV show in 1870-something, probably not a really good thing. But what's interesting here is this spike, and then there's this crossover point. It's kind of cool. Let me see if I've got a better one. No, that works. And then you see direct-to-video. There's still black and white direct-to-video and there's all this stuff. But here's an interesting thing. This spike right here, did I do a separate one for this? No, okay. So the year this graph is using is the year that's in the title. Well, the other thing we might want to do is look at the release date. So maybe there's some bad data in here, because we've got bad data right here, but maybe some of this might be a little bad too. So we're going to run the other one, and one of the things we want to see is if this is still here and if these are still around. So this might just be bad data in the main titles file. And so for the release date, I'm actually, yeah, never mind. I'll let that one run and then we'll still talk about this one. Now an interesting thing here is this crossover point, and it looks to be a little bit further along in, let's see if I can widen this up a little bit. We'll look at it closer. Okay, so the crossover point right here looks to be what? 1964, somewhere in that range, and these are like '67, something around that range. Well, I saw this crossover point and it's like, what's going on here? Well, it turns out that's actually the crossover point when black and white switched to color. And this is about the time when color televisions became, not available, color televisions had been available for a while, but that's when they became more affordable for the average person on the street.
So this is the production of television shows, but it correlates: more color television shows were made as soon as more people had color televisions. So this is actually kind of interesting. You can see, hey, when did movies and television change from black and white to color? I think it's kind of interesting. Now this isn't earth-shattering information, but it is interesting to look at and say, you know what? Just by drawing the years of release and black and white versus color, we can see something interesting. And the other piece right here, is anyone a buff of 1900s film? Yeah? You know what happened there? Okay. Yep. So I did 10 minutes of research. This is actually when Hollywood was founded, right? About 1913, '14, '15, somewhere in that range. That's the main thing. But yeah, there is this huge increase in black and white. And there's also another interesting thing. It also has to be that the production of movies was much easier at that point. For the person on the street to make a movie, it was a little bit easier than it had been five years before. Now another interesting one. World War I. Yeah, 1914, 1918, is that right? Yeah. It didn't come back. It didn't come back. Yeah. Well, here's an interesting thing. If you look at this, this peak of movie production in black and white is what? Maybe 8,000 movies a year? Yeah. When do we get to 8,000 movies a year again? Yeah. I mean, look at that. It's amazing. I'm interested in that. We looked at this data and now there's an interesting story to find out about. I'm gonna go look at some history and figure something out. It's kind of cool. Yeah. Yeah, so the proportion of rubbishness here is unknown. Let's put it that way. I didn't put the star data in here. But let's take this. We've got genre information. So let's take the genre information and see what we can come up with.
Oh, actually, we'll do another quick one right here, which is, I did it by release date, which shows you, you know, hey, when you can have something that's sort of a crap graph, which you'll get. Or maybe I won't, because I got a bug, so. All right, we won't do that one then. But let's look at genre by year and year by genre. Okay. And there's a whole bunch of genres in here. The other thing I'm doing in this one is because, and the other graph was the one that was gonna illustrate this. I'm doing this only for movie movies, not for TV, not for direct to video, all that kind of good stuff. When you look at the release dates, well, every episode of a television show has a different release date. So the release date graph just blown out by television because there's like 80,000 releases a year or something like that. So it's kind of a junk piece. Okay, so here we have genre. Now here's an interesting one. See that huge spike in those early 1900s? The color, there's a lot of different genres. And actually, let me redo this, because I did a little thing where, I said let's cut out the genres that really don't have enough data in them. So that's one thing, it's just too noisy. So we're gonna trim out a little bit of data. And I did that and said anything that is a genre that has fewer than 100 movies a year, we're just not gonna look at. Just from a playing around perspective. So that huge, a large piece of that huge spike looks to be shorts as a genre. So short films, that's kind of interesting. That's good to know. So now we have something interesting. We've got this huge spike in what, 1913, 1912. And it may be mostly short films. So was there a large, was there a short film, Renaissance or something like that? It was the new start, it can't be a Renaissance. So but it was interesting. But we also had, it looks like probably documentary or drama and comedy. So those look like to be the main types of films that were made in the early 1900s. 
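The trimming step mentioned above, dropping anything with fewer than 100 movies a year, might look something like the sketch below. It assumes the cut is applied to each genre-and-year count separately, which is one reading of the talk and not confirmed by it; the `(genre, year)` input shape is also hypothetical.

```python
from collections import Counter

MIN_MOVIES_PER_YEAR = 100  # threshold mentioned in the talk


def genre_year_counts(rows):
    """rows: iterable of (genre, year) pairs, one per movie-genre assignment."""
    return Counter(rows)


def trim_sparse(counts, minimum=MIN_MOVIES_PER_YEAR):
    """Keep only the (genre, year) cells with at least `minimum` movies."""
    return {key: n for key, n in counts.items() if n >= minimum}


if __name__ == "__main__":
    rows = [("Short", 1913)] * 150 + [("Western", 1913)] * 20
    print(trim_sparse(genre_year_counts(rows)))
```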
And then you go over here and look, and so we've got, what is this? Probably more shorts all the way down. Documentary, drama, I think this is action right here. We've got the blue for, like, horror. So you can definitely see the types of movies that are made in terms of volume, the number of movies that are made. So that's kind of interesting. Let's see, what do we got next? Ah, running time. This was kind of an interesting one to play with. And this may be all we're gonna get to. Movies by running time by year. And I'm not actually segregating by country here, so I'm not looking at the different countries to see which ones have different running times and things like that. But if we look at the running times by year, hey, maybe there's a thing. What was the length of movies? We saw that all those shorts were there. Well, maybe there are other movies that weren't categorized as shorts but were short movies. So let's see what we can find out from this one. So this is the average running time of a film that year by genre. And that's not really all that awesome, is it? Hey, look, apparently there's a 300-minute movie in about 1917, or that's the average of one particular type. But the interesting thing here is it looks like movies are getting shorter in general. Let's see if we can actually graph that and see if that's true. So this is all of the movies by genre, by color. So it's a little noisy, but it sort of looks like movies are getting shorter. So let's see if we can actually agree that they are. So we're gonna re-graph this and put in a trend line. Yeah, it looks like movies really are getting shorter. So we add this little trend line, and in ggplot that's just, hey, I want a smooth line in it. So that's interesting that movies are getting shorter.
So I don't know why. There's probably a story there. Maybe people don't like to sit in chairs as long or something like that, but somewhere in there, there's a reason why movies are getting shorter. Maybe it's the cost per minute of a movie. Who knows? Yeah, there's a lot of shorts too. So that's the other thing, we may be able to say, hey, you know what, let's take out shorts and look at that. So let's try something different. Let's just graph the frequency of those times. So let's do a histogram of, oh, I just lost my console. So let's take the times and let's bin them in 10-minute windows. So we'll have all the movies that are zero to 10 in one bin, 10 to 20 in another bin. So this is basically a frequency analysis of the length of movies. So we can see, hey, look, it looks like there's a lot of movies that are between about 10 and 20 minutes long. I bet you if we did this by genre, actually I have that right here. Let's do this by genre. I'll tell you right now, it doesn't look that great, but just because I haven't dealt with any of the defaults. But look at this right here. This block is one big, oh, I don't know, short genre. So it looks like shorts are generally 10 to 20 minutes long. And this is actually just movies. This isn't TV shows and stuff like that. But it looks like a large portion of movies are all, what is this, 50, 60, 70 to 100 minutes long. That's kind of the general range of movies. And there's a whole other spike over here. So we can see, hey, movies fall into generally two categories. Ones that are zero to 20 and ones that are 70 to 100. Which is kind of cool information to know for whatever reason. So this is kind of the stuff. It's data stories. Just play around on the data and see what you find. Let's see. Now, this is movies. So what if we did the same thing with TV shows?
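The 10-minute binning behind that histogram can be sketched in a few lines. The talk did this in R with ggplot2; the version below is a plain-Python stand-in, and the function name and sample running times are made up for illustration.

```python
from collections import Counter


def bin_running_times(minutes, width=10):
    """Count running times per bin; each bin is labeled by its lower edge.

    With width=10, a 15-minute film lands in bin 10 (10 up to, not including, 20).
    """
    counts = Counter((m // width) * width for m in minutes)
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    sample = [5, 12, 15, 19, 88, 92, 95]
    for lower_edge, count in bin_running_times(sample).items():
        print(f"{lower_edge:>3}-{lower_edge + 10:<3} {'#' * count}")
```

Labeling bins by their lower edge keeps the boundaries unambiguous: a film of exactly 20 minutes falls in the 20 to 30 bin, not the 10 to 20 one.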
So TV shows probably have, we can probably guess and say they probably have a more rigorous time limitation because they gotta fit ads in it. They're all in a certain times. So let's see what we can do with that real quick. And did TV shows over time, did their length change? It's another one. So we'll do the same genre, color, plot, pointed thing. And it's a little bit more regimented. It sort of looks like they're in definite minute increments instead of something else. But it's still a bunch of hooey. It doesn't look that great. So this really didn't give us anything. But how about we plot that by the same thing we did before and by golly. That's kind of interesting. So this huge, there's a huge spike here which is probably, it looks like 50 to 60 minutes. And then this over here is the 20 to 30 minute range. And then zero to 10, 10 to 20, 20, 30, 30 to 40. Yeah, 40 to 50, 50, 60. So this probably looks like television shows. And it looks like it's generally spread across all genres. So it's kind of interesting to see that yeah, television shows have a, basically actually, that is, if we bring up the other graph, which was movies, that dip in movies is where the spike is in TV shows. That's kind of interesting. And actually, I didn't even think about it. I just saw it right now. So that's kind of cool. So that's a little bit of data analysis of the internet movie database. I mean, we didn't even get into actors, directors, production companies, countries of origin, language of the film, all that kind of stuff. That's all we did was just running time, length and stuff like that. So that's sort of a basic data story that we talked about some data stories that already exist. And then we tried to create our own data story. We got a little bit in, maybe it's a little background research, but there might be a bigger story in there somewhere in just terms of all sorts of stuff with data stories. So questions, comments? Yeah. 
So how do I keep my stories from being fictional? I don't know. This story, is it fictional? I don't know. Well, there's more research. You're gonna have to come up with some background research and verify sources and all that kind of good stuff. Yeah. Yes. What tools are you using to clean data? For this particular one, what tools am I using to clean data? For this particular one, it was just basically Ruby scripts. I wrote some Ruby scripts to look at the data. I'd try to insert it in the database and then it'd blow up because of some things that were wrong in type or didn't parse well, or I'd just do some quick scans to get it all stored. So most of the time, for me, cleaning data is ad hoc something, so write an ad hoc program. So, yeah. For a rigorous data clean, one that you run a business on, are there tools that you use in that context, or do you also use a lot of custom scripts for cleansing data? Cleansing data for work. Most of the time, our biggest one is generally encodings. So we're using iconv and XML-type stuff to convert things, character encodings, the entity encodings. That's basically our big one. A lot of the stuff is non-breaking white space, and those are really horrible. Or the one that reverses the characters so they print in a different order. That's a really interesting one that spammers use. Yeah, so mostly ad hoc. We don't have any production tools, just tools that we've made based on open source tools. Yeah, up in the back. Have I used Google Refine? No. So I know the name. That's all I know. There's another one in the back. We're not, no. Most of the data stuff I do is actually the easy stuff. We're getting into the more in-depth correlation stuff. R could do the correlation. I could do some correlation stuff. I don't do a lot of it, but I could give it the data set and say, hey, correlate these two. What's the correlation factor? And it would tell me.
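The two whitespace and ordering problems mentioned in that answer, non-breaking spaces and the character that reverses print order, can be handled with a small translation table. This is a sketch in Python rather than the iconv-based tooling the speaker describes, and the exact sets of code points below are an assumption about which characters matter, not a list taken from the talk.

```python
# Space-like characters to be replaced with an ordinary space (assumed set).
NON_BREAKING_SPACES = "\u00a0\u2007\u202f"

# Unicode bidirectional controls, including U+202E (right-to-left override),
# which makes the following text display in reverse order (assumed set).
BIDI_CONTROLS = "\u200e\u200f\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069"

_TABLE = {ord(ch): " " for ch in NON_BREAKING_SPACES}
_TABLE.update({ord(ch): None for ch in BIDI_CONTROLS})


def clean_text(text):
    """Replace non-breaking spaces with plain spaces and drop bidi controls."""
    return text.translate(_TABLE)


if __name__ == "__main__":
    dirty = "free\u00a0money \u202egnihsihp"
    print(repr(clean_text(dirty)))
```

Note that dropping the override leaves the characters in their stored order, so text that a spammer stored backwards still reads backwards after cleaning; it just no longer displays differently from how it is stored.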
In terms of statistical stuff, R is what I would use. Yes. Yeah, public data, it's hard to get a hold of. Yes, and published stuff in PDF, which is a pain. Okay, look at the World Cup. Yeah. Yeah. Any of the claims made by the Protesters. Yeah, so claims and proposals that the World Cup will bring economic benefit to the city that's hosting. It's the same with the Olympics. And I think they've done the same with the Olympics and shown that the Olympics, most cities that host the Olympics never recover. Actually, it's anecdotal, but a friend of mine said, like Barcelona is one of the few that's actually made a profit out of it. So I don't know if that's the case or not. I do have one interesting graph that I wanted to show you guys. Some of our data analysis that we did for some stuff at work. So we were having some problems. This actually isn't gonna look all that great. This is the only version I have. My internet died on my computer. The airport's gone, so I couldn't download the better one. So this graph right here is the rate of insertion into our database, so by the 15 minute interval. So this is, and then it starts going down. Then it goes back up, then it goes back down. Then it goes back up, then it goes back down. Then it goes up, then it goes back down. And you can see that the general trend is down. And this is over since February to probably last week. And we're like, what is causing this thing? Well, it turns out, this is Monday, and this is about, this is Sunday. This is Monday, and Sunday, Monday, Sunday. Well, it turns out, we do, these are, these are in tens of thousands of insertions per second, I think. So it turns out that Sunday night, midnight, and this happens at midnight is when that spike happens. It turns out that's when we lay down the new partition for the tables. So you can see as the partition gets full, it gets slower. The brand new partition, it gets full, then it gets slower. 
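The insertion-rate graph described above is a count of rows per 15-minute interval. As a rough sketch of how such counts could be produced, assuming each inserted row carries a Unix timestamp in seconds (the talk does not say how the real numbers were collected):

```python
from collections import Counter

INTERVAL_SECONDS = 15 * 60  # 15-minute buckets, as in the graph


def inserts_per_interval(timestamps):
    """Count inserts per 15-minute bucket, keyed by the bucket's start time.

    timestamps: Unix epoch seconds, one per inserted row (hypothetical input).
    """
    buckets = Counter(ts - ts % INTERVAL_SECONDS for ts in timestamps)
    return dict(sorted(buckets.items()))


if __name__ == "__main__":
    print(inserts_per_interval([0, 10, 899, 900, 1801]))
```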
The troubling piece here is this rate from good to bad is getting faster. And we're thinking, you know, maybe that's actually just disk layout. You know, it's the different section of the disk, so I forget which the inner or the outer ring is actually slower to write to than, so this basically can show disk performance and insertion into your database. So that was kind of interesting. What did you do about it? We haven't done anything about it yet. We're trying to figure out what to do about it right now. Yeah, I told a great story, but we're sitting here, we're going, what's the problem? So I actually, I'm out of time, so I'd love to hear anyone's data stories, what's going on, but please chat, love to hang out and see what's happening. So thank you all, I appreciate it.