Thank you. We'll see if my phone has data or not later on. But yes. OK, so this is Boo and Panda. And we're going to talk about validating big data jobs and stopping failures before they get into production. This comes from all of my experiences having failures get into production. And while I keep my resume up to date, I'm told that's not a normal thing to want to do. That number is going up. OK, we'll pretend that's intentional.

OK, so yeah. If anyone's interested, the slides (and if there's a recording, a link to the recording) will be at this link. They're not there yet because, of course, like any good person, I'm adding only fresher cat pictures every time, and so I have to update the slides at the very last minute. Oh, and I have to just... there we go. OK, yes. Senior data scientist means unicorn, right? Very difficult to find.

OK, so my name is Holden. My preferred pronouns are she or her. I'm a developer advocate at Google, I'm on the Spark PMC, and I contribute to Beam and a lot of other projects. I've been at a whole bunch of other companies. They were all very nice folks. They paid me money. I'm a co-author of two books on Spark. And there's actually a third one that's not very good, so I don't tell people about it. If you want, you can follow me on Twitter. And if you're interested in learning more about open source, or learning more about Apache Spark and how its internals work, one of the things that I've started doing is code review live streams. That just means that every Friday, most Fridays, I sit down and look at the pull requests coming into Spark, and I do my code reviews live and talk through the process. And we go and explore the code base together to figure out what it is people are trying to do. I do this on Twitch and YouTube, and they're recorded on YouTube if anyone wants to watch them.

In addition to who I am professionally, I am trans, queer, Canadian. I live in America on a work visa. It's a really exciting time to be in America on a work visa. It's all kinds of fun. And I'm part of the leather community. And this is not directly related to stopping failures in big data. I'm going to take the unicorn horn down. It's distracting me. But I think it's important for those of us who are building machine learning tools and tools with data to look at our teams. And if everyone on our team looks like ourselves, that is not a good sign. You are just going to recreate yesterday's problems with more data and better tools. And I don't want that. I want us to solve new problems. So both in your teams inside of your companies and in your open source projects, try and build diverse teams with people from different backgrounds. And to do that, we have to talk about where we're from.

OK, I'm going to look really quickly, briefly, at property testing. Then we'll switch focus to validation and how it's related to property testing. We'll talk about making simple validation rules, and the limitations of making validation rules for big data pipelines. We'll talk about what people do in practice, which is depressing. And then I'll try and convince you that you should do better than most people. And I promise at least one cat picture. I didn't quite get to one cat picture per slide. I was trying. But I'll get there one day. So I'm hoping you're nice. I'm hoping you like silly pictures. I am curious: how many people are familiar with Spark or Beam or Flink or a system like that? OK, cool. So that's a good percentage of people.
If you're not, this is still relevant. If you're doing data work locally and you're not having to use a distributed system, you probably still want validation tools. They're just going to be a lot simpler to build. And that's kind of exciting. Yeah, OK, cool.

So hopefully we all test our software. I'm not going to ask you if you don't test your software. The last time I did that, I found out my bank didn't test their software. So we should be better people. We should avoid making our users angry. And we should save money. We all know that we need to test our software, even if maybe we don't, because we put it in a Jupyter notebook and we pretended that everything was OK.

Why should we validate? Your tests probably aren't perfect. The real world is so much more mischievous than I am when I'm coming up with my tests. At some point, we are going to get aboard the SS Failboat, or HMS Failboat, or whatever, depending on your localization. And that is not a great boat to be on. But the best thing that we could do is know that we're failing, so that we can just pretend everything's normal. And we just don't tell the customers that everything's actually on fire. We just give them yesterday's data. And we hope no one notices. And that's the important part. We want to minimize the impact of our failures, because we're going to have failures. I have carried a physical Motorola pager at far too many points of my life. And I hope none of you ever have to do this. But if you don't have tools like these, there's a good chance that you're going to get phone calls or PagerDuty pages when your models are pushed into production and things start going poorly.

One of these data points: in a little survey of people who use Spark, 36% of people were automatically deploying the results of their pipelines into production. That is terrifying. And 62% were not. That's fine. Even if we're not deploying automatically into production, it's still important to have validation tools. But they can be a little bit more human-in-the-loop if we need to. The other one here is, if we look at this, there's a lot of failures. 14.9% had the output of their Spark job cause a serious production outage. In my personal understanding, a serious production outage is one where I have to update my resume afterwards. And another 32% had an outage where they weren't going to get promoted that year. But it's OK. They didn't have to update their resume. And the remaining 52% got lucky, or are not working in production.

OK. So why don't we test? It's hard. And testing distributed systems is especially hard. And it takes a lot of time. Jupyter notebooks give us an extra excuse not to write tests: it's really hard to put my tests in a Jupyter notebook. Why don't we validate? We already tested our code. Our tests are probably good. I spent at least 20 minutes on it that one Friday. And my test coverage is in the double digits. That's pretty awesome. And the other part is that for those of us working in distributed systems, getting the metrics that we would want to use to build validation tools is kind of annoying. And there's a lot of problems with them.

OK. There's two cameras, and I see red lights on, so the personal stories are not going to mention any companies by name. But let's just say that occasionally, once upon a time, when I was a younger developer, I may have had a job building recommendation systems. And in many languages, there are some words which have multiple meanings. Some of these meanings may perhaps be innocent.
And perhaps some of these words may also have meanings which I don't want to explain to small children. And this is OK, because we had a pretty good idea of what items we probably shouldn't be showing to small children. However, another valued team that we worked with decided to change their data without telling us. And so then I had five minutes to fix a system. It took me 20, but I still had a job at the end of it. But there were probably some very awkward conversations about the multiple meanings of some terms. And yeah, I'll not say them, though, because there's a camera on. Anyways, lots of fun things. That one was probably one of my low points as a software developer, where I'm like, oh. Oh, that's sad.

Other times, I've broken things that cost a few million dollars. I mean, I don't feel bad about that. It's my employer's fault for hiring me. However, they don't always agree with that. And the last one is more funny, in my opinion. I worked at... sorry, cameras... a small company that was doing location-based search. And every result returned was a coffee shop. And the problem was that I like coffee a lot. And so my entire test set, before I put it into production, was: does this find coffee shops? And it did. And so I was like, great. And then my boss came to me and was like, Holden, I tried to find a steakhouse, but it told me to go to Starbucks. And I pointed out that they had delicious breakfast sandwiches, but he still wanted me to roll back to the previous model.

OK. So hopefully I've convinced you that this is worthwhile. If not, I cannot see you leave (the lights are very bright), so don't feel bad about leaving. It's OK. So how do people get data for their tests? A lot of people write test data by hand. It's really slow. It means, especially if you're doing this, that you're not going to have a good representation of what your actual data looks like. There are going to be some severe limitations. Some people sample their production data. That's pretty good, right? However, there's this thing that, in America, we are afraid to say the name of. And my communications people told me to be careful about saying things involving lawyers. But there are reasons that you might not be able to take your production data and copy it into your development environment for testing. So the people who say they use their entire production data for testing, I have some follow-up questions for them that I would only ask in a phone call, not in an email, because those are subject to... oh, wait. OK, never mind.

OK, let's focus on things that don't involve sampling production data. One option is property checking. How many people are familiar with property checking? Yay. OK, there's like eight people. And the rest of you are either all asleep, or this is going to be really exciting. So the one that I'm going to show you an example of uses Scala, but there are property checking libraries in other languages. Hypothesis is one of the Python ones. And it's really cool. What we do is we specify properties about how our code is going to behave, sort of invariants. And then it tries to come up with pathological cases that will break our code for us. So I don't have to spend the time being like, well, I want to try and make a key skew issue here to make sure my code handles key skew; I want to try and create a bunch of null records. Like, the computer will do that for me. And I actually have a testing library that does some of this for Spark users. And we can test with one million records.
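Here's a rough sketch of what a property test like that can look like in plain ScalaCheck. The User type and the process function are made-up stand-ins for real business logic; spark-testing-base layers generators like this on top of RDDs and DataFrames.

```scala
import org.scalacheck.Prop.forAll
import org.scalacheck.Properties

// Made-up record type and business-logic stand-in.
case class User(id: Long, name: String)

object MyBusinessLogic {
  // Whatever your pipeline really does goes here.
  def process(users: List[User]): List[User] =
    users.map(u => u.copy(name = u.name.trim))
}

object PipelineProperties extends Properties("Pipeline") {
  // Invariant: our important business logic should not lose any data.
  // ScalaCheck generates the pathological inputs (empty lists, huge
  // values, duplicate keys) so we don't have to write them by hand.
  property("no data loss") = forAll { (ids: List[Long]) =>
    val users = ids.map(id => User(id, s"user-$id"))
    MyBusinessLogic.process(users).size == users.size
  }
}
```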
Dr. Evil was not so popular in Spain. OK, or my jokes are just bad. Fine, either way. So we have some important business logic, genuine business logic, and we assert that our business logic should not lose any data. It's a pretty simple test, but we could do other things. We could assert that we expect some users to be dropped, or we could assert that we expect to not lose any users, depending on what we were doing. And it will generate pathological cases for us. And the cool thing about this is we can use this as a start to think about some of the invariants of our code. And then we can make sure that these invariants are still holding true in production as part of our validation suites. All right, we'll skip this.

So let's focus on validation. Yay! So how do we validate our jobs? So I want to be clear: these tools assume that you, at one point, had working software. If your software has always been garbage, they're not really going to catch that for you, right? The idea is that you're in a good state, and what we want to do is know when we go from the good state to the bad state. Or essentially... the McDonald's analogy is perhaps not so good here, I love McDonald's, but say we switched from serving steaks to quarter pounders with cheese. We would want to know this. And hopefully this isn't the only thing, right? But it's important. And this is because our pipelines are no longer write-once, run-once. They're write once, maintain forever. And also: inherit that other person's pipeline, and oh dear God, why am I responsible for this code now? And so this means that we have to keep these systems running, and people want updates.

I'm really curious how many people have something like this in their code. What this does is it takes in some data, and it says, hey, can I parse this data? If so, awesome, let's keep it around. If not, whatever, it's fine. We'll just throw it away and we'll pretend that data didn't exist. How many people have some code like this? OK, so one person in your company does some very interesting things with data. And there's another person here, and someone in the back who's waving their arm in what I can only assume is a plea for help. And so this is really sad, but I think this is very true. Even if you might not do this at the parsing stage, a lot of us have data quality issues, right? Unless we're working on mnist.csv, not all of our records are going to be things that we want to keep around. We're going to have test users. We're going to have users who didn't fill out any profile information, and we're not going to necessarily want to include them. So it's normal that we would want to exclude some users. The only problem comes when our wonderful friends upstream maybe decide they want to switch from JSON to CSV, and they don't tell us. So our JSON parser is just like, well, all of these records are bad, but whatever, it's cool. I can train a model on the empty set. And you can. It's not good, but you can.

OK, cool. And so we should maybe check to make sure that we have some data to train with, right? And one option would be that we could write a filter, check if each record is valid, and then we can check and see what our valid count and our bad count are, something like the sketch below. And this is a very simple validation rule. It says that if my valid data is less than my bad data, my special business error handling logic goes there, and your Motorola pager goes off, or PagerDuty, whatever pager thing. And that's fine. It's just maybe we want to do better than this.
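Here's a minimal sketch of that naive rule in Spark. The parseRecord and isValid functions are made-up stand-ins for the real parsing and business logic:

```scala
import org.apache.spark.sql.SparkSession

object NaiveValidation {
  // Made-up record type and parse/validity logic.
  case class Record(id: Long, payload: String)

  def parseRecord(line: String): Option[Record] = {
    val parts = line.split(",", 2)
    if (parts.length == 2) scala.util.Try(Record(parts(0).toLong, parts(1))).toOption
    else None // junk record: silently dropped, as discussed above
  }

  def isValid(r: Record): Boolean = r.payload.nonEmpty

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("naive-validation").getOrCreate()
    import spark.implicits._

    val parsed = spark.read.textFile("/data/incoming/users").flatMap(line => parseRecord(line))
    val valid  = parsed.filter(r => isValid(r))
    val bad    = parsed.filter(r => !isValid(r))

    // Each count is its own action, so Spark re-evaluates the input:
    // this blocks the pipeline and does the work more than once, which
    // is the performance problem discussed next.
    val validCount = valid.count()
    val badCount   = bad.count()
    if (validCount < badCount) {
      // Special business error handling logic goes here: page someone.
      sys.error(s"Validation failed: $validCount valid vs $badCount bad records")
    }
    valid.write.parquet("/data/output/users")
  }
}
```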
And maybe, for those of you who work in Spark, you might think that this is inefficient. And that's because this is going to break our pipelining, and it's actually going to trigger the computation twice. It's going to have some pretty serious performance implications, because distributed systems are terrible. And the big thing here is our optimizer can't magically chain everything together anymore, because we have these middle steps where we're having to check the counts of things. And we're blocking on this, right? Our pipeline can't progress and move forward until it checks to make sure that we're in a good state first. Also, personally, I prefer flatMap as opposed to filters, but whatever. That's fine.

So instead, what we can do when we're doing things like this, where we expect some of the records to be in a bad state but most of them to be in a good state, is try and keep track of the number of records we're rejecting. Or if we have other things, like where we're, say, generating a recommendation table in advance, we can keep track of the number of users who we don't have any recommendations for, right? And if the number of fall-through users that we have no recommendations for all of a sudden spikes, that's a thing that we would want to know about. And we can do this with counters inside of Spark or Beam or Flink. And if you're doing this locally, you can just do this with variables. Variables are so cool. And we can still pretend to have nice functional code. And this part is important to me, because I took the introduction to functional programming class, and that one is still with me. And I like to pretend that I'm writing functional code, even when what I'm doing is actually a giant terrible pile of spaghetti code.

So this is what it looks like; there's a sketch of it below. We put our happy counter and our sad counter in, and we go, OK, if we get really sad, we should stop. And the other thing about this is we can put all of these counters in, and then at the end, we can check that the counters make sense. And we can also check that these counters make sense relative to one another. For example, if we had some things where we were processing our user records, and we had another thing where we were looking at the number of recommendations generated, if there was a really big difference, we could be like, oh, that's weird. Maybe these counters should be more closely related.

So, right. We can also, if you're lazy and the idea of rewriting your code to do this sounds terrible, use internal metrics from our systems. And this is better than nothing, if I can't convince you to actually instrument your code. At the very least, use the instrumentation layer that's provided to you for free. We can look at the number of records that we've read in and the number of records that we've written out. We can look at how long our jobs are taking to run. If all of a sudden our model converges really, really quickly, it's probably bad. On the other hand, if our model just isn't converging and it's taking way more iterations, that's also probably bad. And I don't want to push either of those to production until I'm at the office and I've had a cup of coffee, and I'm not getting a phone call at 2 o'clock in the morning. And if you actually do any of these things, there's this JIRA ticket, which means that everything is terrible, but it's OK. Don't worry about it.
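Roughly what the happy counter and sad counter version looks like with Spark accumulators, reusing the made-up parseRecord, isValid, and Record from the previous sketch:

```scala
import org.apache.spark.sql.SparkSession

// Same made-up parseRecord / isValid / Record as the earlier sketch.
val spark = SparkSession.builder.appName("counter-validation").getOrCreate()
import spark.implicits._

val happy = spark.sparkContext.longAccumulator("validRecords")
val sad   = spark.sparkContext.longAccumulator("invalidRecords")

// One pass: count and filter in the same flatMap, and keep pretending
// this is nice functional code.
val cleaned = spark.read.textFile("/data/incoming/users")
  .flatMap(line => parseRecord(line))
  .flatMap { r =>
    if (isValid(r)) { happy.add(1); Some(r) }
    else            { sad.add(1);   None }
  }

// Accumulators only have real values once an action has run the job.
cleaned.write.parquet("/data/output/users")

// The very simple rule again, now without re-triggering the computation.
// Caveat: accumulators in transformations can over-count when tasks are
// retried, which is roughly the JIRA ticket mentioned above.
if (sad.value > happy.value) {
  sys.error(s"Too many bad records: ${sad.value} bad vs ${happy.value} good")
}
```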
So let's make some validation rules. What do people do in practice? So I surveyed some people, and the results were depressing. Most people only check execution time and record count. They're like, well, my job took two hours to run yesterday; if it takes about two hours to run today and I read roughly the same number of records, I'm probably fine. How many people think that you're probably fine if you do that? There is one very... oh, two very optimistic people. That's great. If you believe that, I have many things to sell you.

I think it's important that we accept that some of our rules are going to be imperfect. Even if we just did the job execution time and records-read count, they would occasionally misfire. We would find ourselves in the situation where maybe we got hundreds of thousands of new users from a promotional campaign, and then it goes, whoa, whoa, whoa, whoa, whoa. The number of records is nowhere near the number of records we were at yesterday. I'm not pushing this model to production. And that's OK. What you do when that happens is you come in the next day, you look at it, and you say, yeah, this makes sense. Or ideally, you trick someone else into being responsible for this, and they come in and say, yeah, this makes sense.

The other thing is, if you start using property checking, you can take these property tests and add counters for the same things. And then what you do at the end of your pipeline is check to make sure that these counters represent your property tests accurately. You want to make sure that you weren't losing records between stages unexpectedly. And all of this... or, not all of this: if we're checking whether we were losing records, that's not a historical rule. But if we're checking execution time, or the previous number of records, or the number of recommendations, we often do this based on our historical data. So we save our counters out from the last time we ran the job, and we read them back in, and we make sure that things look OK, right? We make sure that these counter numbers look similar. And you can do this with just a sliding window of the past K days. If you're in retail or a cyclic industry where you have quarter-to-quarter things, you're going to need slightly more complicated models that look a year back to see what stuff looks like. Or, well, you might not need them, but you probably will. And a lot of this, unfortunately, comes down to domain-specific solutions. If you want to go beyond just counters and execution time, it often involves understanding your problem, which I know as programmers we hate.

So we can turn our property tests into validation rules. Don't write the asserts as asserts, right? That's bad. But save yourself a note and exit with -1, so that your pipeline doesn't keep running, right? But the data is there for you to inspect and continue from later.

So, input schema validation: really important. A lot of businesses do have tools to make sure that the schema hasn't changed. But what I've noticed is that this just means people don't formally change the schema. They just stop filling in a field when they're tired. Because they're like, well, if I change the schema, there's this process, it's going to take weeks. But null is a valid record, and really, my downstream consumers should be expecting null, and it is Friday, and I would like to go home. And so this can only get us so far. We can check that the types look like what we expect (there's a small sketch of this below), but that's not going to be enough.
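A minimal sketch of that, assuming the Spark session from the earlier sketches. The field names and the 50% null threshold are made-up examples:

```scala
import org.apache.spark.sql.types._

// Check that the types still look like what we expect.
val expected = StructType(Seq(
  StructField("id", LongType, nullable = false),
  StructField("email", StringType, nullable = true)
))
val users = spark.read.parquet("/data/incoming/users")
if (users.schema != expected) {
  sys.error(s"Schema drift: got ${users.schema}, expected $expected")
}

// The schema check alone misses the "someone quietly stopped filling in
// this field" failure mode, because null is a valid record. So also
// watch the null rate on fields you care about.
val total      = users.count()
val nullEmails = users.filter(users("email").isNull).count()
if (total > 0 && nullEmails.toDouble / total > 0.5) {
  sys.error(s"email is null in $nullEmails of $total records; upstream change?")
}
```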
OK, here are some similar rules. And if anyone wants, I do have a project called spark-validator, which exposes tools for making rules like these in Spark. But you can do similar things in other systems as well. You can validate absolute numbers, like "the number of records read should be between two random constants." That's probably not actually a good validation rule. "The number of records read is going to change over time, but be similar to the previous runs" is better.

And in Beam, for those of you who use Beam, or for those of you who just don't trust me when I say multiple systems expose similar concepts: they have metrics counters. And we can keep track of the number of matched words or unmatched words. Or we can essentially keep track of the number of successes and failures for different parts of our pipeline. AKA, we call Fortran, and we log the number of segfaults we get. And this is what it looks like. Yay, metrics. That being said, if you do this in Beam and you switch your execution engine, which you can do in Beam, the behavior of your metrics collection is going to change, because the metrics collection is left up to the runner. So just be careful with that, but it's OK.

The other option is you could be like, wow, this sounds like a lot of work. My boss told me that I need a machine learning model by Tuesday, and my Introduction to TensorFlow book really does not cover a lot about this. And so the nice thing is there are tools from other people who have had to solve these problems. TensorFlow Data Validation is a tool from TFX, the TensorFlow extended collection of tools. And we can see that we can do things where we generate statistics from our new data, compare our statistics with a previous schema, and look and see if there are things which look wrong. And we can validate those manually before we push to production. Hopefully. Please just don't push your models to production without checking them. I wish people would stop doing that. Anyways, this is a cool tool if you don't want to write your own stuff. And honestly, you don't even have to use TensorFlow to do this, right? You could have two separate jobs that run in parallel. You could use the TensorFlow Data Validation stuff as a separate job, while you've got your own classic ETL (aka Salesforce) living over here, and your data validation code over here. And if your data validation code passes, this runs through. And if your data validation code doesn't pass, you stop. And that's nice, because then you don't have to change your existing code. That being said, that will only catch data errors; it won't catch errors in your own code. But of course, who would write errors in their own code? It's simple. We just don't make bugs.

OK, so we can also look at the percentage of data that's changed, right? And we can do this on our inputs or outputs, right? If we have large shifts in the percentage of records that are different in our inputs or outputs from day to day, this is probably a sign that someone has decided to do a schema migration without telling us. And that's a good thing to know, so that we don't push the model to production until we track them down and find out what the schema migration they did was. It's kind of expensive. I mean, compared to the cost of recommending adult items to minors, not that expensive. But it's expensive compared to the cost of running my pipeline. So yeah, this will take some time. But you can do this as a separate parallel job, right? This doesn't have to be inline with your main job, right?
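A sketch of that kind of separate, parallel difference job, assuming daily snapshots you can read side by side. The paths and the 25% threshold are invented, and exceptAll needs Spark 2.4 or later:

```scala
// Compare today's input snapshot against yesterday's, off the critical
// path of the main pipeline. Paths and threshold are made-up examples.
val today     = spark.read.parquet("/data/snapshots/users/today")
val yesterday = spark.read.parquet("/data/snapshots/users/yesterday")

// Records present today that weren't there yesterday (keeps duplicates).
val changed    = today.exceptAll(yesterday).count()
val baseline   = yesterday.count()
val pctChanged = if (baseline == 0) 100.0 else changed * 100.0 / baseline

if (pctChanged > 25.0) {
  sys.error(f"$pctChanged%.1f%% of records changed since the last run; " +
    "maybe an unannounced upstream schema migration")
}
```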
So that's great. Data changes: we can catch those, right? That's good. But the problem is that software changes, too. And sometimes it's not even our fault that the software is changing, right? Maybe someone decides to upgrade scikit-learn across the cluster without telling us. I've had that happen. It's great. And there's this really lovely talk from PyData London where she showed that sklearn had vastly different results just by changing the sklearn version that was used to train. And none of the parameters were changed. None of the data was changed. The results were completely different. And so if someone helpfully upgrades some of your dependencies without telling you, or if you upgrade your dependencies intentionally, it's important that you validate that your model behaves similarly to yesterday's model. The nice thing about this is that you can hopefully run the two side by side on the same data and make sure that you're getting the same results, or the differences that you expect; there's a sketch of this below. And you could automate that if you have to upgrade your code very frequently. If it's a thing that you do infrequently, you can just eyeball it occasionally and hope for the best. Good software engineering practices right there.

OK, cool. If you're working specifically on machine learning tools, we get some extra things that we can look at. Some of the things which become less of a good indicator are the output size. If I'm outputting a linear regression model, the size of my linear regression model is probably not a great indicator of the quality of my data. It probably doesn't vary a lot. Now, if I lose all of my features, then yes, I will notice. And that's a good thing to notice. But there are a lot of other situations which just won't have that problem. Yeah.

So traditionally, humans decided it was time to update their models. They would go read a runbook. A human would kick off the update. They would deploy a model to a small percentage of their users. They would validate that no one caught on fire. And they would deploy their model into real production. In practice, your stakeholders find you and force you to update a model after you've been dodging them for a few quarters. They point out that this is really important this time, and performance review season is only a few months away. And so you spend a few hours trying to remember where the guide is, or alternatively, where your wonderful co-worker who has since departed left the guide (hopefully just departed your company). And eventually you realize that you forgot to write the guide. And so you just kind of wing it. You've got this Untitled_5.ipynb notebook, and you think it's the right one, because Untitled_6 had a lot of stack traces in it. So you're pretty sure it's Untitled_5. So you run that, and it produces a model. And so you go, well, it's a model. We're good. And then you put that into production. And then you just kind of go fuck around for a while. And then you press deploy. And later on, maybe you get a phone call about the quality of your recommendations. And if you're lucky, it's "hey, good job." If you're not so lucky, you have an exciting opportunity to learn how to do a rollback in your deployment system. OK, cool.
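Going back to the side-by-side check for a second, here's a rough sketch with Spark ML. The paths, the id and prediction column names, and the 5% tolerance are all invented:

```scala
import org.apache.spark.ml.PipelineModel

// Run yesterday's model and the freshly retrained one over the same
// input and see how often they disagree. (Assumes the spark session
// and spark.implicits._ from the earlier sketches are in scope.)
import spark.implicits._

val sample   = spark.read.parquet("/data/validation/sample")
val oldModel = PipelineModel.load("/models/current")
val newModel = PipelineModel.load("/models/candidate")

val oldPred = oldModel.transform(sample)
  .select($"id", $"prediction".as("oldPrediction"))
val newPred = newModel.transform(sample).select($"id", $"prediction")

val disagree = oldPred.join(newPred, "id")
  .filter($"oldPrediction" =!= $"prediction")
  .count()
val pct = disagree * 100.0 / sample.count()

if (pct > 5.0) {
  sys.error(f"Old and new models disagree on $pct%.1f%% of records; " +
    "check before deploying (did a dependency version change?)")
}
```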
So the other one is we could take the human out of the loop. Part of the problem why the human just goes to read Reddit or Hacker News is that, most of the time, things are OK. And when we give humans a kind of boring job to do repeatedly that mostly doesn't need to be done, they just stop doing it. Because that looks pretty much like they're doing their job, and they still get paid. So instead, we could have our software look at the graphs. Because at the end of the day, what we're doing when we deploy these small canary or 1% A/B tests is we're just checking to see if our metrics are different for the users we're introducing to the new model, and if they're different in good ways or in bad ways. And the cool thing about this is computers can understand numbers. Very exciting. And so we can have a robot roll it back, and your pager goes off. And then your human will press override and deploy anyway, because they were cornered by the stakeholders, who have said that you haven't had a successful deploy for six months and it's really time that you have a new model. And at that point, your classic problems come back. But some of the time, this will work and be happy.

All right. Some people do fixed test set performance tests. And this works great, provided that the world doesn't change. But provided that the world doesn't change, you probably don't need to update your model anyway. And so the problem is your fixed test set is essentially going to get out of date over time and no longer be a good representation. I'm not saying don't do this. I'm just saying this shouldn't be the only thing that you rely on. It can be part of your toolkit, but not everything.

Yes, cross validation. Yay. You should do this, please. One of the things which I noticed, which terrified me, is some really lovely people who were like, well, machine learning is kind of hard, and there are all of these weights, and I don't know what I should pick. So I'm going to use hyperparameter tuning, and the way I'm going to pick these weights is with cross validation. That's pretty fine. The only problem is, though, they then took their cross validation results from their hyperparameter tuning and were like, yeah, this is how good my model is. I'm like, huh. Do you see that we may have just moved our overfitting into the hyperparameters, perchance? Maybe we should do something else? So if you are going to do this... Ford Pintos didn't make it out here, did they? Reliant Robins? Do you have the three-wheeled car that falls over? Or is that just in Mr. Bean? OK. America, being America, made a lovely car which had this wonderful habit of catching on fire. It was called the Ford Pinto. And essentially, if you use cross validation to fit your hyperparameters and then use those same numbers to tell you how good your models are, you have built a modern Ford Pinto. It will eventually catch on fire. But you might change jobs before anyone notices it's your fault, so it could be OK. OK, right. So we can get a false sense of security from doing this; keep a held-out test set that the tuning never touches, like the sketch below.
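As one way to avoid building the Pinto, here's a hedged sketch with Spark ML. The data path, the parameter grid, and the features/label column assumptions are all made up; the point is only that the number you report comes from a held-out test set the tuning never saw:

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}

// Assumes a DataFrame with "features" and "label" columns.
val data = spark.read.parquet("/data/training/examples")

// Hold out a test set BEFORE any tuning happens.
val Array(train, test) = data.randomSplit(Array(0.8, 0.2), seed = 42)

val lr = new LogisticRegression()
val grid = new ParamGridBuilder()
  .addGrid(lr.regParam, Array(0.01, 0.1, 1.0))
  .build()
val evaluator = new BinaryClassificationEvaluator()

val cv = new CrossValidator()
  .setEstimator(new Pipeline().setStages(Array(lr)))
  .setEvaluator(evaluator)
  .setEstimatorParamMaps(grid)
  .setNumFolds(3)

// Cross validation picks the hyperparameters on the training data only...
val model = cv.fit(train)

// ...and the quality number we report comes from the untouched test set,
// not from the CV scores that were used to pick those hyperparameters.
println(s"Held-out AUC: ${evaluator.evaluate(model.transform(test))}")
```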
I've got six minutes left, so I'm going to end with telling you not to be evil. I don't have any tools for you not to be evil, but there are a lot of really wonderful tools on the internet to help you not be evil. There are also a lot of tools on the internet that will help you be evil. Please don't use those. Yeah, in six minutes, I can't explain how to avoid bias in machine learning, but you should definitely go to some of the talks that are about that, or failing that, search on the internet.

Serving: everything is bad. This is part of the problem when you need to do your rollback; it can be kind of painful. Hopefully, your company already has a system that you can just hook into and use. Otherwise, there are a few different ones you can try and integrate into these pipelines. I'm trying to get Kubeflow to work with Spark. I have a PR out for it, but the tests don't pass, and for some reason they don't want to merge it without tests. Or, sorry, it's not that the tests don't pass; I didn't write any tests. I have a shell script called testit.sh, but that wasn't good enough. So that'll take a little bit longer.

OK, yeah: updating your model. The real world changes. For those of you who do online machine learning, I am terrified. It's a really great way to just have your models degrade really quickly with no checks. Please periodically check that your models are doing the right thing, even if you're doing online learning. Just because you don't have a separate deploy phase doesn't mean you don't need to validate your models. You still do. Yeah.

So in conclusion, your validation rules don't have to be perfect. You definitely don't want to pick ones that are alerting every day, because then the humans will just press override every morning with their cup of coffee. You want it to alert maybe once a month, so that the human comes in and is like, oh, last time this happened, we were about to do something really bad, and I should check this time. Try and make your validation rules specific and actionable. "Number of input rows changed" is not a great message. "Table XYZ (the table you're pulling from) grew unexpectedly by Y percent; historical growth day over day is Z percent" is a much better, more actionable message that tells the user, hey, this is the thing I want you to investigate, and this is why I want you to investigate it. And yeah, if you really want to do all of this with testing and you don't want to do validation, please add junk records to your tests and make sure that your jobs are behaving well.

OK. There are a bunch of related talks. If you're not testing your code, please, for the love of God, test your code. There are a lot of resources. Jupyter notebooks are not a valid excuse not to test your code.

OK. We'll leave it on. Oh, right. Sorry, I forgot the second most important slide. This is a bunch of books about Spark. I make money indirectly off of Spark, but I make money very directly off of this book. This is the most important slide: High Performance Spark. It's available today. It is unrelated to the content presented here today, but that should not stop you from buying several copies. Indeed, cats love the box that it comes in, although I do encourage you to buy both the print and the ebook copy, because I do get double royalties on the ebook. So good. The other project that I have, which I want to be clear is not a joke (because many people laugh), is Distributed Computing for Kids. It's an introduction to Apache Spark for ages eight and up. And you can sign up for the mailing list at DistributedComputingForKids.com. I assure you, this is not a joke. The garden gnomes are going to go on a magical adventure involving word count and Instagram. I think it's amazing. But it's also my project.

OK, right. I have another talk tomorrow at 4:10. If anyone wants to go to San Francisco on a Friday, it's probably a bit late to book your tickets, but I'll be there on Saturday. And if you're tired of nice weather and you want some more rain, I'll be in London in December. And yeah, cool. That's pretty much it. OK, thanks. Bye.