Hello. Welcome, everybody. Thank you for joining us. Today we're going to talk about some of the operations and scaling experiences we have at Workday, running from 50,000 cores all the way to 300,000 cores in production. But first, let me introduce myself and the rest of the folks who will be sharing these experiences today. My name is Edgar Magana. I'm a Cloud Operations Architect at Workday, and I'll let the rest of the panel introduce themselves.

Hi, I'm Imtiaz Chaudhury, and today I'll talk about our architectural evolution as we support an increased workload on the Workday cloud.

I'm Howard Abrams. Instrumentation has been a key focus of my work at Workday.

Hi, I'm Kyle Jorgensen. I'm going to talk about some image distribution challenges.

Hi, I'm Sergio Di Carvalho, and I'll be talking about some interesting challenges we had scaling our APIs.

So we are the core team of what we call the WPC project, the Workday Private Cloud. If there are questions after the session, please reach out to us.

First, I'd like to do a quick introduction to Workday. Workday is a software-as-a-service company; everything is in the cloud. We provide enterprise cloud applications for finance and human capital management, as well as payroll, student systems, analytics, and more applications that we keep developing.

We'd also like to tell a little bit of our story. This is not the first time we've participated in these summits; we have shared our experiences before. Everything started back in 2013, when the company made the investment to create an engineering team dedicated to building our own infrastructure on a new technology: OpenStack. We started with OpenStack POC deployments. In mid-2014 we formally started the process of going to production with OpenStack. We have five data centers, so we wanted to deploy everywhere. We didn't want just one successful deployment and that's it; we wanted multiple deployments, all idempotent, with everything automated, so we invested in Chef for automation and configuration management. In 2015 and 2016 we started moving some of the applications that had been running on bare metal in production into the virtualized OpenStack environment in some of the data centers. We successfully went to production and kept migrating applications from the old bare-metal systems into the virtualized environment with OpenStack. In 2017 we moved to a new version of OpenStack, a new version of CentOS, and so on, and we kept scaling up because more and more services were going into production. Finally, in 2019, we're going to have 50% of all Workday production workloads running in these OpenStack-powered clouds.

The growth in services tracks directly with the growth of the company: revenue grows because we're getting more and more customers, and therefore we need to run more and more service instances in our cloud. This graph just shows how these two things are related.

In summary, we have five data centers, three in the Americas and two in Europe. We have, in total, 45 OpenStack clusters, more than 4,000 compute hosts (hypervisors), 300,000-plus cores in production, and 22,000 running VMs. All these numbers keep changing and growing. And we have a total of more than 4,000 active VM images. It is important for us to talk a little bit about images, because they are part of our use case.
So we're not a public cloud provider; our virtual instances run our own applications. Therefore we use the concept of immutable images: once an image is created, tested, validated, and promoted to production, nothing changes in that image until it reaches the end of its life cycle. That cycle is every weekend. Every weekend we release a new version of our software for all the applications, so we have to destroy the VMs and recreate them with the new version, and this needs to happen in a very short period of time to comply with our customer SLAs. With that, I'd like to invite Imtiaz to talk about how our OpenStack architecture has changed along this journey.

Thanks, Edgar. We started our initial OpenStack deployments with a simple architecture: a single OpenStack controller and a single software-defined-networking (SDN) controller — we use Juniper's OpenContrail. This architecture has served us well so far, and even though it's very simple, as an operator getting into and trying out OpenStack it's a good idea to start simple and prove the model before you try something more complex. We still have OpenStack clusters running this way. But at some point this architecture doesn't hold up once you go beyond a certain scale, which brings us to the key drivers that required a new architecture.

The main one, as Edgar pointed out, was scalability. In the last two years we have almost quadrupled our footprint of compute nodes — 400% growth in the number of compute nodes alone — and it's not just the number of nodes and the capacity, it's also the amount of API traffic we serve during a very short timeframe. The second requirement was high availability of our control plane: we wanted to make sure the control plane is always available, especially during maintenance. And the third driver was zero-downtime upgrades for the OpenStack and SDN control planes — when other teams want to do their maintenance or upgrades, our API services should always be available.

What that means is redesigning the initial architecture. To support high availability, we obviously have to replicate every service, so you run multiple controllers and put a load balancer in front of them. We use HAProxy for the load balancer, we run two of those, and we use keepalived to make sure that if one dies, the other takes over as backup. And to make upgrades seamless — going from one OpenStack version to another, for example — we found it easier to separate the stateful services from the stateless services. That means you can take down one OpenStack controller or SDN controller without any impact, and you can bring one back on another version pretty easily. If the database and the stateless services lived in the same place, it would be more complicated, because then you'd also have to take care of database backups. So, as you can see, all the stateful services — MariaDB with Galera, ZooKeeper and Cassandra (used by Contrail), and RabbitMQ — are hosted separately, go through a different upgrade mechanism, and all the state is kept apart.
With that said, I want to emphasize that this is not an architecture set in stone. As we try things out and onboard more workload, we are continuously evolving it, and our monitoring and logging systems let us see how efficient this architecture is. It's good for our use case, but it is constantly evolving.

So I'd just like to ask: with this architecture, how easily can you scale the stateless API services?

The stateless API services we can scale horizontally. If the API workload increases, we can always add additional controllers. It's very easy — we use Chef for deployment and everything is defined as roles, so all you need to do is bring up another server and apply the role, and there's another controller. You can scale up or down horizontally very easily.

Thanks. So now you get the big picture: we have a lot of servers, a lot of infrastructure, and we need to keep an eye on it all. Howard, can you explain how we do monitoring and logging across all these clusters?

Yeah. As we reflected on the challenges we've had and what kinds of tools and features helped us get this far, we realized that instrumentation was a key thing — and I'm assuming this will be somewhat redundant, since most of you who have deployed OpenStack probably have quite a bit of instrumentation. But we have some special challenges in that we don't have access to our production systems. Everything has to be completely automated — full automation. We also have to collect all the logs; we don't have access to those files, so they have to be shipped somewhere we can filter them later. We also get challenges from customers: we wanted to do more than just say something's wrong — we also wanted to answer "hey, it's slow; what do you mean by that?"

Our architecture is slightly different in that on each OpenStack node we collect all of the logs from the system and ship them over to our own internal logging stack. The reason we run our own is that we can take the logs not only from our OpenStack controllers and compute nodes but also from the VMs, so we can correlate them together. We also run a number of Sensu checks, and each of these checks ships its results out. The alerts go into a product called BigPanda, which helps us correlate incidents and route them to whoever is on call. And then we also extract all the metrics and use a product called Wavefront for graphing and visualization.

For each issue we run across, as a good developer you fix it, you write your unit test and integration test — but we also got into the habit of writing a check for each problem we hit, so that we can alert if it happens again. This has helped us out a couple of times when, eight months after a bug, we get an alert saying "hey, something's wrong." Our alerts go to Slack and to PagerDuty, which we're all really excited about.

We have one check we like to call Dr. Hibbert — he'll give us a lollipop when we're all done, if it works — and it basically emulates what our customers do. The details aren't important to anyone here; the point is we try to behave the way our customers do. We run this test on each compute node in each cluster in each data center, and then we just go and do it again.
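For context, a Sensu check is just a script that prints a status line and exits 0, 1, or 2. A minimal sketch of the style of check we write might look like the snippet below — the cloud name, the probe, and the thresholds are made up for illustration, and our real synthetic check does a full customer-like boot cycle rather than a simple API listing:

```python
#!/usr/bin/env python3
# Minimal Sensu-style check: print one status line, exit 0 (OK) / 1 (warn) / 2 (critical).
# Cloud entry, probe, and thresholds are hypothetical placeholders.
import sys
import time

import openstack  # openstacksdk

WARN_SECONDS = 5
CRIT_SECONDS = 15

def main():
    start = time.time()
    try:
        conn = openstack.connect(cloud="wpc-cluster")   # clouds.yaml entry (made up)
        list(conn.compute.servers(limit=1))             # cheap end-to-end API probe
    except Exception as exc:
        print(f"CRITICAL: nova API probe failed: {exc}")
        sys.exit(2)
    elapsed = time.time() - start
    if elapsed > CRIT_SECONDS:
        print(f"CRITICAL: nova API probe took {elapsed:.1f}s")
        sys.exit(2)
    if elapsed > WARN_SECONDS:
        print(f"WARNING: nova API probe took {elapsed:.1f}s")
        sys.exit(1)
    print(f"OK: nova API probe took {elapsed:.1f}s")
    sys.exit(0)

if __name__ == "__main__":
    main()
```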
If it fails, we get a message on Slack with links to some of the details. Those details can then be sent — sorry, having a little technical difficulty here — to our on-call personnel, so they have everything they need to address the issue. One key thing is that we also like to give ourselves hints on how to search the logging system. Obviously, with a logging system you can start adding lots of queries to filter and look for that needle in your pile of needles.

You also want to extend this and go from "is something dead?" to "is something on its way to being dead?" There we go. We also realized we needed to be able to flexibly create a number of different metrics, and graphs from those metrics, very quickly. For instance, this example is from a colleague of mine who was evaluating whether we should use multiple processes for HAProxy, and we were correlating not just the HAProxy stats but other metrics we were collecting, lining them up to see what would happen. We also get vague requirements like "something's running slowly." Well, what's running slowly between these two clusters? They're supposed to be exactly the same. All the metrics we put together look the same, but as we scrolled through them side by side we could eventually spot the difference — what's up with this guy? Being able to create these dashboards very quickly was very helpful.

So, you know, we realized you just can't scale if you can't tweak the system. Scaling at this level is like putting band-aids on a thousand different paper cuts: you're constantly addressing these little things, and you really need monitoring for all of it, plus a place where you can query all the logs you have across your system. We also realized that investing in a good visualization tool was quite helpful — one where we could create not only curated dashboards that we keep under version control, but also ad hoc graphs at the spur of the moment to address particular issues. And there are a lot of good blogs online on how to monitor the particular services that OpenStack depends on.

Thanks, Howard. So now you know the architecture, you know the evolution, and you know how we do monitoring. Let's talk a little bit about the use case: we have immutable images, we have to create these images during the week for the new version of our software, and then push them to production in a very short period of time. Kyle, walk us through the challenges around this use case.

Right. As Workday has scaled, our private cloud has scaled as well, and some of these scale-related challenges involve images. Just to remind you about our use case: we have a fairly unique control-plane usage pattern on our private cloud, with a narrow update window. This graph shows API usage over a seven-day period: it's mostly idle, and then there's this big spike in the middle. The example I'm showing is Nova scheduler API response time, but it's really representative of all of our OpenStack APIs over this period — I picked it more or less arbitrarily.
One of my colleagues likes to refer to this as the thundering herd that pounds our control plane once a week. The reason, as you can see, is that the spike correlates with the spot in the seven-day period where we destroy all of our VMs and have to create them again with new images. These horizontal lines are the different VM types and their counts as they exist over time. So we don't have a nicely distributed workload — this is how Workday does it right now. You're probably wondering, why don't we spread it out a bit? The short answer is that Workday is moving towards that, but from our perspective as the cloud providers for Workday, we have to deal with the usage patterns of our customers. That's what presents some of the unique challenges right now.

So not only do we have this short window where we have to create and delete a lot of VMs, we also have large images. The worst offender right now is about six gigabytes in size, and we have around 1,700 of them deployed across our data centers — a few hundred per data center of this image type. Combining those two things results in a bottleneck. The problem is that we have our Glance endpoint on the OpenStack controller and potentially hundreds of compute nodes. During the VM boot process there is an image download step: each compute host has a cache, and as you boot VMs, the images first need to be downloaded into that cache — I'm using these dots to represent the images. Once the images are downloaded, the VMs can be booted from them. Because of our use case, we have to do this download on hundreds of hypervisors for hundreds to thousands of VMs every week. So, as you can picture, there's a huge bottleneck at the Glance endpoint, because all of these downloads to all of these hosts happen at the same time.

The end result is very, very slow VM boot times. We have deployment automation that times out after a certain period, and some of these very slow image downloads were pushing overall VM boot times into the hundreds of seconds, so we were getting timeouts and, basically, customers were unhappy. We realized that one obvious optimization is that Nova already has this image cache on the hypervisors, so once an image is in the cache, subsequent boots aren't too bad. The optimization we thought of was: what if we can put the image into the cache ahead of time? At Workday, we know when our update window is going to be, and we actually know which images we're going to use in a given week — so what can we do to combine those two facts?

Our solution, which we call image prefetching, puts the image into the cache ahead of time. We decided to extend the Nova API. This is not a brand-new idea — there were Nova blueprints in the community from a few years ago, but it never got implemented, so we couldn't leverage anything from the community and decided to do it ourselves. We call it image prefetch. This is the workflow from an operator's perspective: you submit a POST request with the image that you'd like to prefetch across your cluster, and this talks to a lot of different components within Nova — it goes into the Nova API and touches Nova Compute, Nova Conductor, the database API, and the libvirt driver on the compute hosts themselves. So this is non-trivial to implement.
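To make the operator-facing side concrete, here's a rough sketch of what driving this extension looks like. The endpoint path, payload, and response fields below are illustrative stand-ins — this is our private extension, not an upstream Nova API:

```python
# Sketch of driving the image-prefetch extension from an operator's machine.
# URL, token handling, and field names are hypothetical placeholders.
import time
import requests

NOVA = "https://nova.example.internal:8774/v2.1"
HEADERS = {"X-Auth-Token": "<token>", "Content-Type": "application/json"}
IMAGE_ID = "<glance-image-uuid>"

# Ask every compute host in the cluster to pull the image into its local cache.
resp = requests.post(f"{NOVA}/os-image-prefetch",
                     json={"image_id": IMAGE_ID}, headers=HEADERS)
job_id = resp.json()["job_id"]   # the prefetch runs asynchronously in the background

# Poll the job until all hypervisors report the image as cached.
while True:
    status = requests.get(f"{NOVA}/os-image-prefetch/{job_id}",
                          headers=HEADERS).json()
    print(f"{status['hosts_done']}/{status['hosts_total']} hosts prefetched")
    if status["hosts_done"] == status["hosts_total"]:
        break
    time.sleep(30)
```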
You've got to touch a lot of different pieces of the Nova code. The Nova API responds with what we call a job ID. Since this is an asynchronous operation happening in the background, we return the job ID to the user, and then you can perform a GET on that job ID. We track the progress of the prefetch operation in the database, so we read the status from the database and return it to the user — in this example, say we have 10 compute hosts and the image has been prefetched on five of them, something like that. That way the operator knows the status of the operation. And this works for us because, like I said, we know our schedules and we know our images ahead of time, so we go ahead and prefetch them.

The before-and-after difference was clearly very big. Before, as I mentioned, we had Glance API response times spiking up to about 600 seconds — that's 10 minutes simply to download these really big images through the bottleneck. Afterwards, we hit the cache 100% of the time for these very large images, we reduced boot time by around 300 seconds on average, and we decreased the failure rate, which made our customers happier.

Another related but different problem is that once we evolved our cloud to the HA architecture, we indirectly moved the bottleneck to the load balancer. Regardless of whether you use our prefetch operation or not, you have to download the images before you boot the VMs, and with the HA architecture everything goes through the load balancer: the request to download the image comes in, goes to some Glance endpoint, and the response goes back through the load balancer to the compute node. We realized this was actually pretty slow — relatively slow — and we had to dig in and figure out what was going on. It turned out the back and forth through the load balancer was what made it slow. The optimization we made here was: since we're running Apache in front of the Glance APIs on our OpenStack controllers, we can easily configure a redirect, so that when these requests come in, we send a redirect back to the compute host that says, don't go back through the load balancer — talk directly to the controller and download the image straight from the controller to the compute host. And you can imagine that with many compute nodes doing this at the same time, we can distribute the load across the multiple Glance endpoints. Now, direct image downloads are actually supported in Nova configuration, but that path is not quite as robust — it doesn't have the same failure-detection mechanisms — so for our use case it made sense to do it in Glance, because it was easy to configure this redirect for image downloads there.

And some numbers to back it up: we ran tests with 20, 40, 60, and 80 concurrent image downloads, with and without HAProxy, and in the worst-case scenario it's six times slower sending the image downloads back through HAProxy. So the key takeaway is that, under heavy load, downloading images can be a bottleneck. If there are any Nova developers out there, we'd be happy to contribute this API back to the community. And with HA there are some tradeoffs: we implemented HA, put the load balancer in, and then realized, oh wait, maybe it doesn't make sense in all cases.
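To illustrate the redirect mechanism: our actual implementation is a rewrite/redirect rule in the Apache vhost that fronts Glance on each controller, so the little WSGI sketch below is only a toy stand-in with hypothetical hostnames and ports. The effect is that the request still arrives via the VIP, but the multi-gigabyte image payload then flows directly from a controller to the compute host instead of back through HAProxy:

```python
# Toy sketch of the "redirect image downloads around the load balancer" idea.
# Not our production config -- that lives in Apache in front of Glance.
import socket
from wsgiref.simple_server import make_server

CONTROLLER = socket.getfqdn()  # this controller's own address, bypassing the VIP

def app(environ, start_response):
    path = environ["PATH_INFO"]
    # For large image-data downloads (Glance v2 GET /v2/images/{id}/file),
    # bounce the client (nova-compute) straight to this controller.
    if path.startswith("/v2/images/") and path.endswith("/file"):
        location = f"https://{CONTROLLER}:9292{path}"
        start_response("307 Temporary Redirect", [("Location", location)])
        return [b""]
    # Everything else keeps going through the normal, load-balanced path.
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"handled elsewhere in the real deployment\n"]

if __name__ == "__main__":
    make_server("", 9393, app).serve_forever()
```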
So we had to make a slight architecture change to avoid the HAProxy bottleneck. And again, like Howard mentioned, this API-specific monitoring gave us unique insights and let us really dive into each API and figure out why certain calls were slower than others.

Thanks, Kyle. As you can see, there is some work we're doing where we don't have time to just wait for the patches to land upstream — we need to do it ourselves and let the upstream work happen in parallel. But I'm glad you mentioned contributing back, because we have been working with the community to get these changes back to them. So, moving on from the image-specific use case, let's extend it to how much pressure we put on the APIs, and how much we need to do in a very short period of time. Sergio, could you talk about that?

Sure. Thanks, Edgar. I'm basically the last thing between you and your lunch, so I'm going to try to be quick. Over the years we've had to deal with a lot of scaling issues, and I'm just going to talk about one particular case where we identified that, as we boot 200 VMs in the same cluster, some of the APIs get really slow. This is an example of the Nova metadata API in one of these clusters, and you can see the average response time reaching about 14 seconds, which is quite a long time if you consider that each VM makes about 20 to 25 metadata requests as it boots. So this is clearly a bottleneck. Luckily, with all the metrics Howard showed and all the tools we have, we were able to identify several possible causes, and one of them was database transfer rate. The orange line here shows megabytes per second, and in the worst cases we were reaching about one gigabyte per second. That's kind of strange, because if you look at what the APIs actually serve, it's not a lot of data — we're not transferring images here, just API responses. So this looked a bit suspicious.

Since it looked like a database bottleneck, we turned on the performance schema in MySQL, which is a feature that lets you track which queries are causing the most trouble, and we identified this query as the top query by rows sent — and the difference from the second-heaviest query was striking. There was definitely something there. I'm not sure how familiar you are with the Nova database and its tables, but does anybody have a guess what the problem is with this query? Yeah. If you look at the data model in Nova, you have the instances table, which has basic things like flavors and owners, and then you have two metadata tables: instance_metadata and instance_system_metadata. Instance metadata is more like what users add when the VM boots up, and instance system metadata largely comes from the image you used to boot that instance. If you do joins like this, what you get is a Cartesian product of those metadata tables: every row of instance_metadata is paired with every row of instance_system_metadata. So that looked a bit strange, but it shows how you end up producing that much data. In our case, VMs have about 50 rows in instance_metadata and 50 rows in instance_system_metadata — and the clicker stopped working. Okay. So you would expect about 100 rows to be fetched for each instance, right?
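To make the row explosion concrete, here is a tiny self-contained sketch — simplified stand-in tables, not Nova's actual schema or query — showing what a single joined SELECT across two one-to-many metadata tables returns:

```python
# Toy demonstration: joining one instance to two one-to-many metadata tables
# in a single SELECT multiplies the rows (a Cartesian product of the children).
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE instances (uuid TEXT PRIMARY KEY);
    CREATE TABLE instance_metadata (instance_uuid TEXT, key TEXT, value TEXT);
    CREATE TABLE instance_system_metadata (instance_uuid TEXT, key TEXT, value TEXT);
""")
db.execute("INSERT INTO instances VALUES ('vm-1')")
# Roughly our shape: ~50 rows of user metadata and ~50 rows of system metadata.
db.executemany("INSERT INTO instance_metadata VALUES ('vm-1', ?, 'x')",
               [(f"meta-{i}",) for i in range(50)])
db.executemany("INSERT INTO instance_system_metadata VALUES ('vm-1', ?, 'y')",
               [(f"sysmeta-{i}",) for i in range(50)])

rows = db.execute("""
    SELECT i.uuid, m.key, s.key
    FROM instances i
    LEFT JOIN instance_metadata m        ON m.instance_uuid = i.uuid
    LEFT JOIN instance_system_metadata s ON s.instance_uuid = i.uuid
    WHERE i.uuid = 'vm-1'
""").fetchall()
print(len(rows))  # 2500, not 100: every metadata row paired with every sysmeta row
```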
But in actual fact, what we were seeing was about 2,500 rows, which shows you what the problem was. Luckily, with the help of the OpenStack community — and I have to say, there are really great people out there — we talked to a few people on the OpenStack mailing list and they confirmed that, yeah, it's not supposed to be like that; it's a problem with the object-relational mapping. So we filed a bug, and this bug actually affects every Nova release since Mitaka — so even if you're running Rocky or Stein, you're possibly affected by this. So basically every single OpenStack deployment out there is sending gigabytes of unnecessary data — well, it depends on your case, and I'll explain why this might not be a problem for you. In our case it was a problem, and we had to look at what we could do to avoid it.

Looking at the Nova code base, we came across this commit from February 2016, for the Mitaka release. Before this commit, the basic instance data was fetched first, and then the metadata API server would realize it needed the metadata and fetch those tables separately. With this commit, because the developers realized you need the metadata tables in most cases, they decided to just fetch them all the time — but because the object-relational mapping isn't quite right here, that's what introduced the problem. And — sorry, yeah, thanks — the good thing about this commit is that it gave us a very quick fix: we could just roll back this commit to the older behavior and avoid the product of the metadata tables. Of course, that means more queries executing, but because we are no longer producing that huge amount of data, we thought it could be better. And indeed, our results showed it was significantly better.

On the left here I have the sort of baseline test we reproduced in our development environment. As you can see, we're not reaching the same levels as the production load — it's booting 200 VMs, but we're only getting about 700 megabytes per second from the database, and the API response time isn't that bad, only 2.2 seconds in the worst case — but we found this scenario was good enough to test different options. So on the left is the production code, and on the right is just reverting the commit I showed in the previous slide. As you can see, it's actually less than half — the area underneath this graph is the total amount of data being pushed — so significantly less data is being fetched from the database, and the response time is much improved.

Okay. That looked good, but we were still not quite there yet, so we started to look at what else we could do to make this better, and at how these metadata requests are actually processed. As I said before, a VM comes up and starts firing a lot of metadata requests, and those requests go through HAProxy to all the API nodes. And — sorry, I should have tried this pointer before — for every request, the API servers were going to the database every single time. Of course, this creates a massive bottleneck, and anyone who has run OpenStack in production knows there's a very, very simple fix for this: it's called memcached.
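Conceptually, what memcached buys the metadata path is a cache-aside layer in front of the database. A toy sketch of that pattern — illustrative only, not Nova's actual caching code; it assumes the pymemcache library and a local memcached — looks like this, and with it most of those 20 to 25 boot-time metadata requests per VM never touch MySQL at all:

```python
# Toy cache-aside sketch: serve repeated metadata lookups from memcached
# instead of hitting the database on every request. Not Nova's real code.
import json
from pymemcache.client.base import Client

cache = Client(("127.0.0.1", 11211))   # local memcached assumed

def get_instance_metadata(instance_uuid, fetch_from_db):
    key = f"metadata-{instance_uuid}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)                 # served from memory, no DB round trip
    data = fetch_from_db(instance_uuid)           # the expensive path we saw in the graphs
    cache.set(key, json.dumps(data).encode("utf-8"), expire=300)
    return data
```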
And it's a bit funny, because we've been using memcached for years in our clusters, but this was a new set of clusters that rolled out to production without memcached. And that kind of tells you a story: OpenStack is a massive code base, there are lots of configuration settings, and depending on how you deploy it, it's very easy to forget these little things. So we had a production cluster taking production workloads without memcached, which was obviously not good.

So we went back to the development environment and reproduced the tests with memcached enabled. Again, on the left is the baseline test I showed before, and on the right is the same test with memcached enabled. As you can see, the database transfer rate went down as well — not as low as with reverting the metadata preload, but still pretty good — and the response time was actually excellent, because of course a lot of those requests never hit the database at all; they're just served from the cache.

So that's good: we found two very easy fixes, and now the question was, how do they look together? On the left are the results I showed before with the metadata preload reverted, in the middle is the test with memcached enabled, and on the right is a graph showing the results with both fixes applied. As you can see, it's still a pretty good improvement. It might not look very impressive when you're talking about an improvement of 0.05, but percentage-wise this was actually quite significant, because in a real production environment there's a lot more going on, so any improvement you make here gets amplified.

That's the kind of story that tells you that oftentimes you have a production problem, you think there are several possible causes, and none of them is all that impactful on its own — but when you put them together, it creates a massive fire. We started by identifying a very heavy SQL query that was producing the Cartesian product of the metadata tables. We realized memcached was somehow not enabled. And we identified bottlenecks in our HA architecture that could be contributing as well. But also, in Workday's very specific case, our VMs have lots of metadata and we boot all of those VMs at the same time — and it's not only the number of properties on every instance, some of the properties are huge. Because of the Cartesian product of the metadata tables, a single column holding 65 KB gets repeated 50 times on the other side, so the problem just gets amplified.

In our case, so far we've rolled out two fixes on this cluster: rolling back the metadata preload, which is a simple two-line code change, and enabling memcached, which was a quick three-line configuration change. And there's still work to do — this isn't the end game. As the clusters scale even more, we're going to have to push more fixes and improvements, and we're looking at different things we could do to improve the HA architecture. We're also talking to users: do you actually need that much metadata? Is this the right place for the data you have? So it tells you that you have to look at the whole picture — not only the specifics of one SQL query or one configuration change, but how the system is being used — to tackle a case like this. And that's kind of it.

Awesome. I guess everybody must be hungry by now. So, as a summary: as you can see, we defined the right architecture for our use cases.
And then having the right logging and monitoring in place helped us identify issues and fixes, and our rollouts to production are very quick because of the continuous integration and continuous deployment infrastructure we put in place. So we have time for one or two questions — you'll need to walk to the microphone here.

My question is about the first suggestion, to segregate the stateless services and the stateful services. When you say segregate, do you mean using Nova cells, or what's the mechanism of segregation?

No, we are not using Nova cells yet. Here the segregation is about persistent, or stateful, services. RabbitMQ, for example, has some state information; the database obviously has state. So all the databases are consolidated on one set of servers, and the other services, like the Nova API and Glance API — the stateless services — run on another, because you can destroy one of those and bring up another one easily. But if you destroy MySQL, you need to restore it from a backup. That's what I mean by stateful and stateless. Thank you.

I have a question about the monitoring system. Can you give us some detail about your scale — how many incidents and how many metrics are you dealing with currently?

Lots. I guess I should have expected that question and looked it up. But we've got thousands, and we're adding more as fast as we can.

What do you think is the biggest challenge of moving from small scale to larger scale with regard to the monitoring system?

The big challenge has been figuring out all the moving parts and which ones are, like I say, broken or sick — and determining what's "sick" has been a challenge, because some things, when a big load comes in, spin up more threads and start handling it. Is that something you care about? Probably not; that's what RabbitMQ and other components are designed to do. What you're really looking for is the rate of change. A lot of what we've been doing is checking whether a rate is going up faster than we expect, not just increasing steadily to some new level. Those kinds of things just take time to look at and re-evaluate. It helps when something does eventually die and you can look backwards, obviously.

Actually, one other thing: before, we had five or ten clusters, and now it's 45 clusters across different data centers. When something happens and somebody pings you and says, oh, there's a problem in Dublin, you're like, okay, which of the clusters in Dublin? And the person doesn't even know — "I have the host here, I don't know which cluster that affects." So keeping track of all the clusters and all the issues has become a real challenge in itself.

And just to fill in the picture a little more: obviously there are different levels of severity for these issues. You can agree or not, but once we have things in production — once we complete that patch window, that maintenance window — things are actually rock solid. They stay up and running until the Friday. And maybe on the Friday, when we try to kill all those thousands and thousands of VMs, maybe one fails, so there are some glitches here and there, but we can figure things out. On the production systems, we've actually been very, very happy so far.
You know, another one of the challenges has been, once we've collected a few thousand metrics, realizing which ones are actually important — because they're not all important — and getting a bit more focused: okay, weed out all the noise, these are the key metrics we care about.

So, one more time, thank you everybody for joining us. If you want to follow up with us, we're going to be around here tomorrow. Thank you. Thank you, everybody. Thank you.