Sorry about the short interruption. So I'd like to give a bit of motivation for the talk today before we dive into the nuts and bolts of what the CF acceptance tests are and why one would want to run them. When we are doing development on the core software features on these teams, we build various tools to facilitate our development, and to test these components and validate that they work. And one of our goals, which we aspire to and which we sometimes even meet, is to make it easier for people in the community to contribute back to the Cloud Foundry ecosystem and submit pull requests. So one of our goals is to make it easier for you to submit a pull request and validate that your software works, because Cloud Foundry is a rather large piece of software and it can be rather complicated.

So this is a great talk to be in if you want to contribute. Maybe you've gone to the Cloud Foundry GitHub pages, you've found a component, you've done some development with it, played with it a bit. Maybe you've put it on your local Cloud Foundry and you want to take that next step and contribute it back. This is a talk about the CF acceptance tests. You'll also get some information if you're interested in learning a bit more about how CF releases are validated before we push them out into the world.

And why are we talking about CATS specifically? We chose to talk about CATS because CATS is a particular pain point for us in our development process. It's very slow to run, and we'll get into a bit of why that is. Oftentimes, if you want to validate that your foundation is in a good state, it can take anywhere from half an hour to an hour and a half, which is a painfully slow development cycle: you make a change and then wait that long to get a CATS run. And sometimes, if the way CATS is configured and the way your Cloud Foundry is configured are not in alignment, you can get test flakes, which is where tests fail spuriously for reasons unrelated to the correctness of the software. That can be very painful too, as it increases the amount of time you have to spend validating your releases.

So to give a high-level overview of the talk: we're going to start with a summary of what CATS is, what it is intended to do, and what it is not intended to do. Then my colleague, Mike Chu, is going to get into the nuts and bolts of how to set it up and how to troubleshoot your CATS runs when they go wrong. And finally, we're going to finish with a short demo where we take a pull request that was actually contributed to the Cloud Controller and step through running CATS for that pull request to validate it.

So the first part is: what is CATS? CATS is intended to demonstrate that the platform is capable of doing the things it is supposed to do. Once you've run CATS and you've had a green test run, that means that, at the user level, most of the things you expect your Cloud Foundry to be able to do, pushing applications, binding services, routing traffic to the different components inside the Cloud Foundry, have been shown to work. It is written in Go and maintained by the Release Integration team, just a bit of trivia about the CATS test suite. And the one thing that should specifically be known about the CF acceptance tests is that they are intended to validate releases of Cloud Foundry.
And so this gets confusing sometimes because of the way that CATS tests the software. Usually what happens is I get an environment with a BOSH director somewhere, I deploy my software onto that environment, and then I run my CATS tests, which make requests against that environment and exercise its behavior. Because of this, there is often a bit of confusion, because it seems like we are trying to test that deployment. But the test suite is designed to test the software that is underneath it. The next slide goes a bit more into what I mean when I say this.

Now, what CATS is not. CATS is not an SLI, a service level indicator, for your platform. It is not intended to say that because I have run CATS, I know I can handle this much traffic, or that my platform's performance comes with these kinds of guarantees, or that I can handle this many apps. It's just supposed to validate that the software you are running can perform a cf push, can bind a route, can do the basic things that a developer needs to do when using Cloud Foundry. One of the reasons it's not a good SLI is that during the course of running CATS, your platform will be bombarded with an inhuman number of app pushes and other interactions via the API, and that kind of traffic is probably never going to show up on your platform outside of an automated context like CATS. So even if you did want to use it as a service level indicator, it would be a terrible choice for that.

The second thing is that CATS is not a smoke test for your platform. So you say, OK, maybe it's not going to tell me how good my performance is, but it can tell me whether or not there are any problems with my platform. And CATS is closer to being a good fit for this than it is to being an SLI. But it is also not a good idea, because CATS can sometimes change some of the underlying configuration of your platform. For example, the security groups test will change the security groups of your platform, which is usually something you don't want to do on your production environment. So it's not a good way to validate that any particular deployment of Cloud Foundry is good. To put this all into a single mantra: CATS is about validating software, not about validating deployments. It's an excellent tool if you're doing development work and you want to know, does my distribution of Cloud Foundry work, does the pull request that I'm going to submit break anything? But it's not a very good test for answering the question, how good is my Cloud Foundry deployment? So now I'm going to hand it off to Mike for the next section.

All right, so now that we have our dev changes, we're ready to test them. So how do we get started? How do we set up and run CATS? CATS uses a configuration file, a JSON file called the integration config. In it, we have some required properties that enable CATS to properly target and push apps to your Cloud Foundry deployment. There are also some optional test runtime settings that give us more flexibility in the way we run the test suite. So as you can see here, we can specify user, space, org, and some timeouts, and we'll talk more about these timeouts in later slides. But most importantly, we have a list of test suites, and you can choose whether to enable or disable each of them. And so yeah, that's basically your integration config.
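To make that concrete, here is a minimal sketch of what such an integration config might look like. We're assuming field names from the cf-acceptance-tests README (api, admin_user, apps_domain, the include_* toggles, and the timeout settings); the exact names and the full set of options depend on the version of CATS you check out.

```bash
# A minimal, illustrative integration_config.json for CATS.
# Field names follow the cf-acceptance-tests README as we understand it
# and may vary between versions; adjust to the copy of CATS you check out.
cat > integration_config.json <<'EOF'
{
  "api": "api.bosh-lite.com",
  "admin_user": "admin",
  "admin_password": "admin",
  "apps_domain": "bosh-lite.com",
  "skip_ssl_validation": true,

  "default_timeout": 120,
  "cf_push_timeout": 240,

  "include_apps": true,
  "include_detect": true,
  "include_routing": true,
  "include_ssh": true,
  "include_services": false,
  "include_tasks": false
}
EOF
```

The include_* flags are the per-suite on/off switches just described, and the two timeout values come up again later when we talk about flakes.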
Next time you are filling out that pull request template and it says run and pass CATS, all you need is this integration config, a running Cloud Foundry, and a copy of the cf-acceptance-tests repo, and you're basically good to go. Now, we wanted to talk a little bit more about these test suites. This is just a very small subset of the test suites out there; in fact, there are over 20 test suites in CATS. And that really raises the question: when I'm validating my software, do I have to test against all of these suites? As average contributors to Cloud Foundry, our dev changes most likely do not touch all of these test suites, so does it really make sense to run everything? A better question is: to properly validate my dev change, is there a smaller subset of suites that I can run instead? And for that, it really helps to categorize the test suites.

First off, we have the default suites. These four suites are enabled by default. They're very simple; they're basically sanity checks to make sure that your deployment is healthy and functional in the most basic way. Next up, we have the special deployment suites. Before running these, you need some additional manifest or deployment setup. For example, you need to deploy an isolation segment or set up CredHub before you can successfully run these test suites. The third category is what we call, quote unquote, GA features, and this is just the list of all the other suites. Altogether, these exercise all the basic user-facing features using the CF CLI.

And so from the previous slide, we have some categories and we're now asking ourselves: is it enough to simply run the default suites? Well, like I said before, the default suites are more of a sanity check than anything else, and they don't give us enough confidence that our dev change didn't somehow leak into other functionality or another CF component. So Tim and I spent quite some time and came up with a more comprehensive list of suites to run. Our goal behind this list is to encourage developers to enable as many suites as possible on their default Cloud Foundry deployment. We also have these special-case suites here, which are intended to be enabled on a case-by-case basis, depending on your deployment and on your dev change. But for the most part, we hope that this list of suites to run provides clearer guidelines for future contributors as to what we expect when you submit that pull request.

Basically, once you have your list of suites, you're ready to run CATS. You're excited, you're hyped, but it takes a really long time. And it takes a really long time because it has lots of tests, which means lots of cf pushes, which means lots of waiting. And lots of waiting makes us sad. Developers feel like it's an expensive operation to add to their daily workflow, and as a result they become discouraged from running CATS. So what we're here to do today is walk you through some of our workflows and how we achieve shorter runs and faster feedback. Typically what we do is start off by focusing a test or a specific suite. You go through the standard red-green process: watch it go red, make your change, watch it go green. This is a very quick and cheap way to validate your change. But once you're done validating your change against that specific test, you can then run the full suite.
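As a rough sketch of that focused workflow, assuming the bin/test wrapper that ships with cf-acceptance-tests and the CONFIG environment variable it reads, a run might look something like this; the test description in the focus flag is hypothetical:

```bash
# Run only the focused test while iterating on the change (red -> green).
# CONFIG points CATS at the integration config; bin/test forwards its
# flags to ginkgo. Flag spellings can differ slightly between versions,
# and the test description here is hypothetical.
export CONFIG=$PWD/integration_config.json
./bin/test -nodes=1 -focus="accepts the updated broker schema"

# Once the focused test is green, drop -focus and run the full suite
# (see the parallel-run sketch a bit further on).
```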
What we like to do to significantly cut down the runtime is to run it in parallel with multiple nodes. In the example here, we're running eight nodes. The right number of nodes depends on the capacity of your Cloud Foundry deployment and also on the machine that's running the tests. As a guideline, your vanilla, out-of-the-box, latest cf-deployment-based Cloud Foundry can handle anywhere between six and eight nodes. In CI, where we have larger deployments, we usually run CATS with 12 or more nodes, and 12 or more nodes makes the test run significantly faster. But that raises the question: why are we stopping at 12? Why not run this thing with 30 nodes? Well, when you introduce too much parallelism, too many nodes, you're susceptible to a lot of unexpected errors.

There are lots of ways to deal with these errors. First off, for every failing test, CATS will print out all the CLI commands that it executed and all of their output. For example, in the screenshot we have here, we have a red test because it failed to cf auth, and that is likely just because we passed an incorrect user credential in our integration config. However, not every test error is this simple and straightforward. In the case where we're running all of CATS with 30 nodes, you'll get a lot of unexpected and inconsistent errors. A really common one is a staging error: you'll see a lot of failures about insufficient resources. That's because CATS puts a lot of strain on your system, and when you're doing so many concurrent app pushes at once, it can quickly drain the resources of your environment. So when you find yourself running into errors that you don't really expect, or errors you see one time but not another, we suggest decreasing your node count to reduce the parallelism a bit, or specifying longer timeouts in your integration config so that CATS has enough time to perform those operations.

Suppose you actually do want a Cloud Foundry deployment that can handle more than 12 nodes, or you want to allow a more parallel workload on your deployment. There are two known bottlenecks: the Diego cells running out of memory when you have too many app pushes, and not having enough API VMs to handle all the concurrent requests. As a general rule of thumb, scaling up these two instance groups will let you run more nodes, get faster runs, and be less likely to run into those flakes.

We've talked a lot about how we run CATS on our local machines, but CATS goes far beyond just our local machines. Beyond our local machines, our teams have Concourse pipelines, and in our Concourse pipelines we run CATS. There's a Concourse resource called the CATS Concourse resource, and it makes running CATS extremely easy. Beyond the dev teams, there's a team called Release Integration that runs and maintains CATS on a master pipeline, and they run CATS anywhere between 10 and 20 times a day, so they've really tried to optimize the way they run it.
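Before handing things back for the demo, here is a rough sketch of a parallel local run and the scale-up described above. The node counts, and the diego-cell and api instance-group names from cf-deployment, are our assumptions; tune them to your own environment.

```bash
# Full suite in parallel. Six to eight nodes is a reasonable starting
# point for a stock cf-deployment; back off, or raise the timeouts in
# the integration config, if staging errors or "insufficient resources"
# flakes start showing up.
export CONFIG=$PWD/integration_config.json
./bin/test -nodes=8

# To sustain more parallelism, scale the two usual bottlenecks. With
# cf-deployment that would be a BOSH ops file along these lines
# (instance counts are illustrative):
#   - type: replace
#     path: /instance_groups/name=diego-cell/instances
#     value: 10
#   - type: replace
#     path: /instance_groups/name=api/instances
#     value: 4
```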
A lot of what we've covered so far is conceptual, so I'll hand it back to Tim to put some of it into practice. To conclude, we're just gonna do a quick demo. This is an actual pull request that was submitted to the Cloud Controller. I'd like to say a bit of thanks to Alex Bleeze, Jen Spinney and Sam Gunaratne for contributing it to the project. So yeah, let's get started. And I've prerecorded this because, well, you'll see.

We're gonna start off by going to the integration config which we've been talking about. Here I'm just gonna search for true to find out which things we have turned on. So we've got the apps suite turned on, detect, the internet-dependent suite, the persistent app suite, routing, services, SSH, SSO, and the tasks test suite. We have a large subset of the suites turned on. So we're just gonna set our config to be the path to that file, and we can run our first test run against a Cloud Foundry that's already deployed. A bit about this Cloud Foundry: we're running against a BOSH Lite, which is a lightweight way to deploy a development Cloud Foundry. Right now we're running a vanilla CF release, so we haven't done anything strange or interesting to it yet. This test run should establish a green baseline, so we know that the platform and the code that we're running against work with this version of the test suite. And we passed. The reason we screencasted this is that it took over an hour to run, and we don't have over an hour of your time.

After this, we're gonna check out a change that our contributors made to the CF acceptance tests repo. Most of the time when you make a change, if you're just fixing a small bug or changing documentation, you probably won't need to change the CF acceptance tests. However, this change is a change from the Open Service Broker API that the Cloud Controller needs to know how to handle, and we want to make sure that in the future the platform will continue to be able to handle it. So they chose to take the time to contribute this change to the CF acceptance tests. They changed a bit of the JSON schema that is accepted by the Cloud Controller. The point here is not necessarily to go into the change they're making, but to talk a bit more about the process.

So we made the change to the test, and we're gonna focus this test so we can get a fast feedback loop on the change that we're making. And so we're gonna run it again. Notice I had three nodes and I just changed it to one node. That probably doesn't matter when I'm running just one test, but BOSH Lite, because it runs on a VirtualBox VM and doesn't have many VMs in the cloud like most Cloud Foundries do, is very prone to running out of resources, which is one of the flakes you can run into with CATS. So I'm running at very low parallelism to put as little stress on the system as possible. The test failed, which is what we expect, because we haven't made the code change yet.

So now we have to go to the other place. I'm gonna go into the Cloud Controller and check out the changes that they've made, and there it is: they've made a commit that should implement this change so we can interface with the service broker. We make a release, we upload it to our BOSH director, which takes a bit of time as well. Then we find that release and we have to change the deployment manifest. This is all the BOSH bookkeeping to tell BOSH to change the code that it is running on some of the VMs. So I change the dev version from 11 to 12 and we do a BOSH deploy. At this point in time we should have the new code and the updated test.
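The BOSH bookkeeping in that part of the demo goes roughly like the sketch below. The repository path, release, and deployment names are illustrative rather than the exact ones from the recording; the commands themselves are the standard BOSH CLI v2 workflow.

```bash
# Build a dev release containing the change, upload it to the director,
# and redeploy so the new code is actually running on the VMs.
cd ~/workspace/capi-release          # illustrative path to the component's BOSH release repo
bosh create-release --force
bosh upload-release

# Bump the release version in the deployment manifest (the dev version
# from 11 to 12 in the demo), then redeploy. The deployment and manifest
# names here are illustrative.
bosh -d cf deploy cf-deployment.yml
```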
However, we're not entirely sure whether or not we've got the right thing. We're doing development, and development is uncertain. People make mistakes, and very often we don't fully understand the system we're working with. So we're gonna start by running that one test again to make sure it goes from red to green, and that the code at least does the very narrowly defined thing we want it to do. And after a bit of time, it passed.

That gets us to the last test run, where we unfocus that test and run everything again at the end. This is to validate that the change we made didn't break something else. In this case it's a rather small change, so we're probably fine, but it's always good to do another full run at the end to make sure you're not breaking your friends who sit next to you and are also contributing. This last test run is usually best outsourced to a continuous integration system. So if you don't have an hour and a half to sit around and run CATS again, you can say, this is my change, and hand it to an automated system that will deploy a Cloud Foundry with your change and run the tests against it. And we did it: hooray, the pull request passed the tests. So when we go to GitHub to contribute our code back to Cloud Foundry, we can check that box and say, yes, I did run CATS, and CATS said that this change does not break Cloud Foundry.

So that concludes our talk. I hope that next time you run CATS, your pain is at least a bit less. We are aware that these things are hard to run; this is a problem that we faced as well. You can reach us on the Cloud Foundry open source Slack if you have any questions. Another excellent resource is the Release Integration channel. They are responsible for this test suite, and they have spent a lot of time trying to make it faster and easier to use, both for people who want to contribute to it and for people who are running it to validate their Cloud Foundry deployments. So now we're gonna stick around a bit and answer any questions you might have. And yeah, that concludes our talk. Thank you.