My name's Jen Spinney. I'm on the Diego team, and I work for Hewlett Packard Enterprise out of Seattle, Washington. And I'm Andrew Edgar. I work for IBM out of Regina, Saskatchewan, Canada. So as you can see, the Diego team is a multi-company team. We're also partially remote: Andrew and I are both not in San Francisco, where the majority of Diego development happens, but we make it work. We do pair programming for everything we do, just like most teams on Cloud Foundry, so with remote development we end up using a lot of screen-sharing tools, and it generally works out pretty well.

The Diego team is responsible for the new runtime of Cloud Foundry. We're not going to get into the details of what Diego is; there were a couple of talks earlier today where hopefully you learned a bit more about it. Andrew and I both did the Cloud Foundry dojo in June of last year. That's a six-week program where you're in person with a team, pair programming on an actual project inside Cloud Foundry; for us, that was Diego. We both stayed on Diego after our dojos, and we've been on the team ever since.

So this talk is about the testing strategies we use on Diego. I think we do a lot of really interesting testing on the Diego project to make sure that when we say we have a new release, it's a solid release. We're going to talk about the different categories and scopes of testing we do. We're going to talk about continuous integration, or CI for short: the entire pipeline that certifies a change as ready to be integrated back into the master branch. We're also going to talk a little about versioning, because we make certain guarantees that when you upgrade from one stable version to the current version we're shipping, that upgrade has no issues and the actual deploy is smooth.

We're not going to talk about Diego itself. For this talk, we're going to treat Diego as a black box: a complicated distributed service that needs to be tested. We won't go into the nitty-gritty details of what Diego is actually doing. We're also not going to talk about the details of Concourse itself. We use Concourse as our CI pipeline software and we think it's pretty cool, but the points in this talk apply to any CI system you might use; it doesn't have to be Concourse.

If you ever go to the Pivotal office in San Francisco and walk over to the Diego area, you'll see a big monitor with this huge diagram on it. If you've used Concourse before, you know the little yellow rectangles pulsate, giving you the sense that something exciting is happening; those are the active jobs that are running. Each rectangle represents a job, and the pipeline generally flows left to right: we start on the left, jobs that pass trigger further jobs, and things move along through the pipeline. A common reaction when people see this diagram is to sit back and go, "oh my god, this thing is huge." We get that reaction even from other people inside Cloud Foundry. We have a pretty big, beefy pipeline, and this is just our critical path.
We have lots of other jobs running in the background that aren't considered critical path for actually deploying or releasing. We've actually added at least three more jobs since this screenshot was taken, maybe even five. So we have a pretty beefy pipeline, and this talk is going to go through that pipeline and dive into what these different blocks are doing and why we consider them part of the critical path.

Just as a general overview of how we do development on the team: the very first thing that happens when we have a story or a bug is that we see it at the top of the backlog. The pair takes a look at it, marks it as started, understands the requirements and the acceptance criteria of the story, and then starts actually doing the testing and development to get a fix for it. Once they're ready to go, they do a git push. They push to the develop branch of diego-release, and from there it gets automatically picked up by our CI pipeline, and the devs just walk away at that point. Assuming everything went fine with their change, they go on and start the next bug or story. In the meantime, the pipeline starts up. Now you have this left-to-right progression where things go through one step and then the next, and if there are problems along the way, one of these things turns red and we have to pay attention to what went wrong. This is mostly automated; there are a couple of steps where the PM has to come in and validate that we actually want to ship a release, but other than that, from the devs' point of view, they're done, and their fix is at some point going to make it into a final release.

One core principle of the Cloud Foundry teams in general is test-driven development, or TDD for short: the idea that you write the test before you write the code. Specifically, you write a failing test first, because if you write the code first and then write a test afterwards and the test passes, maybe you just wrote a bad test; maybe the test would have passed from the beginning. It also makes you clear on what the intended behavior is, so you don't get biased by your own implementation later and shift your thinking about what the desired behavior should be based on the work you've already done.

This applies to everything we do. In the basic local development workflow, you start off writing a unit test, write a little bit of code to make that unit test pass, and keep iterating, writing more and more unit tests. Then you might need some component-level or cross-component tests for the feature you're adding, so you add those and make them pass as well. While you're doing this, you want to make sure all the other tests are still passing so you're not breaking something else. Once you've finished your local development and local testing, that's when you do a git push, and that's when you walk away and the pipeline takes care of the rest.

Throughout this talk, we're going to use the example of adding a new feature to Diego. We're going to step back in time and imagine that we don't have crash events in the BBS. The BBS has this events endpoint that someone can hit to see all the events happening on the system.
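To make that red-green loop concrete, here is a minimal sketch of "failing test first" in Ginkgo and Gomega, the BDD framework Diego's Go test suites use. The CrashedEvent type and its constructor are hypothetical stand-ins for illustration, not the real BBS types; in real TDD, the spec at the bottom is written and watched to fail (a compile error counts) before the implementation above it exists.

```go
package events_test

import (
	"testing"

	. "github.com/onsi/ginkgo"
	. "github.com/onsi/gomega"
)

// Suite bootstrap so `go test` (or the ginkgo CLI) can run the specs.
func TestEvents(t *testing.T) {
	RegisterFailHandler(Fail)
	RunSpecs(t, "Events Suite")
}

// Hypothetical unit under test. Written *after* the spec below: the spec
// comes first, fails, and only then is this minimal implementation added
// to make it pass.
type CrashedEvent struct {
	InstanceGuid string
}

func NewCrashedEvent(instanceGuid string) CrashedEvent {
	return CrashedEvent{InstanceGuid: instanceGuid}
}

var _ = Describe("CrashedEvent", func() {
	It("records the guid of the instance that crashed", func() {
		event := NewCrashedEvent("instance-guid-1")
		Expect(event.InstanceGuid).To(Equal("instance-guid-1"))
	})
})
```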
We're going to imagine that we don't yet have an event type for crashing. Crash events happen when an app instance actually crashes, and we want to make sure this flows through the system correctly when we add it to the BBS events endpoint.

There's also the idea of acceptance. There's all this automated testing and automated verification, but we also have some manual acceptance that the PM does, to make sure that what actually went in is the correct behavior according to the story that was implemented. We also use the acceptance environment for acceptance testing, to make sure that in a real Diego deployment with multiple VMs on AWS, things are actually working correctly. That same environment can be used by the PM to go in and verify that the code we added is doing the correct thing according to the acceptance criteria.

So let's start actually looking at the pipeline itself. The very first things that get kicked off are inigo and the unit tests. We've talked a little about the unit tests already: they test the smallest level of functionality, and this is where a lot of our edge-case code gets tested. Usually a unit might be a function, or maybe a little bigger or smaller than a function, but it's one single unit of work.

We also have component-level testing. For example, you might have an events package inside the BBS code. For a unit test, you would just import the events package and test the function you want directly; there you would make a fake crashing app instance, for example, and verify that the events endpoint reported the crash event correctly. But for the component-level test, you actually compile the BBS executable itself and come in from the point of view of an end user of that executable, invoking it with command-line arguments and so on. In this example, we might not need to write a new component-level test, because we might already have some tests around events; maybe that's a little too nitty-gritty. It's kind of on the line, but most of these component-level tests are really just smoke tests to make sure that all the different packages are wired together correctly. We're not doing extensive testing at this level; that's more for the unit tests.

We also have a test suite we call inigo. Inigo is for cross-component testing: making sure that if we have, for example, a BBS and an auctioneer, they're communicating correctly. It's meant to be a really lightweight test suite, so it all runs in a single container where each of our Diego processes runs on a different port. They all know about each other, but we don't need a full BOSH deployment. For our example, the route emitter might be listening to the event stream from the BBS, and we want to verify that that communication happens correctly: when the BBS reports this new crash event type, the route emitter sees it and takes the correct action. Maybe it drops the route, or whatever the expected action for the route emitter is.

The cool thing about this is that it all just runs in a container, so you can easily move it and run it wherever you want. We happen to use Concourse itself to run these containers, so we usually run a local Concourse on localhost; you can run the suite directly there, or you can run it on the AWS Concourse instance we use for our actual CI pipeline.
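Here's a rough sketch of that component-level pattern, assuming Gomega's gexec package: compile the real binary, start it as a subprocess, and poke it the way an operator would. The import path, flag name, and endpoint are illustrative assumptions, not the BBS's actual interface.

```go
package bbs_cmd_test

import (
	"net/http"
	"os/exec"
	"testing"

	. "github.com/onsi/ginkgo"
	. "github.com/onsi/gomega"
	"github.com/onsi/gomega/gexec"
)

func TestBBSCmd(t *testing.T) {
	RegisterFailHandler(Fail)
	RunSpecs(t, "BBS Cmd Suite")
}

var _ = Describe("the bbs executable", func() {
	var session *gexec.Session

	BeforeEach(func() {
		// Compile the actual executable, not just its packages.
		binPath, err := gexec.Build("example.com/diego/bbs/cmd/bbs") // hypothetical import path
		Expect(err).NotTo(HaveOccurred())

		// Invoke it with command-line flags, the way an operator would.
		cmd := exec.Command(binPath, "--listenAddress=127.0.0.1:8889") // illustrative flag
		session, err = gexec.Start(cmd, GinkgoWriter, GinkgoWriter)
		Expect(err).NotTo(HaveOccurred())
	})

	AfterEach(func() {
		session.Kill().Wait()
		gexec.CleanupBuildArtifacts()
	})

	It("wires up the events endpoint", func() {
		// Smoke-level assertion: the endpoint answers at all.
		Eventually(func() error {
			resp, err := http.Get("http://127.0.0.1:8889/v1/events") // hypothetical endpoint
			if err != nil {
				return err
			}
			resp.Body.Close()
			return nil
		}).Should(Succeed())
	})
})
```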
You can also just point it at some other team's Concourse if you don't want to manage the resources yourself. It's nice to have that flexibility and run it anywhere; it's all easily contained.

All right, so up till now we've been talking about tests that run in one small container or in your local environment. Now we want to test something more expansive: a full-blown deployed CF and Diego environment. So the next step, after inigo and the unit tests have passed, is creating the BOSH releases. We create a BOSH release from the Diego develop branch, and from cf-release we take the runtime-passed version of CF, getting ready to deploy into our acceptance environment.

We have one main acceptance environment in our CI pipeline, and that environment is called ketchup. Part of deploying to ketchup is deploying a new CF release and a new Diego release, but we also run a bunch of different types of tests against that environment, so we want to make sure we don't interfere with running tests while we're redeploying. So we have a step in the pipeline that waits on the ketchup environment: we wait until nothing is running and we're ready to deploy a fresh environment and kick off all the tests.

So now the ketchup environment is ready to be deployed; nothing else is running, and the additional periodic tests that run against ketchup aren't running either. The next two steps are deploy CF and deploy Diego. The deploy steps themselves may catch problems: if we've checked in a change to the manifest or to manifest generation and now the manifest doesn't generate, the deployment will fail and we'll catch that early. Once we've deployed everything to ketchup, we assume everything's ready and start testing.

The first thing we run is the smoke tests. These are just a really simple set of tests that do minimal validation: we want to make sure we can push an app, the app runs, and everything is fine. This catches cases where we've done something severely wrong, like checking in something or deleting some code such that nothing runs at all or we can't push an app.

Assuming the smoke tests pass, we move on to the acceptance tests. We've got a couple of different acceptance suites: the CATs, the CF Acceptance Tests, which live in the Cloud Foundry repositories, and the Windows acceptance tests, the WATS, because we want to run a set of tests against Windows cells. So what are the acceptance tests? They're all written from the perspective of the end user. We run a bunch of tests using only the CF CLI, to make sure everything behaves the way a user would see it. The CATs as defined in CF can run either against DEAs or against Diego, and for us, obviously, we run them against Diego; there are tests in there specific to DEAs and tests specific to Diego.
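For flavor, here's roughly what an end-user acceptance test of our crash-event example could look like, written against the cf CLI via the cf-test-helpers package that the CATs build on. The app name, asset path, timeouts, and the exact event text are assumptions for illustration.

```go
package diego_test

import (
	"testing"

	"github.com/cloudfoundry-incubator/cf-test-helpers/cf"
	. "github.com/onsi/ginkgo"
	. "github.com/onsi/gomega"
	"github.com/onsi/gomega/gbytes"
	"github.com/onsi/gomega/gexec"
)

func TestCrashEvents(t *testing.T) {
	RegisterFailHandler(Fail)
	RunSpecs(t, "Crash Events Suite")
}

var _ = Describe("crash events", func() {
	It("shows the user a crash event when an instance dies", func() {
		appName := "crash-test-app" // illustrative name

		// Push an app exactly as a user would, through the CLI.
		Eventually(cf.Cf("push", appName, "-p", "assets/test-app"), "5m").Should(gexec.Exit(0))

		// ... crash an instance here; acceptance suites typically do this by
		// hitting an endpoint the test app exposes for exactly this purpose ...

		// Then assert the crash shows up where a user would look for it.
		Eventually(func() *gbytes.Buffer {
			return cf.Cf("events", appName).Wait().Out
		}, "1m").Should(gbytes.Say("app.crash")) // exact event text is an assumption
	})
})
```

The important property is that nothing here touches Diego's internal APIs; if a user-visible behavior breaks, a test written this way breaks with it.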
We've got a bunch of tests in there around SSH, being able to SSH into containers; that's something that only runs against Diego cells. So we run our CATs and our WATS against the Diego cells in the ketchup environment. As for when I might want to write another acceptance test: in our example with the new crash event, if we didn't already have a test validating that I could run cf events and see the crash event, then I'd want to write one. But since we already have that test, we can validate that we're getting the events back the way we expect them to appear to the user. We're really validating that, from a user's point of view, everything works as it should.

One of the other major pieces, which Jen mentioned earlier, is our validation that you can upgrade. We have a special suite of tests called the Diego Upgrade Stability Tests, or DUSTs. These run in parallel with the mainline of the pipeline, although they aren't run against the ketchup environment itself. So what are the DUSTs? They ensure upgrades work from a baseline. Against a BOSH-Lite environment, we deploy a baseline: an older pairing of CF and Diego releases that worked together, an original version at a specific point in time. We push an app, keep that app running at all times, and then start a piecewise upgrade to the most current versions. We upgrade CF, then update Diego piece by piece: we'll update, say, just a single cell, stop a cell and make sure things are still running, then upgrade just the brain components, like the CC-Bridge and some of the other components within Diego. At each step we want to make sure the app we pushed at the very start is continually routable (a sketch of that kind of routability check follows below). There are some small windows where we know it won't be routable, like while we're upgrading CF, but in general we want to be able to route to that application the whole time. At each step of the upgrade we also run a simple smoke test to make sure everything still behaves as expected.

One issue with this suite is that it takes about an hour and a half to run. Fortunately it runs while all the other tests are running and ends up finishing at about the same time as everything else. Although it's very long, it has caught a lot of smaller errors that we've never found in any other test, so it has proven very useful, and it ensures that when we release, we know somebody can upgrade from an older version of Diego to the newest.

Now say everything's passed and we're all really happy: we get to the deliver step. This is an automatic step that takes all the stories involved in this candidate build and marks them as delivered. That's the indication to our PM, Eric, that he can start validating. He makes sure he has a ketchup environment deployed up to this release, and he can check whether those stories truly work according to their acceptance criteria. He'll run his acceptance checks against that environment and make sure all the stories are ready to go. If Eric has run his tests and feels it's ready, there's a manual step he clicks to say, okay, we're ready to ship.
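As referenced above, the continuous-routability check is conceptually a background poller. Here's a minimal sketch of that idea in plain Go; the function and channel shapes are invented for illustration, and the real DUSTs are more careful about the windows where downtime is expected.

```go
package dusts_test

import (
	"fmt"
	"net/http"
	"time"
)

// pollRoutability curls the test app once per second until stop is closed,
// reporting an error for every moment the app was unreachable.
func pollRoutability(appURL string, stop <-chan struct{}, errs chan<- error) {
	ticker := time.NewTicker(1 * time.Second)
	defer ticker.Stop()

	client := &http.Client{Timeout: 2 * time.Second}
	for {
		select {
		case <-stop:
			return
		case <-ticker.C:
			resp, err := client.Get(appURL)
			if err != nil {
				errs <- err
				continue
			}
			resp.Body.Close()
			if resp.StatusCode != http.StatusOK {
				errs <- fmt.Errorf("unexpected status %d", resp.StatusCode)
			}
		}
	}
}
```

Each piecewise upgrade step (stop a cell, redeploy the brain, and so on) would run with a poller like this in the background, and the suite fails if errors show up outside a known-acceptable downtime window.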
That marks the release as ready to be shipped and released to everybody else. We have a step that creates a draft release of the new Diego: it assigns the new release version and leaves it in a draft state, so Eric can do things like write the release notes and get it ready to publish. And since the developers have kept working in the meantime, we have a step that automatically merges this release back into the develop branch.

Okay. So that was the main pipeline of our CI; we've got everything through and released. We also have a bunch of other tests that run in different environments or on a periodic schedule. One of them is called Vizzini. Vizzini is a set of tests written by the PMs; it's another automated acceptance suite. Eric, and the previous PMs of Diego, started writing these tests to validate functionality, and we run them every 30 minutes against the ketchup environment to make sure we haven't broken anything along the way. We've developed new functionality; we've got to make sure all the old functionality still works. So that's another set of acceptance tests that runs periodically. We usually run it in CI against the acceptance environment, but we now have scripts in diego-release to run it against, say, a BOSH-Lite environment in development, or to run it as an errand in really any Diego deployment.

Then there's another piece: our benchmarking. If you were around earlier and went to Jim and Luan's talk, they talked about performance and performance benchmarking. We run the benchmarks against a different environment after we've shipped: once we've hit the ship or deliver step, we have a stable candidate, and we want to run performance benchmarks against it. This is the step where we kick off a deployment to a whole fresh environment and run a set of benchmarking tests. These run after every delivery, to make sure everything is still performant, that we haven't introduced a performance bottleneck with our new code, and that everything still works as we hoped (a rough sketch of the shape of such a test follows below).

Looking again at our pipeline: it's pretty massive. We have 51 jobs in the pipeline and six, if not more, BOSH environments we deploy to. Is that too much? We know flakes happen, because there are a bunch of external dependencies: we depend a little on AWS when we deploy, on GitHub being available, and on Docker Hub being available for all the Docker tests. So there are flakes. When we pair up in the morning and decide who's working on what, we always designate one pair as the build czar: their job is to watch this pipeline all day, and some days that's a full-time job. It's always a balancing act between how much testing we do in CI and getting things out. Sometimes, when we have issues, it takes a pair away from productive work, because all they're doing is watching the build.
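As for the shape of a benchmark-style test, here's a hedged sketch: time an operation and fail if it exceeds a budget. The fetch function and the five-second budget are made-up placeholders; the real suite simulates rep load and measures the actual bulk loops against a deployed environment.

```go
package benchmark_test

import (
	"testing"
	"time"
)

// fetchAllActualLRPs stands in for a bulk fetch against the BBS; the real
// suite uses the BBS client against a freshly deployed environment.
func fetchAllActualLRPs() error {
	// ... issue the bulk request here ...
	return nil
}

func TestBulkFetchStaysWithinBudget(t *testing.T) {
	const budget = 5 * time.Second // made-up budget

	start := time.Now()
	if err := fetchAllActualLRPs(); err != nil {
		t.Fatalf("bulk fetch failed: %v", err)
	}
	if elapsed := time.Since(start); elapsed > budget {
		t.Fatalf("bulk fetch took %v, over the %v budget", elapsed, budget)
	}
}
```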
So we know it's a balancing act, but in general we think the development team's top priority is to produce quality builds, and that's why we have all this testing. We want to balance how much testing we do against delivering code, and I think we lean towards: let's test as much as we can, to make sure our releases are reliable, can be upgraded, and perform as we expect. I hope we've shown that keeping our releases reliable is top of mind for the development team. I think that's about it, so maybe we can open up to questions. You're going to have to yell, because we have the microphone. Any questions? Yes?

The question was around how we organize writing all the other tests beyond unit tests, like the integration tests and the benchmark tests. A lot of the time we'll have a story for that development. Originally, when we wrote the benchmark tests, we had tests all around how the bulk loops work, and then we needed to add additional simulation of having a bunch of reps and the rep load; that was another story, and a pair picked it up and developed it. That's usually how we do it: if we need additional integration tests, they go into the backlog as stories. Some integration tests are also just expected as part of a regular story. If you're working on a story, you're definitely expected to write unit tests, but you also have to think about whether you should be writing tests at other levels. So we have dedicated stories for specific test suites, like the BBS benchmark suite, but for a lot of these other suites, like inigo, it's expected that you create tests there as part of your normal development.

The question was about how we trace a story through the pipeline and automatically deliver it in Pivotal Tracker. When we hit that deliver step, Concourse uses what's called the tracker resource. We're just using it as it exists; it's already out there, so you can use it too.

The question was whether we've run into any problems with Concourse itself. Every once in a while we do. If you get a lot of failing tests or failing jobs, sometimes we run out of resources on some of the Concourse workers, and that's part of what the build czar pair does: look into it, and sometimes we've had to restart or redeploy the workers. We try to keep up with a pretty recent version of Concourse, and a lot of the time the Concourse team is right there in the San Francisco office, so if we have a big problem we'll go ask them. Pardon me: we have six workers.

For Diego, our tests just run against AWS, because testing on other infrastructures is kind of outside the scope of Diego's responsibility; that's more, I don't know, maybe BOSH-level testing, making sure things work correctly on the different environments. Our environment is just a BOSH environment on AWS, so the answer is no, we aren't really planning to expand it. Our release is certified as having run in an AWS environment, and the BOSH team definitely confirms that things run and deploy in other environments. We do work with some other external teams, though; at IBM we're
going to spin up a performance test against a SoftLayer environment, so we do work with external teams to do a bit of that testing.

The question was why the benchmarking is done after the release. We want to make sure we're not benchmarking against something that may not even work, right? We want to benchmark against something that is very close to being released. We want to say: okay, this was a candidate, everything passed, we're ready to go, and then we run the benchmark tests against it to validate at the final step.

Yep, we're right at time, so that's good. All right, thank you.