So, welcome, Kristi and Michael. Hello, everybody. I'm Kristi Wilson. I'm a senior developer at Demonware, and I'm the team lead of the Test Tools team. And I'm Michael. I'm also a developer at Demonware, where I focus mainly on test automation and general quality work. And we're also from Canada, by the way. So first, a bit about us. We're both from Demonware. We work in the video game industry, doing online services for video games. If you want to learn more, come see us in the vendor area afterwards.

Today, we're going to take you on a journey. It is one of the many tales in Commander McFluffles and the Quest for Quality. Today's tale: Once Upon a System Test. There are many stops along the commander's journey. The tale starts in 2011 at Demonware, where testing needed improvement. Before the commander could improve testing, the commander had to learn what testing was. With that newfound understanding, the commander could decipher some tomes containing best practices for system testing. And the commander picked up a couple of allies along the way: PyTest and Dockerpy. Before we go, we want to give you some practical takeaways that you can hopefully use right away, tailored specifically for people who do more development work or more operations work. So, on with the story.

We're going to start the commander's journey by going over the state of testing at Demonware. Back in 2011, we had just one gigantic, monolithic platform. I mentioned that we did services for online games. What I didn't mention was that it's actually one gigantic service. And in order to test features in this gigantic service, we would do things like test in production, or test manually in our local development environments. We even tried to ease the burden of spinning up new test environments by writing really complicated bash scripts, which were unmaintainable. Now it's 2016, and Demonware has caught up with the microservice craze. So instead of having one monolith, we have a whole bunch of microservices with complicated dependencies between them. The sad thing is, it turns out it's actually easier to test the monolith than it is to test the microservices. To deal with this additional complexity, we now have a team dedicated to test tooling. And instead of relying on just unit tests, we have unit, integration, and system tests.

To go on with the commander's journey, next they had to find out what testing actually was. We found it very difficult to define testing directly, so instead we're going to focus on why we test. First and foremost, we test in order to increase our own confidence that our software actually does what we expect it to do. When we write a test, we're codifying the intended behavior of our application in code, so as our software evolves, we have a record of how it's supposed to work. And as we continue to run these tests, we can easily catch bugs that get reintroduced as time goes on.

We also want to clear up some common misconceptions about testing. Some people think that we test in order to find all of the bugs in our software, but that's pretty much impossible. No matter what you do, there are going to be some bugs that you don't find. The only bugs you're going to find are the ones that you already know how to look for. Yeah, so to illustrate this, you see behind the commander there is a pink bug with a bleeding heart.
For those of you who don't know, that's the Heartbleed bug. And as Kristi was saying, it's unlikely that we'd actually be able to catch this bug before it affects us, because when we write our tests, we already have particular bugs in mind. And as you can see, it's conveniently placed behind the commander's gaze.

Another misconception is that testing improves the quality of the software. The tests themselves don't actually improve the quality of the software. By the time you run the tests, the software already has whatever bugs it's going to have. If you want higher quality software, the place to achieve that is during requirements gathering or design. But testing gives you information about the quality of your software. And software that isn't tested is usually viewed as lower quality, because there's less information available about it.

So, time for an example. Suppose we have a simple service. Let's say it's a cat matchmaking service, to go with the cat theme. The goal of the service is to help cats find other cats to play video games with. The cats themselves talk to the service using a client library, and the service stores its state in a database. So how should we test this? As we alluded to before, ideally you have at least these three types of tests: unit tests, integration tests, and system tests.

So, unit tests. We're going to use unit tests to provide almost 100% coverage of the library and the service itself. Unit tests are the fastest tests to run, the easiest to write, and the easiest to maintain, so we're going to cover pretty much everything with unit tests. We're going to shoot for 100% coverage. We're probably not going to make it, because that's not really reasonable, but we're still going to go for it. We also have some integration tests. In this case, they test the interaction between the service and the database, because up until now we've been testing each of the components of our service in isolation. And now the system tests, which are the main thing we're here to talk about today. System tests test the entire system from the perspective of the end user. They're the most valuable tests, because they actually use your system the way the user does, and they're the most likely to find bugs. On the other hand, they're the hardest to write, the most complicated to run, and the slowest, because of all the setup that's required.

So for our cat matchmaking service, we're going to rely mostly on the unit tests and the integration tests for our coverage. We test all of our tiny, well-factored components with unit tests. We cover the gaps between the service and the database with integration tests, and then we add just a sprinkling of system tests: a couple of happy path tests, and maybe a few error cases.
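To make the three levels concrete, here is a minimal sketch of what one test at each level might look like for this hypothetical service. None of this is from the talk's slides; every name here (find_playmate, MatchmakingService, MatchmakingClient, and the two fixtures) is illustrative:

```python
# Illustrative tests for the hypothetical cat matchmaking service.
# All names here are made up for the sketch.

def test_find_playmate_prefers_same_game():
    # Unit test: one small piece of matching logic, no I/O at all.
    cats = [{"name": "Whiskers", "game": "FPS"},
            {"name": "Tom", "game": "RTS"}]
    match = find_playmate({"name": "Felix", "game": "FPS"}, cats)
    assert match["name"] == "Whiskers"

def test_service_persists_cats(mysql_connection):
    # Integration test: the service together with a real database,
    # provided here by a (hypothetical) PyTest fixture.
    service = MatchmakingService(db=mysql_connection)
    service.register_cat("Felix", game="FPS")
    assert service.get_cat("Felix") is not None

def test_two_cats_get_matched(service_url):
    # System test: drive the whole running stack through the client
    # library, the way an end user (or cat) actually would.
    client = MatchmakingClient(service_url)
    client.register_cat("Felix", game="FPS")
    client.register_cat("Whiskers", game="FPS")
    assert client.find_match("Felix")["name"] == "Whiskers"
```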
So with the commander's newfound knowledge of testing, they were finally able to decipher those ancient tomes that they'd found lying around. The tomes were riddled with phrases like "ship it" and "docker docker". While the commander was taken aback by this arcane terminology, our intrepid hero carried on anyway, and in the process learned about some best practices for system testing.

So, the best practices. The first one is that you should be giving your tests a fresh test environment whenever they run. This will help avoid dependencies between your tests. It should not matter what any individual test does, because you get a new environment for each test. And as we'll see in a bit, if you have a dockerized environment, it becomes even easier to achieve this ideal of a fresh test environment. It's also important to make sure that your tests can easily run both on your build servers and locally. You want them on the build servers so that continuous integration keeps making sure they work over time, but you also have to make sure that if there's a problem and somebody needs to debug something, they can easily run the tests locally. Another important detail is to restrict the environments that you support. If you start using Docker, you might be under the impression that Docker runs the same way everywhere, but it actually behaves quite differently depending on whether you're running on, say, Ubuntu or something like the Docker beta for Windows. If you allow people to use all of those environments, you end up supporting a lot of obscure problems that are specific to their setups.

Some more best practices. As I mentioned, your tests should run from a fresh state, but they should also clean up after themselves, because everything that a test leaves running after it completes puts an extra burden on the person doing the testing, and will probably make them less likely to want to run your tests in the future. Additionally, your tests should fail fast and fail informatively, to reduce the time it takes to identify a problem and react to it, tightening the overall dev cycle.

Just a quick note about glue code. If you're writing really well-factored, tiny bits of functionality that do one thing and do it well, at some point you're going to have to bring all of those things together somewhere. We often refer to this as glue code. The example code here just uses a bunch of other modules and calls into them. If you've written unit tests before, you know that to unit test this, you have to create a whole bunch of mock objects and then model the complicated dependencies between them. The test that results is often very hard to write, really hard to maintain, and doesn't really add anything. So for this kind of code, we recommend skipping the unit tests altogether and just covering it with system tests.

So as the commander went along their journey, they came across two allies that promised to make system testing a lot easier. The first ally was PyTest. For those of you who don't know, PyTest is an alternative Python testing library, in contrast to the unittest library built into the Python standard library. When you write tests with PyTest, you'll find that there's less code overall and the tests are very minimal, but PyTest also comes with a lot more features built in by default. They're all optional, though, so you can use them whenever you'd like. The main thing we'll be talking about today is PyTest fixtures. In PyTest, a fixture is simply a nice way of defining setup and teardown logic for some state that your test requires. PyTest will ensure that the setup and the teardown are called, in that order, around each of your tests, which is very important. And as we'll see, when you system test, you generally need to set up a lot of state, because you might have a really complicated application under test.
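As a minimal sketch of that idea (this isn't the slide code; create_test_database and its methods are hypothetical helpers), a PyTest fixture is just a decorated function where everything before the yield is setup and everything after is teardown:

```python
import pytest

@pytest.fixture
def database():
    db = create_test_database()  # setup: runs before each test (hypothetical helper)
    yield db                     # the value handed to the test
    db.drop()                    # teardown: runs after each test, pass or fail

def test_store_cat(database):
    # PyTest sees the argument name 'database' and injects the fixture.
    database.insert("cats", {"name": "Felix"})
    assert database.count("cats") == 1
```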
So, time for an example. On the left here, we have two green boxes. These are the setup and teardown for the fixture, and on the right, there's a test. I mentioned that PyTest will make sure that setup runs before your test and teardown after your test. By default, it'll actually do this for every single test you write, so it makes it very easy to achieve that clean-state ideal. You can also change when PyTest calls setup and teardown. In this example with the yellow fixture, the setup, instead of being called before each test, is called once before any of your tests run, and the teardown is called once after all of your tests have run. And you can even combine these two to create a more complicated setup, if your application requires it.

And now a little bit about Docker. At Demonware, our services are fairly hard to set up and run. To make this easier, we put them into Docker containers. When we started to write tests that used these containers, at first we wrote complicated bash scripts that did the setup and the teardown, but that wasn't very maintainable. So then the commander found their next ally: Dockerpy. Dockerpy is a Python library for using Docker. Its interface, however, has a one-to-one mapping with the REST interface, so it's a little bit clunky, and I'll demonstrate what that looks like. In pseudocode, I'm going to show you the code you would use with Dockerpy to create a client object, pull an image, create a container, start the container, and then remove the container. If you've used the Docker command line at all, you know that steps two through four are usually just the docker run command, but you don't get that same convenience with Dockerpy; you have to be more explicit.

So first, we create the Docker client object. Also, if you're interested in using any of this code, there's a link at the bottom of all of our slides that goes to a GitHub repo containing all the example code. Especially when the examples get a bit longer, if you want to look at them in more detail, just go to that URL. With this example code, right off the bat you can see that it would only work on a system that has Unix sockets, so restricting the environments becomes pretty important. The other caveat is that we're passing a flag to automatically detect the version of the server, so that we don't have to keep the client and server in sync. Next, we pull the image that we want to run. On the Docker command line, if you don't specify a tag, it defaults to latest. Dockerpy does not do this; instead, it will actually pull the entire repository, so you have to be explicit about the tag you want. Another caveat is that Dockerpy often won't raise exceptions in cases where you'd think it would. You'd think that if it failed to pull the image, it would raise an exception, but you actually have to parse that out of the response yourself. That's something important to be aware of. Then we create a container. The important detail here is that we're adding a special label to it. What we do with our tests is add the same label to all the containers that we start in a test run, and then we can do some fancy things, like dumping the logs from all the containers after the tests are over. Then we start the container, and when we're done with it, we stop it and remove it.
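Putting those steps together, here is a sketch of roughly what the Dockerpy calls look like. This assumes the pre-2.0 docker-py API (docker.Client; in docker-py 2.x the same interface lives on as docker.APIClient) and a local Unix socket; the image and label names are just examples:

```python
import docker

# 1. Create the client. version='auto' negotiates the API version with the
#    daemon, so the client and server don't have to be kept in sync.
client = docker.Client(base_url='unix://var/run/docker.sock', version='auto')

# 2. Pull the image. Unlike the CLI, docker-py pulls the *entire repository*
#    if you omit the tag, so always be explicit. Also note that pull() may
#    not raise on failure; errors have to be parsed out of the response.
client.pull('redis', tag='latest')

# 3. Create the container, attaching a label so every container started by
#    this test run can be found later (for example, to dump its logs).
container = client.create_container(
    image='redis:latest',
    labels={'my-test-run': 'example'},
)

# 4. Start it.
client.start(container=container['Id'])

# ... run the tests against the container ...

# 5. When we're done with it, stop and remove it.
client.stop(container=container['Id'])
client.remove_container(container=container['Id'])
```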
Yeah, so time for a more concrete example of Dockerpy and PyTest working together. I've replaced the generic setup and teardown in the green boxes with "create container" and "delete container". For an even more concrete example, suppose we have a web service, indicated by the yellow box here, that has no state by itself; it stores its state in a database somewhere. We want to test that. We'll spin the service up once, because it has no state of its own, but it stores its state inside two database containers, Redis and MySQL. So we're going to have a second set of fixtures, set up and torn down for each individual test, which give us new databases each time.

Here's an example of a simple PyTest fixture. It does what Kristi was describing earlier: it creates the Docker client, starts the container, and then tears the container down at the end. The main thing to note here, if you can see it (this is also in our repo), is the yield. In the yield, we're actually returning the IP address of the newly started container. It might not be apparent here, but we're able to use that IP address inside any test that uses this fixture.

PyTest also has a very elaborate hook system, which lets you modify the default behavior of PyTest. We actually use this to dump the logs of all of our containers at the end of the test run. This particular hook is the log report hook, which is executed whenever PyTest wants to write the test report somewhere. In the event that the test run has failed, we go through each of the containers that carry our special label and dump all of their logs to the test output. And we were pretty impressed by this.

If you've used Docker at all, you might be wondering about Docker Compose. Could you use Docker Compose instead? It seems to give you very similar functionality to what we're doing. So, yes, you can, and it works really well, especially if you want to use exactly the same setup for every single test that you're running. If the cluster of services you run for each test is the same, Docker Compose makes a lot of sense. If it's dynamic, if you're doing things like changing the volumes that are mounted or changing the port mapping or doing anything more complicated, then something like Dockerpy makes a bit more sense. If you do decide to use Docker Compose, it still fits in really well with PyTest fixtures: you can have a fixture that does the docker-compose up and then the docker-compose down, and you can also use Dockerpy to inspect some of the containers and get information out of them if you need it. This is another example of what that would look like: a fixture that does a docker-compose up, then yields the IP address of one of the containers that started, and then tears down the cluster. Again, the example code is up in our repo if you're interested in using it.

We also encountered a few important gotchas along the way. One of them is that Docker has no notion of a service actually being ready to receive requests, so sometimes tests will fail because the service in the container is still starting. You can get around this by having an executable inside all of your containers that you can call from the outside, which reports whether the service is ready for requests, and by using backoffs. There are a couple of Python libraries, backoff and retry, which make this really easy.
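As a small sketch of that readiness check (the URL, port, and timing here are illustrative), the backoff library lets you decorate a probe function so it retries with exponential backoff until the service answers:

```python
import backoff
import requests

# Keep retrying while the container is still starting: connection errors
# are expected during startup, so we back off exponentially on them and
# give up after a handful of attempts.
@backoff.on_exception(backoff.expo,
                      requests.exceptions.ConnectionError,
                      max_tries=8)
def wait_until_ready(base_url):
    return requests.get(base_url + '/health')  # hypothetical readiness endpoint

# Inside a fixture, after starting the container:
# wait_until_ready('http://%s:8080' % container_ip)
```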
It's also important to make sure that your containers start up as quickly as possible, something we've kind of learned the hard way. The slower the containers start, the slower the tests run, the slower people get feedback on their code, and the slower your development cycle will be, which lowers the overall quality of your software.

So, time to wrap it up, sort of. The commander has had a long and arduous journey, but has gained a lot of knowledge along the way. Next, they'd like to share some of the takeaways from both the dev and the ops perspectives of testing. You might be thinking: that was kind of cool, but what do I do with it? We're hoping we can give you some specific things that you can try when you're back developing.

If you do more development work, it's really cool to know how to write tests, and writing tests is great, but sometimes it's even more important to know when not to write tests. If you're going to use system tests, use them sparingly. That being said, the next time you have a feature to develop, try some test-driven development: try starting with a system test. If you don't have any system tests for the software that you're working on, try introducing one for each piece of software that you own. Make sure that it can run with as little setup as possible and that it runs as quickly as possible, and then add it to some kind of continuous integration system. If you already have tests, take a critical look at them. Do you actually need all the tests that you have? Are some of them retesting functionality that the unit and integration tests already cover, and can you remove them? And can you make them any faster?

The same sorts of things apply to those of you with a more ops-minded focus as well. First of all, you want to know when not to write system tests. For example, you probably don't need them for one-off scripts, because by their very nature you don't care whether those scripts keep working into the future. In contrast, if you have tooling and other scripts that do need to keep working, then yes, you should definitely have system tests. Start by having at least one system test that exercises enough functionality in your tool to prove to yourself that it works. And just like the developers, you should run these regularly so they keep giving you value.

Generally, ops tests fall into two categories. The first is tests that involve services you can run yourself. For example, earlier we had a fixture that starts a MySQL container; that's something we can run locally. For those, we recommend using something like what the commander described earlier: PyTest and Dockerpy. Then there are tests that require things you cannot run yourself, like resources in Amazon Web Services, for example. You can still use PyTest for those, but there are some questions you should ask yourself first. Is it feasible to have a short-lived test in this external environment? Is it going to cost you a lot of money? And is it easy enough to clean up after yourself in this external environment, so you don't run up excess charges? If you're comfortable with your answers to those questions, then yes, you should definitely write tests for these kinds of tools, but use them sparingly.

So, in conclusion: system tests are great. Definitely write system tests. Don't write too many system tests. And if you have services that you can run in containers, try checking out PyTest fixtures with Dockerpy and/or Docker Compose; it works really well.
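For reference, here is a rough sketch of the compose-based fixture described earlier. It assumes a docker-compose.yml defining a service called web, the default compose container naming (myproject_web_1), and the pre-2.0 docker-py API; depending on your compose networking setup, the IP address may live under NetworkSettings.Networks instead:

```python
import subprocess

import docker
import pytest

@pytest.fixture(scope='session')
def web_service_ip():
    # Bring the whole cluster up in the background.
    subprocess.check_call(['docker-compose', 'up', '-d'])
    try:
        # Use docker-py to inspect one of the started containers and
        # hand its IP address to the tests.
        client = docker.Client(version='auto')
        info = client.inspect_container('myproject_web_1')
        yield info['NetworkSettings']['IPAddress']
    finally:
        # Tear the cluster down, even if the tests failed.
        subprocess.check_call(['docker-compose', 'down'])
```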
So as the commander's tale comes to a close, they are very content with all the knowledge they have gained across their journey, and they're looking forward to bringing that knowledge back with them to their own castle. Thanks for listening. Time for questions.

Hi, thank you for the talk, very interesting. I have a lot of questions, but I will try to ask just one. When do you test? I mean, are you testing in continuous integration? Do you have dedicated environments? Do you run system tests on the developers' machines? Or all of the above? Yeah, for us, usually we try to test as much as possible. So while you're developing the feature that you're working on, you will run the tests. Ideally, you would catch any failures at the unit test stage, because they're faster than system tests. But definitely, we want everyone to be running these tests all the time. We have them running in Bamboo, for example, just to make sure that they're run, but ideally your developers would also be running them too. We have a team that's dedicated to our build infrastructure. We've been using mostly Bamboo, and we run all the system tests in Bamboo on Bamboo agents. We're slowly migrating over to Jenkins now, using agents that run more in the cloud. But we're trying to make sure that these tests will run on developers' machines as well. And if you have more questions, come by the Demonware booth later.

Hi. So I'm interested in what your ratio is between unit tests, integration tests, and end-to-end tests. That would be the first part of the question. And the second part: why not only use end-to-end tests? OK. You want to do the first part? OK. So the ratio, I would say, depends. This is more of an ideal that we're going for with most of our new software; we also have a lot of legacy software that is basically all unit tests that aren't really unit tests. But what we're aiming for is, say, hundreds or even thousands of unit tests to a handful of system tests: like ten, or fewer than fifty, system tests to a thousand unit tests, something like that. Basically aiming for that 100% coverage, and then just testing some of the client-facing endpoints with the system tests.

And for the second part of the question, which I believe was: why not just run system tests? As you've probably seen, spinning up your entire software stack is very expensive a lot of the time, and it slows the turnaround when someone's actually working on something and wants to test it and make sure it works. That's why you want to use system tests sparingly, although you are right in that they give you the most benefit, because they use your software the way it will actually be used. That's the bottom line, really: how much speed do you want to sacrifice? Usually we'll have more unit tests in order to catch things as early as possible, because those are really fast to run, before we get to system tests. The other thing is that it depends on how many paths there are through your software. If you have a lot of branches, and those branches have branches, then covering all of that with system tests is basically impossible because of the number of cases you'd have to cover. But if you use unit tests for those code modules, you can make sure all of that is covered, where it might be completely infeasible with system tests.
But some software is better suited to system tests. We also write some software specifically for automating testing and deployment, and for some of that we have pretty much only system tests and no unit tests at all. So it really depends.

Hey, thanks for your talk. One question regarding data fixtures: you spin up the containers, but how do you manage getting the data into those Percona or whatever you use for data stores, to then be able to test the flow? So what we usually do is have one fixture that spins up the database container, and then a second fixture, which depends on that fixture, that actually inserts the data into it. You can do that pretty easily with PyTest: you just use Python to insert the data before you actually run your test. Right, so in our example fixture, we had it just yielding the IP address of the container, but before that you could do some other setup if you wanted to. And if that isn't fast enough, in some other cases we have a base image that has the database in it, and we regularly build images on top of that which contain the data we need for the tests. Then the test just starts the container that already has the data it needs. Thank you.

I've got a question. Have you tried this approach with virtual machines, in non-dockerized environments, like VirtualBox or vSphere? So, are you asking if we've used it without Docker? Yes. I don't believe we have, although I don't see any reason why it wouldn't work. It just might be more expensive to spin up a whole new virtual machine versus a Docker container. But you could easily plug in something like Vagrant, I guess, in place of Dockerpy in our examples. Or something like Boto for EC2.

So I have a system that's actually pretty similar to that, and I have this small technical problem. My fixtures automatically download any images that you need, and I have this problem that you run the tests and then nothing happens for, like, thirty seconds. Could I turn my fixtures into a PyTest plugin so I can just pop up a message, since PyTest captures all of your output streams? Did you do something like that, actually? I think your team did that, right? So, we've also had that problem, and I don't think we have a great answer for you. In a lot of cases, we have logic so that if the image hasn't been pulled, we pull it; on our build agents, we have a previous step that always pulls the latest image; and we assume that when people are running it locally, they've done the pulling themselves if they want the latest image. It's not a great answer, though. I think there's a lot of opportunity for somebody to write a really good library for using Docker with PyTest fixtures, and if there were something like that, this would be great functionality to provide. But yeah, I would recommend looking at the hooks. There might be something you can add there, or you could just output from the fixture itself, because you can always output anything you want; it just sort of muddies up the PyTest output. Thanks.

Any more questions? We've got time. Thanks for the talk, it was very enjoyable. And my question is: how would you approach a situation where you have the kind of system you described earlier in the presentation, a giant monolith with very poor coverage, which has only a few system tests, and even those are very fragile?
How would you solve that situation? Oh, you're saying the tests themselves are the fragile part? Sorry? You said it's fragile; is it the system that's fragile, or is it the tests? The system tests are fragile, because they're based on some hard-coded information. I guess for that, I would start with system tests, because you can use those to easily verify that your thing is still working. I guess that doesn't really address the flakiness; that's an ongoing issue in the whole testing space. But if you start with system tests, you can test based on actual customer requirements. Then, as you start to refactor and improve the rest of your code base, you can start writing unit tests and integration tests for those pieces. But start with the system tests, so that you know your software is still working overall. For the monolith that we mentioned at the beginning of the presentation, we still have all of the legacy tests, which are kind of a weird mix that reaches straight into the internals of the system and calls things. So what we're trying to do is isolate all of those old tests, and then for new things, write unit tests and integration tests and system tests, slowly transitioning over and deleting the old tests as we go. But it's not an easy solution.

Hi, me again. So I'm interested in how you handle tests that have a lot of dependencies and are extremely flaky. Are you just repeating a test if it fails, like three times, and then saying, hey, that really failed? Or do you have some other cool strategy? So, that particular strategy you mentioned, of rerunning the test when it fails, has actually caused us a lot of problems. Several years ago, we started doing something like that, and because of it, the problems with the tests have been mounting up over time. At the moment, we're actually at kind of a crisis point where we have to do something serious about it, because we've been ignoring these flaky tests for so long. So I would recommend trying as hard as you can to remove the flakiness: change the tests to be as deterministic as possible. Often you can achieve that by going to unit tests; try to figure out how you can write unit tests that remove whatever it is that's flaky. It might be a random element, or something about the file system, or something time-related. Use unit tests to control that part and remove it from the equation, and that sometimes makes the test more dependable. But I would say definitely don't rerun the tests. Yeah, though there is actually a PyTest plugin for rerunning tests automatically, so you could do that, but try the other approach first.

Last question. Thank you for the talk. I'm interested in the organizational side of testing. How many test developers do you have compared to the developers working on the product, and how do they interact? Yeah, so in general, I think we try to have the developer writing the new feature write the tests as well, because they're the experts on the feature. In some cases, we've tried pair programming, where someone else will write, say, the system test, because that's less dependent on the internals and more about the general feature requirements. But usually we do try to have the same person write most, if not all, of the tests. We have about 120 developers, or engineers in general, in the whole company.
Michael is the only one who is explicitly a software engineer in test. I'm on the Test Tools team, and there are four of us altogether, so we work on the tooling specifically, and Michael is helping some of the teams that have larger testing concerns. But in general, we're trying to encourage everyone to be skilled in writing tests, so they can write their own tests and deal with their own problems. Okay, so thanks.