Hey everybody, I'd like to thank the PyTexas organizers, and also thank you for coming to my talk after lunch. I know it's a little late and you're probably a little tired. Let's talk about some testing. My name is Aly Sivji. I'm one of the organizers of the Chicago Python Users Group, and when I'm not doing that, I'm writing backend code for Numerator; it's a market intelligence company.

So I love testing. Writing tests gives me the satisfaction of a job well done. Tests help me confirm that the change I made works exactly as expected. Tests also give me confidence that my change didn't break existing functionality. It wasn't always this way. I used to hack together spaghetti code around chunks of answers I found on Stack Overflow. I didn't really care how I solved the problem, only that it was solved enough to cross it off my to-do list. If I ever went back to the project, I was afraid of making changes. I didn't want to upset that delicate balance. Things worked; if I changed them, would they still work? Probably not.

And then I read Code Complete, and this book changed the way I approached software development. Steve McConnell made me realize that doing things in a methodical way is a lot more important than simply solving the problem at hand. I became deliberate about my approach, and I started breaking problems down into smaller and smaller chunks. After ensuring that each of these smaller pieces worked exactly as I intended, I could put them together and create an entire system. And with tests, I could ensure that everything I did worked exactly as I expected. So tests allow us to move fast because they give us confidence in our code: confidence that we're producing expected results, and confidence that we can modify our code base without affecting existing functionality. In short, tests make our code more agile.

I think it's important to see testing in the context of a real application, so I've got a startup idea. It's called Word Count as a Service. You send in a URL, it calculates some stuff and sends you back language statistics. Right now, for the MVP, we're only going to do one endpoint: a top-word endpoint. You send in a URL, and it returns a response with the most common word on that page and the number of occurrences of that word. And we're planning for world domination: we're going to build this MVP, get some funding, then... I'm not too sure what... and then we IPO. It's going to be great.

The architecture of this application is a Flask web app. It receives a request and, after parsing it, forwards it to the right business logic module; that's where our secret sauce is. Once that's done, the business logic sends the results back to the endpoint, the web app saves them to the database, and then it returns a response to our client. So this is our secret sauce: I'm going to download the web page using this little-known library called Requests, extract the text using Beautiful Soup, and then the standard library's collections.Counter is going to help me count the words. Then I'm going to return the most common word and its number of occurrences.

So let's see that in code. Here I'm just going through some Flask configuration. I have my database model with the URL, the word, and the number of occurrences. I have a helper method to save to the database; we'll come back to that in a bit. And these are my Flask routes.
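Here's a minimal sketch of what an app like this might look like; every name in it (TopWord, most_common_word, get_top_word, save_to_database, the /top-word route) is an illustrative stand-in rather than the speaker's actual code:

```python
from collections import Counter

import requests
from bs4 import BeautifulSoup
from flask import Flask, jsonify, request
from flask_sqlalchemy import SQLAlchemy

app = Flask(__name__)
app.config["SQLALCHEMY_DATABASE_URI"] = "sqlite:///words.db"
db = SQLAlchemy(app)


class TopWord(db.Model):
    """One row per processed URL: the most common word and its count."""
    id = db.Column(db.Integer, primary_key=True)
    url = db.Column(db.String, nullable=False)
    word = db.Column(db.String, nullable=False)
    occurrences = db.Column(db.Integer, nullable=False)


def most_common_word(words):
    """Return (word, count) for the most frequent word in the list."""
    return Counter(words).most_common(1)[0]


def get_top_word(url):
    """Download a page, pull text out of the <p> tags, count the words."""
    soup = BeautifulSoup(requests.get(url).text, "html.parser")
    words = [word for p in soup.find_all("p") for word in p.get_text().split()]
    return most_common_word(words)


def save_to_database(url, word, count):
    """Create a model instance, add it to the session, and commit."""
    record = TopWord(url=url, word=word, occurrences=count)
    db.session.add(record)
    db.session.commit()
    return record


@app.route("/top-word", methods=["POST"])
def top_word():
    """Parse the request, run the business logic, save, respond."""
    url = request.get_json()["url"]
    word, count = get_top_word(url)
    save_to_database(url, word, count)
    return jsonify({"url": url, "top_word": word, "occurrences": count})
```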
So my top-word route parses the request, gets our URL, and sends it off to our business logic module. Let's take a look at that. Here we download the web page using Requests and extract the text; I'm just pulling everything out of the paragraph tags and flattening it out to get a list of words. Then I count the words using collections.Counter. Going back, we save to our database: we create an instance of the model, set the attributes, add the record to the session, and then commit it. And then we return the information to our user. A pretty standard web application.

So now let's talk about some testing fundamentals, so we all have a common vocabulary going forward. A system under test is the system that's currently being tested. This could be a line of code, a method, or the entire program. Testing refers to the process of feeding inputs into our system under test and validating the outputs against acceptance criteria. If our outputs are OK, our test passes. If our outputs are not OK, our test fails, and hopefully we have enough contextual information to figure out how to fix it.

There are many different kinds of tests, but they can be grouped into two broad categories: black box testing and white box testing. Black box testing refers to testing in which the tester cannot see the inner workings of the system they're testing. In contrast, white box testing is a technique in which testers can see the inner workings of the system being tested. As developers, we wrote the code, so we're going to be doing white box testing. That's not to say there's no need for black box testing: we wrote the code and we wrote the tests, so we have blind spots.

We can think of all of our tests as a feedback loop that lets us know whether our code is working and producing expected results. If our tests are passing, that's great: make a commit, make a PR, go on to the next feature. If a test fails, we have to figure out exactly what's going on.

The simplest form of testing is manual testing. This is something like refreshing a web page or poking around in the REPL. As the name implies, manual means you have to do something yourself; the developer has to perform a specific action. And this is not really sustainable, because if there's one universal truth in programming, it's that developers are efficient. We need to make testing as easy and as barrier-free as possible; otherwise, developers won't write tests, or they won't run their tests. This is why we should write our tests as code, and then we can automate the process of running them: we use software to control test execution and the comparison of the results against what we expect them to be.

Some benefits of automated testing: we get a faster feedback loop; we can reuse and rerun tests without too much effort; and automation increases efficiency, so our testers can do things like exploratory testing, which is what they're actually paid to do. Finally, if we have automated tests, we can incorporate them into our continuous integration pipeline. What this means is that when you make a pull request to master, it has to pass the tests before it can be merged in.
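As a minimal taste of tests-as-code: pytest collects any function whose name starts with test_ in a test file and treats each assert inside it as a pass/fail check.

```python
# The smallest possible automated test. Save it in a file named
# test_something.py and the pytest runner will find and run it.
def test_addition():
    assert 1 + 1 == 2
```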
This leads us to the most famous diagram in the testing world: the automated testing pyramid. The pyramid presents a way of thinking about the different kinds of automated tests when we're creating a balanced test suite. What we want is a lot of fast, small, and cheap unit tests, and a smaller number of big, slow, and expensive end-to-end tests. It's a really good rule of thumb. It's not set in stone, but if you want to start somewhere, this is a good place.

Unit tests are the most granular level of tests, and they allow us to confirm that individual units of code are working exactly as we intend them to. A unit is a testable part of your code. The size doesn't really matter, but generally it's something like a public function that takes in some parameters, does some calculation, and returns a result. This is the most granular form of testing, and we should make sure that all of our unit tests are independent; that way, when one fails, we know exactly what's going on. Going back to our example, when we want to find the top word, how do we test that function? Well, we can create a list whose contents we know. Here I created a list where I know "who" appears three times. I pass in that list and then compare the output to what I know the output should be.

Integration tests refer to combining multiple parts of the system and testing them together as a group. The term also refers to testing at service boundaries: if our application makes a call to, say, a database, a file system, or an external API, that's the kind of integration we need to ensure works. If our program does anything of substance, anything interesting, it's going to have a lot of components interacting with each other, and integration tests give us confidence that those integrations are working the way we want them to. So here's a function that saves information to a database. How can we test it? The easiest way is to create an object, pass it through our function, and then query the database to make sure the object that ended up in there is the object we expect to be in there. Fairly standard stuff.

End-to-end tests are tests we write to make sure our application meets defined business or functional requirements. We usually run these through the UI, so they're a little slower, but we can also test at the subcutaneous layer, which is the layer just below the surface. If our end-to-end tests pass, that's great: it means our program meets its business requirements. If they fail, we really don't know why they failed, so we have no idea where to look; we probably need to check our unit tests or integration tests. In our example, the interface is a JSON REST API, so a functional test would be creating a test client with Flask, hitting the endpoint, and making sure it actually returns a 200 OK.
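Here's what those three levels might look like as pytest tests, reusing the illustrative names from the app sketch above; this is a hedged sketch, not the speaker's actual test suite:

```python
def test_most_common_word():
    # Unit test: build a list with known contents ("who" appears
    # three times) and compare the output against what we expect.
    words = ["who", "what", "who", "when", "who"]
    assert most_common_word(words) == ("who", 3)


def test_save_to_database():
    # Integration test: pass an object through the save function, then
    # query the database to make sure it actually landed in there.
    # (Assumes an app context and a test database are set up.)
    save_to_database("http://example.com", "who", 3)
    record = TopWord.query.filter_by(url="http://example.com").one()
    assert record.word == "who"
    assert record.occurrences == 3


def test_top_word_endpoint():
    # End-to-end test: create a test client with Flask, hit the
    # endpoint, and make sure it actually returns a 200 OK.
    client = app.test_client()
    response = client.post("/top-word", json={"url": "http://example.com"})
    assert response.status_code == 200
```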
There's one more form of testing we should talk about: regression testing. When we fix a bug, we want to ensure it never comes back into our program. Good bug reports include code that we can use to reproduce the error. So let's take that code, make a test case out of it, and now we have a failing test; then we fix the bug to get the test to pass. And instead of throwing that test away, let's keep it in our test suite and document it with, say, an issue number. That way, any time CI runs, we can ensure that the functionality doesn't regress.

We've been talking around this, so let's be explicit: what are some of the benefits of testing? With tests, we can ensure that our program works exactly as we expect it to. Tests confirm that the changes we make don't break existing functionality. Tests help us identify bugs earlier in the software development lifecycle, and the earlier you find a bug, the faster you can fix it. And, and I think this is a little bit controversial, writing tests forces you to write better code. It shows that you've actually thought through the problem, and at the very least, you've used your own API. If your API is clunky, you have nobody to blame but yourself.

We can quantitatively measure a test suite in a few different ways. One way is the test ratio: the number of lines of test code over the number of lines of production code. A ratio above one means we have more test code than production code. Does this really mean anything? I'm not too sure, but it's a metric people use. I think test speed is a little better as a metric: you want a test suite that runs really fast, so that while you're iterating on features, you can keep running your tests to make sure you're not breaking anything. Test coverage measures how much of your production code is covered by your test code. This can be a really interesting metric; we'll come back to it. But be aware that no matter what metric you're measuring, if people know you're measuring it, they're going to change their behavior to affect that metric. For instance, 100% code coverage: is it great? Yeah, sure, but does it really mean anything? Not really.

So, to recap: test coverage measures the percentage of our code base that's executed by our test suite. It tracks the lines of code that were exercised, but it doesn't measure the quality of our tests. It's really great for finding untested code in our code base. If we have untested code and we're actually using that code, we should write tests around it; if we're not using that code, we should delete it. I don't know about anybody else, but I like deleting code a lot more than writing code.

There are a lot of test coverage utilities in the Python ecosystem; the most popular one is coverage.py. Here's a sample pytest and coverage.py report. At the top, you've got some configuration for pytest, followed by the tests that were run, where each dot represents a test that passed. On the bottom, we have our coverage report: all the modules we're testing, the number of lines of code in each module, and the lines that are not covered, along with the percentage. Coverage.py also produces HTML reports, which are a little easier to read and show you exactly what's going on in your program.

So how can we game test coverage? Going back to that end-to-end test we wrote, let's just take out the assert. We run our tests, and we have exactly the same test coverage. This is something you should watch out for: coverage alone doesn't tell us how good the tests are. It's something to check in code review; at the very least, make sure people are asserting something and not just running empty tests.
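To make that concrete, here's the same end-to-end sketch with the assert stripped out: it exercises exactly the same lines, so the coverage number doesn't move.

```python
# This "test" runs the same code paths as the real end-to-end test,
# so the coverage report looks identical, but it verifies nothing.
def test_top_word_endpoint_no_assert():
    client = app.test_client()
    client.post("/top-word", json={"url": "http://example.com"})
    # No assert here: green build, zero protection.
```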
Just a little note: right now, coverage.py tells you which lines of code are being tested. Starting in version 5, you'll be able to find out which tests cover which lines of code. There's currently an alpha release out; try it out and file bug reports. I'm sure they'd appreciate it.

Before we dive into test frameworks, we have to go over a few more definitions. When we run our tests, we want to run them in a known state so we can ensure repeatable results. This is where test fixtures come in: fixtures set up our test environment and then return it to its original state after the test is run. A test case is an individual unit of testing; it checks that a specific input results in a specific output. A test suite is a collection of test cases. We use a test runner to orchestrate the execution of tests and the reporting of results to users; if a test fails, the runner tells you why it failed, with a stack trace. An invariant is a condition that's always true, and we can use Python's built-in assert keyword to add invariants to our code. Here I have assert isinstance: is [1, 2, 3] a list? That's true, so nothing happens. On the next line, I assert that the same list is a tuple. It's not, so the assertion is false and Python raises an AssertionError exception.

A test framework is the execution environment for automated tests. We use the framework to hook into and drive our application under test. The framework also tells us how we should define our expectations, and it usually has a runner that executes the test cases and reports the results to the user.

There are two big testing frameworks in the Python ecosystem: unittest and pytest. unittest is part of the Python standard library. It follows the standard xUnit pattern from the Smalltalk and Java world, and since it's been around and used in many languages, there are a lot of books and a lot of documentation about it. But at the end of the day, you're writing Java-style tests with a lot of boilerplate. pytest, on the other hand, doesn't feel like Java. The tests are really easy to read and really easy to write. It also has great documentation, and my favorite part of pytest is the plugins. And pytest can run unittest suites, so there's really no reason not to use pytest.

For all these reasons, I love pytest, so let's explore why. pytest lets us use Python's built-in assert keyword; no more messing around with assert helpers. And when an assert fails, you get detailed introspection about what failed. Here I'm comparing a list of zero to nine, which is just what range(10) gives you, to a list of zero to eight plus the number 10. When it fails, pytest tells me that at index nine, the number nine does not equal the number 10. You don't have to go back to your code; you can see it right in the pytest output.

pytest features a fixture model that is very helpful for running these kinds of fixture-based tests. Fixtures are functions that pytest runs before your tests, and then again after your tests. We inject fixtures into test functions by specifying them as input arguments. To resolve them, pytest searches the current module, your conftest.py files, and installed plugins.
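As a rough illustration of that model, a yield-based database fixture along the lines the talk describes next, again using the illustrative TopWord model, might look like this:

```python
import pytest


@pytest.fixture
def db_item():
    # Setup: create an object and add it to the session.
    item = TopWord(url="http://example.com", word="who", occurrences=3)
    db.session.add(item)
    # yield passes control to the test function.
    yield item
    # Teardown: roll back the transaction to restore the original state.
    db.session.rollback()


def test_item_has_word(db_item):
    # The fixture is injected by naming it as an input argument.
    assert db_item.word == "who"
```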
We can create fixtures from other fixtures, which lets us compose complex test objects quite easily. This is a fixture to create an item in the database: I create the object, add it to the session, and pass control back to my test function; then, after the test function has run, I roll back that transaction. So I have my setup, I yield with a generator, and then I have my teardown to restore the original state. We inject this fixture into our test function as a parameter. It's a little bit of magic, but pytest's magic is easy to work with, so I prefer it over something like unittest.

One limitation of fixtures is that they don't accept input arguments, but we can take advantage of Python language features to make fixtures a lot more flexible. What we do is create a factory: the fixture returns an inner function that takes whatever arguments you want. I wrote a blog post about this, so there's a link you can read later in your spare time.

pytest has markers that allow us to add metadata to our test functions, and we can select tests with a specific marker using the -m selection flag. Markers are really good for identifying certain types of tests, say, slow tests that you don't want to run all the time locally, but that you do want to run every time in your continuous integration pipeline. The implementation of markers was revamped in pytest 3.6; before that, there used to be a lot of errors, so if you're experiencing problems, I really suggest you upgrade. There are a few built-in markers: one to skip tests, one to skip tests conditionally, and one to mark tests that we expect to fail.

We often want to exercise a function across many different inputs. We could easily write a loop, but pytest would treat that as a single test. If we use the pytest.mark.parametrize decorator instead, each set of arguments becomes its own test; here we get three separate tests. Just be aware that parametrize is spelled a little differently: there's no "e" before the "r", which might be the British or Canadian way, I'm not too sure, but it always trips me up, so be careful. And you can see here we have two tests that passed and one test that failed, because 54 doesn't equal 42.

My favorite part of pytest is the hook-based plugin architecture. There's a hook for pretty much anything, so if you really want to customize pytest, you can read the documentation and find what you need. We can put our local plugins in conftest.py files, inside each of our packages or sub-packages, and we can also install a lot of third-party plugins. We invoke our test runner using the pytest command at the command line; that goes out and collects all tests in files named test_*.py or *_test.py. There are also ways to run individual tests, and I've included a link on best practices for test folder structure. pytest has a lot of command line options, so I really recommend checking out the man page or the help output. Also be aware of the different exit codes, so that if a test run fails, you know why and can fix it.
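Before moving on to tools, here's a hedged sketch pulling together a custom marker and parametrization as just described, reusing the illustrative most_common_word helper; the values loosely mirror the slide's two-passes-one-failure shape:

```python
import pytest


# A custom "slow" marker (register it in pytest.ini to avoid warnings)
# selectable with `pytest -m slow`, plus parametrization: the decorator
# expands this one function into three separate tests.
@pytest.mark.slow
@pytest.mark.parametrize("words, expected", [
    (["who", "who", "what"], ("who", 2)),   # passes
    (["when"], ("when", 1)),                # passes
    (["who", "who", "what"], ("who", 54)),  # fails: 2 != 54
])
def test_most_common_word_many_inputs(words, expected):
    assert most_common_word(words) == expected
```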
So now let's look at some testing tools that can make our lives a little easier. Of course, number one is pytest. Some useful pytest plugins: pytest-cov for code coverage; freezegun, which I like for freezing datetimes during tests; a plugin for testing Jupyter notebooks; and pytest-xdist, which lets you run tests in parallel.

Test doubles are objects that we create and define to replace our external dependencies, which lets us run our code through specific kinds of scenarios. I'm not going to go into too much detail here, but there's a list of great test-double utilities. The most widely used is unittest.mock, but mockito, a port of the Java library, is pretty good too. Not everybody likes mocks, though. I don't like mocks, and I don't know anybody else who likes mocks. So there are a few alternatives for test doubles. vcr.py records requests and responses as YAML cassettes, and then we can replay those cassettes for deterministic tests; it was based on the Ruby VCR library, and I have a blog post about it if you want to check it out. You can also test using a man-in-the-middle proxy. This is a little bit weird and requires a lot of setup, but there are some links if you want to follow up. I saw a talk at EuroPython 2018 where a company spun up test infrastructure for pretty much every single test using docker-py. That's a little too much for my use case, but it might be something you need.

We can use Selenium to do browser-based testing. Here we actually code up our interactions with the browser: add something to the cart, check out the cart. In my opinion, these tests are a little brittle and a little slow, but sometimes you have high-value features that you really have no choice but to test with Selenium. If you go a layer below, to the API level, we can use Postman to test our APIs: we create a contract between the front end and the back end that defines how our API should work. It's really easy to get started with Postman, but in my experience it's a little hard to scale. There are tools made for that purpose, like Tavern. Tavern lets you write YAML-based test specifications and run them using pytest. Under the hood it uses Requests, but it also has a plugin architecture, so if you want to hook in your web framework's test client, you can easily do that. And if you do, please make a pull request; it would be an awesome feature to have.

More testing tools: docker-py can be used to ensure your containers are built to your exact specifications. I couldn't get conu working, but there's an article if you want to give it a shot. On data science testing: if you have not read about the pandas testing utilities, I highly recommend doing so; there are a lot of great assertion helpers for DataFrames. Engarde is built around those kinds of utilities; it's a decorator-based library for data science. And Great Expectations is built for more data-pipeline-style testing. I think it's a little loose, but it might fit your use case. If you have a Postgres database, I recommend checking out pgTAP. It's written in Perl, but it's actually really good.

Just a little note about property-based testing: this is where you check that an object keeps the same properties before and after an action, something like a round trip through serialization and deserialization. Hypothesis is the go-to library for this in Python.
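A minimal sketch of that idea with Hypothesis: whatever dictionary it generates, a round trip through JSON serialization and deserialization should hand back an equal object.

```python
import json

from hypothesis import given, strategies as st


# Property-based test: Hypothesis generates many dicts of text keys
# and integer values; the property must hold for every one of them.
@given(st.dictionaries(st.text(), st.integers()))
def test_json_round_trip(data):
    assert json.loads(json.dumps(data)) == data
```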
Mutation testing is actually really cool. What it does is go through your program's source code, via the AST, and modify it. So say you have a conditional like "if x is greater than five"; it's going to change it to "if x is less than five". And if your tests still pass, you probably don't have the best tests. I also like the name of the tool, Cosmic Ray; it's like flipping bits.

I don't really have a lot of time to talk about this, but there are a lot of ways to write robust tests. There are two patterns for structuring a test: the Arrange, Act, Assert pattern and the Given, When, Then pattern. They're pretty much the same, just using different words. When you write tests, ensure each one only tries to test one thing; that way, any time a test fails, you know exactly what's going on. And just like everything else in programming, you have to be pragmatic: pragmatic in what you're testing, how you're testing, and when you're testing. People argue about test-first versus test-last. Who cares? Just test. The last link is about units of behavior. We talked a little about what the size of a unit is: it could be a public function, it could be a private function. I like the definition of a unit of behavior, because it captures what your program is actually doing.

I actually wrote a pretty long blog post about testing; maybe I went over some of this too fast, so take a look at that if you want. Martin Fowler has a great wiki on testing. That's pretty much it for me. I'm not going to be taking questions now, but I will be outside, so come find me. Thank you so much.