I'm Alex Martelli; you may know me, I'm probably best known as the author of Python in a Nutshell. Okay, apologies in advance if the presentation is not up to my usual standards, but I'm not used to being chained to one spot. I'm a walker, but there is no roving mic. As I said, I'm probably best known for Python in a Nutshell; the third edition is just out, co-authored with my wife Anna Martelli Ravenscroft and with Steve Holden, who used to be the organizer of the very first PyCons. But this talk is outside what I cover in Python in a Nutshell. Actually, I would say Python here is only used for the examples: what I'm trying to show is just about as useful in any programming language you may be using.

Because in any programming language, a typical software system can be seen as a directed acyclic graph, in which there's a lower layer of modules or services or components, call them what you will, that provide some functionality but don't depend on anything else you've written. They may be interfacing to external entities, of course, like a database and a domain name system in this example. There are middle layers, which both depend on some modules and are depended upon. And there are top layers, which are not depended upon but depend on other subsystems. As long as you don't have any cycle in your directed graph, any directed acyclic graph can have its nodes classified this way.

If you do have cycles, you have far bigger problems than I can hope to address. The point is that the arrows are dependencies: if you have a cycle, it means A depends on B and B depends on A, and you're in hell. You must completely refactor everything and break the dependency cycle. That's much more important than anything else I or anybody else can teach you, so leave the conference and go break your dependency cycles; you really, really want to do that. If you don't have dependency cycles, this structure will always hold.

I've had some questions about this: wait, multiple top layers? Well, duh, it is 2017, I'm told. Of course you will have an API, and a web interface, and perhaps a local graphical user interface, and a local command line interface. So you will have multiple top-level components; I certainly hope so, if your system is rich and complicated enough.

So the next issue is: okay, we have this thing; why do we test it? Unfortunately, in 45 minutes I cannot compress a few hours' worth of explanation of why testing is the crucial discipline of software development. I recommend you go online and find any talk I've ever given, and most talks given by other people, to understand why you really have to test. We won't be covering that today. What I'm covering here is the how, not the why.

The most antique, traditional form of testing distinguished tests into white-box, meaning tests written with full knowledge of what's inside the components being tested, and black-box, which is supposed to use only the external interfaces made available by the components. That distinction was dropped a long time ago in professional practice; it's not a very useful one. However, the way we do things in the modern way looks a lot like the old way with new names, and it's not much more useful. We nowadays tend to have unit tests, which are really white-boxy, typically looking a lot inside the components they're testing; they're written by developers, for developers, just to ease development. Nothing wrong with that, but that's only one extreme, and then all we have is the other extreme:
integration tests, which are end-to-end, so they have to go from soup to nuts (an expression for a complete meal, I'm told), and which will often involve steps that cannot really be automated and therefore need a human being in the loop. If you need a human being in the loop, by my lights you don't really have a test: you have a separate step of your software development and delivery cycle, which I like to call quality assurance. I use a different term than "testing" because, from my point of view, testing has to be automated. For a complete end-to-end thing, you can automate tests when the top unit is an API, a command line interface, or a web page (using Selenium and similar tools); if it's a graphical user interface running locally, there are some tricks to do it, but you'll never do it quite right.

And meanwhile, what about all the other things we'd like to automate, and to use in a continuous integration environment, so that something gets fully integrated and released and deployed only when all tests pass? If there have to be humans in the testing loop, you just can't do that. Humans are unreliable, not repeatable, very bad at mechanically repeating a series of operations, very slow, very costly. There are a million reasons you must not have humans in the loop of testing.

I have a completely different proposal. We have a software system composed of components: modules, services, microservices nowadays, whatever, it doesn't matter. The dependencies are all we're looking at, and they naturally form layers. Why not structure our tests the same way we inevitably, naturally structure our software, assuming we make it modular at all, as opposed to one big million-line program, which I hope none of us would write?

In this view, then, of course we have unit tests. They have to be very fast, because they're running all the time. They focus strictly on a component's or module's or service's internal logic, so that at the limit you can mock out every dependency. Above all, the top priority for your unit tests is: make them fast. Then, building upon that, and we'll see how, we'll have higher-layer tests; but not one single big jump from unit tests all the way to end-to-end, takes-forever tests. We'll do layers and layers of testing, as we'll see. You can see this as a pattern language of testing structures. Pattern languages are best known in this community for design purposes, but they apply to a lot of other human creative activities, and one of them is testing. In a sense, we're talking about how to design, but also how to execute, the tests.

Sometimes I get interesting objections at this point about what I mean by "fast, above all". "I think about fast when I'm doing production stuff; tests don't need to be fast, right?" Yes, they do. In a modern integrated development arrangement, your unit tests should be running all the time in the background. As soon as the system sees you've saved some changes to a file, it should reload that file and every dependency, find every test that can be affected, and rerun them all for you. If that's the setup you have (and I hope so, because it really multiplies your productivity, and just about any IDE is able to do that today), and the set of affected tests takes 10 seconds to run, then if there's any problem you're alerted within 10 seconds of saving the problematic code. It's still at the top of your mind. You look at the error and probably go "a-duh":
you see immediately what you did wrong, fix it, and proceed. If it takes five minutes instead, you've lost mental context. You've moved on to another task; you now need quite a bit of time to get back in your mind to what you were thinking when you wrote that, and you're losing all the work you've done since. Seriously, it's an order-of-magnitude impact on your productivity if you didn't make your tests, first of all, fast, fast, fast.

"Oh, but integration tests surely can afford to be slow." Well, I have a very recent case study showing why not: Python 3.6.1 release candidate 1, so what was it, three months ago, something like that. In the speaker notes I have the URL of the discussion on python-committers about what was going on. Essentially, Brett Cannon had to announce he had turned off the gating of integration tests on the continuous integration of 3.6.1 release candidate 1, because they were taking forever. Actually integrating a pull request was getting so slow that the release in question would probably have come out around 2023 or something like that. If your integration tests aren't fast enough, you might as well not have them; that's how important fast is for your integration tests. Okay, there's a difference here, because if you are well funded, rich, with generous sponsors, you can run your integration tests on a million machines. Well, a million would be a bit of overkill, but on a lot of machines; as long as the slowest one is fast enough, the others run in parallel and everything is rosy. Most open source projects don't have unlimited machines at hand; we get charity from Travis or whoever. I don't know the exact number we can use (Larry would know, since he's been the release engineer, the release captain, for so many releases of Python), but I think it's a single-digit number of servers, not anywhere like enough. So you need to be fast, fast, fast.

Now, everything that applies to other forms of automated testing still applies; I wish I had a couple more hours to recap it all, though I'd probably bore half of you. The first thing is: all tests must be reproducible. That seems obvious, but people keep running into problems. One of them is "oh, but I have a human there"; then get the human out, or it's not an automated test. Another example: "but my module uses random numbers". Well, then make sure you're able to inject a fixed seed, so that your test will actually get the same sequence of random numbers. There's some delicacy there, because on one path you may be calling the generator five times and on another path seven times, so the same seed may not actually give the same values at the same points; but if you're using random numbers, presumably you know all that and can keep it under control.

Much more common: "oh, but my code does something different depending on what day of the week it is, or what time of day it is"; it needs to do one thing if it's between 9 a.m. and 5 p.m., Monday to Friday, and something different out of hours. Great, but then you have to somehow fake out the time, and make sure to test both behaviors your program should have, in office hours and out of hours. Otherwise, if you just let the time be whatever time you happen to run the test at, that's essentially random, because you may be running the tests any time, all the time. And many other excellent, mandatory qualities of tests, layered or not; the fundamental things apply.
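Here's a minimal sketch of what I mean by faking out the time. The is_office_hours function is a made-up stand-in for your own code, and injecting the clock as a parameter is just one way to do it; patching the clock with unittest.mock is another:

    import datetime

    def is_office_hours(now=None):
        # Production callers use the real clock; tests inject a fixed one.
        now = now or datetime.datetime.now()
        return now.weekday() < 5 and 9 <= now.hour < 17

    # A Wednesday at 10:30 -- must take the in-hours path.
    assert is_office_hours(datetime.datetime(2017, 7, 12, 10, 30))
    # A Sunday at 23:00 -- must take the out-of-hours path.
    assert not is_office_hours(datetime.datetime(2017, 7, 16, 23, 0))

Both behaviors get tested deterministically, whatever time the test run actually happens at.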
Now, let's start with something a bit more concrete to show what I'm talking about. How do we test this database adapter? Well, first, maybe I'm just testing my own logic; then all I need to do is mock out the external DB component. Incidentally, there's a beginners' talk about mocks this afternoon which, unless you're completely familiar with mocking and patching, is highly recommended; it's kind of a prerequisite of this talk, but even getting it afterwards is better than not getting it at all. Mocking is fine as long as you are certain you understand one hundred percent of the behavioral characteristics of that external DB.

But there's a second possibility which, when feasible, can often be better: use a fake, also known as an emulated form of the DB. It's local, so you don't pay for network traffic or slow things down; it's totally under your control, as we'll see in some detail; it can be made in-memory, because you don't need gigabytes and gigabytes for a test, so you can make a smaller version. But the crucial thing: it must respect all the same semantic constraints as the real system you'll be running in production.

What's a semantic constraint? One example that's unfortunately common among various databases: after the close method has been called on a connection, no other method must be called on that connection; if any other method is called, a runtime error is raised. If this is how the real DB behaves, it's absolutely crucial that the fake emulates this behavior. So it keeps track of whether the connection has been closed, and if it has been closed and some other method gets called, then boom, it raises that exception. A mock, of course, will not do that, not naturally, not unless you know you have to specifically watch for it. If you do know, make sure your mock has that check, because future maintainers of the code may miss that subtle semantic constraint, and if your mock has this completeness, it will help.

But this is an example of a general problem with mocking. Mocks, being stuff you write to help your testing, reflect the same understanding of the external system that your code reflects. If you understood that close must be the last method called on the connection, then you won't call any more methods in your code, and your mock will check and give an error if you do. But if you don't understand that, the test will pass anyway, because the mock won't do the check. So there's a common failure mode between the test not catching something and your code having that defect. The only real solution to that is the fake, which we'll get back to again and again; there's another talk specifically about verified fakes, which I strongly recommend.

Incidentally, a fake, in addition to having to respect all the semantic constraints of the thing it's faking, may add others. A typical example for something like a database: the fake could say "no more than 32 megabytes of data", just because it's in memory, and that should be plenty for testing.
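As a minimal sketch of that semantic constraint (all names here are hypothetical, not any real driver's API), a fake connection might enforce the close rule like this:

    class FakeConnection:
        def __init__(self):
            self._closed = False
            self._statements = []

        def _check_open(self):
            # The crucial semantic constraint: no calls after close.
            if self._closed:
                raise RuntimeError('operation on closed connection')

        def execute(self, statement):
            self._check_open()
            self._statements.append(statement)  # stand-in for real SQL work

        def close(self):
            self._check_open()
            self._closed = True

    conn = FakeConnection()
    conn.execute('INSERT ...')
    conn.close()
    try:
        conn.execute('SELECT ...')  # must raise, just like the real DB
    except RuntimeError:
        pass

A bare mock would happily accept that execute call after close; the fake refuses, just as production would.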
Now, given this set of approaches, this is where I use Python for the examples. I assume the mock module has been imported from the unittest package, and I'm not going to repeat that line; it applies to all my successive examples. I'm using mock first, because mock.patch is such a great way to temporarily substitute something for a real component and then take it away automatically. unittest.mock offers many ways to do it; I always use the with statement in these examples, because it's such a natural way to say: do this temporarily, and at the end of the with block, undo it. So I'm mocking, and I'm using autospec; the mock talk, I believe, prefers spec_set, and there are other options. These are specific to unittest.mock and well worth pondering, which is another reason I strongly recommend this afternoon's talk. Essentially, this puts in place something that will emulate most anything, and for the details of the behavior you set its side_effect attribute. In particular, here I'm setting the side_effect of the cursor of the connection of the fake database. Then comes a big chunk of code, presumably split into functions, methods, whatever, which exercises every meaningful path of the application code: the code I own, the code I write.

Now, if what I'm doing is a second layer, using a fake instead of a mock, the typical structure is: make the fake with appropriate parameters; patch it in, saying new=... in mock.patch instead of autospec=True (the one puts an existing object in place, the other builds one); then populate, in this case, the database, by actually executing, for example, SQL statements on a cursor of this connection; and then run the same body of tests as before. Because we've set up exactly the same situation, except with a fake instead of a mock, we can proceed.

And for a full integration test, I start an instance of the database, presumably locally, for example on the machine I'm using for tests, so the connection can use a Unix socket, which is faster than the network socket I'd have to use if the database were actually on the net; I populate it somehow, maybe just by executing statements or importing a dump, so it's got an initial situation; and then I run the same body of tests that I was using in the unit tests.

This is where the novelty lies: the body of tests is a core, reusable part of the tests for a certain component. It exercises all meaningful paths, and that must include simulated errors. Incidentally, if you're using mocks, or other forms of spies (I'll briefly summarize the various test doubles later), you can also check what calls there have been, with what arguments. Be careful not to fall into the white-box trap: you don't want your tests to have exactly the same structure as the code, so that some innocuous change to the code causes tests to fail. That's not the purpose; the tests give you confidence that the code is still working, so they must not mirror it. If it's indifferent whether A happens first and then B, or vice versa, make sure it's indifferent in your tests. The extra checks are optional; mocks are also always spies, so they give you those checks for free, but you can always wrap anything with a spy to enable those checks, if you're really keen on them.
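Here's a minimal sketch of the pattern, with SQLite standing in as the fake; the body_of_test function and the users table are made up for illustration:

    import sqlite3
    from unittest import mock

    def body_of_test(conn):
        # The reusable core: exercise every meaningful path of the code
        # under test through whatever connection it is handed.
        cur = conn.cursor()
        cur.execute('SELECT count(*) FROM users')
        # ... assertions about the application code's behavior go here ...

    # Layer 1 -- pure unit test: a mock stands in for the whole DB.
    mock_conn = mock.MagicMock()
    mock_conn.cursor.return_value.execute.return_value = 0
    body_of_test(mock_conn)

    # Layer 2 -- fake: a local, in-memory emulation with real SQL semantics.
    fake_conn = sqlite3.connect(':memory:')
    fake_conn.execute('CREATE TABLE users (name)')  # prime the fake
    body_of_test(fake_conn)

    # Layer 3 -- full integration: connect to a real server started locally
    # (over a Unix socket), populate it, then run body_of_test again.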
The big point, in any case, is that there's a difference between mocks and fakes, and there are many other kinds of test doubles. Unfortunately, the classic article on the matter, Martin Fowler's (this URL), is very Java-oriented; still, as I mentioned at the start, these concepts apply to just about any programming language you might want to use, except of course that giving examples in Java, where every variable must be at least 45 characters long with several capitals in the middle, would take more pixels. Anyway, from my point of view, rather than the fine-grained distinctions between a dummy and a fake and a mock and a stub and a spy and so on, what matters is: who owns it, who maintains it, who releases it. A fake, the way I'm using the term, is something maintained and released by the same group who maintains and releases the software being faked. If I'm part of an open source group maintaining and releasing a database, I should have a fake version of that database ready for testing. An example that comes close: SQLite, which comes with the standard library, is a perfectly usable database for reasonably small storage, like a few gigabytes of stuff; and it accepts the special word ':memory:' in place of the file name, which makes the database in memory. That can be useful for very small databases, but more particularly for tests. It's not complete, though, as we'll see; it's not all you'd want in a fake. So again, there's a talk on verified fakes later this morning, which I strongly recommend, because it goes deeper into what I can barely mention.

So: a mock is very flexible, it can simulate anything; but exactly because of that, it can simulate something you think should exist but isn't what actually exists. A fake is a fast, limited emulation of the exact set of things that do exist. Both should (and this is where SQLite falls short as a fake of itself) be able to simulate any error; that is, you should be able to set them up so that, instead of giving a result, they raise a specified exception. That's trivial with a mock: you just assign the exception to side_effect instead of setting a result. But the fake must have been designed for this, or you can kind of hack it by wrapping a mock around the fake for that sole purpose, though that gets kind of grotesque. The reason this matters is that certain errors, which are crucial to handle correctly, are almost impossible to provoke for real, so you can't otherwise verify that your error handling makes any sense. For example: what do you do if the CPU catches fire? Well, you presumably catch the CPU-on-fire exception and proceed to shut down. But how do you test that? It takes a lot of CPUs to burn, and it's hard to automate, too. If you really need to test that, by far the best way is to have the test double raise the CPU-on-fire error, and then check that you handled it.

Now, moving to a middle-layer module, what changes? Well, for the pure unit test, you can mock out the low-level modules on which it depends. In this case you have fewer risks: presumably all the modules and components we're drawing are owned by the same team, so there's good understanding around; hopefully you don't really risk much, you just need to get your mocks reviewed by the specialists who know LL1, who know LL2.

There is, however, an interesting alternative for a mid-layer: what if I use the actual LL1 and LL2? They have no further dependencies, so no further problems. Well, it works if they're fast enough. If you're not sure whether they are, that's what timeit is for: measuring the speed of a specific fragment of code. If they are fast enough, you don't need to do the mocking, which makes less work for you. You need about the same amount of work to verify this at the start, but then, if the actual modules are fast enough, ta-da: you have no mocks to maintain. As you go forward, remember there is some priming you need to be able to do in your low levels, which includes simulating errors, as well as whatever else is needed for speed, like the equivalent of the ':memory:' thing in SQLite. And this is the schema: again, a prepare with side effects and then the body, or a prepare with priming and then the body.
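A minimal sketch of that speed check with the standard library's timeit; the lookup function is a made-up stand-in for a representative call into a real low-level module:

    import timeit

    def lookup(key):
        # Hypothetical stand-in for a representative low-level call.
        return hash(key) % 97

    # Time 1000 runs; if the total fits your unit-test budget (well under
    # a second, say), use the actual module and skip the mock.
    total = timeit.timeit('lookup("some key")', globals=globals(), number=1000)
    print('1000 calls took %.6f seconds' % total)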
What about a high-level module? Let's pick one. Then you have several potential chains of dependencies. You can do a pure unit test by mocking out the mid-layers; you can do a second layer by using the actual mid-layers and mocking down below; or you can use the real thing all the way down. There are many possible mixes, and you have to pick a subset, because if you try every possible combination you suffer a combinatorial explosion and don't get extra useful coverage for your effort. So I'm picking one set of mocking versus actual-and-faking, and this is the code for that single case. This time you see the body of test only once, but that's because this is only one of the many layers I recommend.

So which do you use? The decision depends a lot on the characteristics of your code. Again: a mock is probably fastest and least accurate; certainly it's the least work. A fake is fast enough, since it's designed to be primeable for speed and for other things, like faking errors. A fake is probably best if you're using software whose makers release one; incidentally, it need not be open source. For example, Google Cloud Platform services are being released with emulators on the side, so that you can run your tests locally. I know, because I'm in tech support for Google Cloud Platform, and I really annoy my engineering and program manager colleagues: every time there's a bug at a customer's which could have been avoided if they had run tests, I go to my colleague and say, see, they couldn't run tests, because you didn't release an emulator for this service and that service; when can I see it? Because otherwise I'll keep sending you these problems. And so on and so forth.

And one of the choices is how to control the complexity. Sometimes it's not obvious. When you say DNS, the domain name system, for most people it means: I take foo.com and translate it into an IP address like 1.2.3.4; that's known as the A record in DNS. But DNS has a million other kinds of records, from CNAME to TXT, and maybe you need the TXT record to check the ownership of a domain, and so on; in which case it's not a trivial mock anymore, as it would be if you only needed A or AAAA (quad-A) records.

Before I finish: I have been asked a very interesting question, does this apply to load testing? Well, there's a whole session about measuring performance, which takes the whole afternoon today, so you may want to go to that; but if you really need load tests for performance, unfortunately the layering concept doesn't apply fully, because you can't really measure performance layer by layer, only end-to-end. So there, the end-to-end test is needed. You can take correctness for granted: it must be tested by separate tests, not by the load tests. You don't want to retest correctness; you want to exercise the slow parts, the computationally or I/O-heavy parts. You can still give bounds, though: if the lower-level modules you depend on have a service-level agreement (the kind of "90% of queries complete in less than 30 milliseconds" thing), and you need to guarantee something similar to your users, there is an approach which gives you a worst-case estimate. Essentially, you use the intermediate tests to measure the actual time spent in your own code and count the number of calls to the external services. Incidentally, if the external services don't give you a service-level agreement, then you cannot offer any in turn: any single call to one of those could stall forever. And the slide shows the body of test for that.
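Here's a minimal sketch of the same worst-case idea: a spy counts the external calls while the fast fake keeps the test quick. The names query_service and handle_request are made up for illustration:

    import time
    from unittest import mock

    def query_service(key):
        return len(key)   # hypothetical stand-in for the external call

    def handle_request(keys, service):
        return sum(service(k) for k in keys)

    spy = mock.Mock(wraps=query_service)   # a spy: real behavior, counted
    start = time.perf_counter()
    handle_request(['a', 'bb', 'ccc'], service=spy)
    own_time = time.perf_counter() - start

    SLA_BOUND = 0.030   # e.g. the dependency's 30 ms per-call guarantee
    worst_case = own_time + spy.call_count * SLA_BOUND
    print('%d calls, worst case %.3f s' % (spy.call_count, worst_case))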
On the other hand, another question: can I use this approach when what I want to test is a refactoring? Of course you can. Refactoring, incidentally, means changing the internals of the code without, theoretically, changing any of the externally observable behavior. If it's all within a module, that's the base case: the whole talk applies entirely to testing that module. You may need to tweak the test bodies, the body of test, just to maintain coverage, because maybe some things have gone away and you don't need to test them anymore, or something like that.

For moving functionality between modules, the first thing you do is change the code and check that the unit tests of those modules, at least the ones from which you've taken things away, fail. This is the typical test-first approach, but it's automatic, because you already have the tests before you do the refactoring. Remember: never refactor code without tests; code without tests is what Michael Feathers calls legacy code, and you should always put some tests in place first, though it's unfortunate to have to deal with legacy code at all. So you make the tests fail: you run them, they automatically fail. Now you adapt the test bodies, and potentially the module mocks and fakes, and now they pass; check. And then the intermediate tests, in the version using the actual lower levels, show that the higher-level modules are not affected, and everybody is happy.

Finally, one problem with unit tests having to be fast is that sometimes, not often, just checking that a condition was actually satisfied can be too time-consuming to fit in the very short time I want my unit tests to run in. When that happens, what I've sometimes done is dump the state of the whole system at the end, check inline only what is fast to check, and leave a nice blob from which the whole system state can be reconstructed, as if I were doing snapshot-and-restart; then, in the background, asynchronously, run jobs which sanity-check whatever is very slow and long to check. This has worked so well for me that I've started doing snapshots, when performance affords it, even in production runs. A production run goes by, nobody complains, everything seems to have gone well; but I have a snapshot there, and with some probability, on a random sample, I sanity-check it afterwards, once in a while. This lets you catch a problem that was just barely hidden: it didn't hurt your users, and you've caught it before it could, which is by far the best thing.
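A minimal sketch of that snapshot-then-check-offline idea; the system object, its snapshot method, and the file path are all hypothetical:

    import json

    def fast_unit_test(system):
        state = system.snapshot()          # hypothetical cheap state dump
        assert state['status'] == 'ok'     # only the fast check runs inline
        with open('/tmp/state_blob.json', 'w') as f:
            json.dump(state, f)            # blob for the slow checks

    def background_sanity_check(path='/tmp/state_blob.json'):
        with open(path) as f:
            state = json.load(f)
        # ... expensive invariant checks go here, run asynchronously
        # (cron job, worker queue), never blocking the fast test run ...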
And this leaves time for questions and answers. Everything, including the speaker notes I've been talking about, is in that PDF on my website. Yes; do we have a mic for the questions?

One small question: what do you think about the possibility of using a real database as the simulated database, a real database in memory, so when you run unit tests you don't start anything, you just create a new database in memory and put a few records there? It will be fast, but at the same time it will give you a real database.

I'm sorry, but, I mean: is there a material difference between what I've written and what I should have written? If no, then there is no error; if yes, then of course I can detect it. Maybe afterwards we can sit down with a piece of paper and you can show me an example, because I just cannot conceive of one; it seems logically impossible, if the checks are correct. If the checks take too long, see the last slide: you do them offline.

Okay, so if your software is fast enough, let's say small enough, whatever that means, but it still has layers, would you say that skipping mocking and faking is a good idea, and just doing integration tests, if they run fast? Because it would cut down the workload of writing mocks and fakes.

If your code does pure, totally CPU-bound computational work, then the speed is constrained by your CPU. Such code is normally best moved to NumPy or similar libraries, which you can assume are correct because of Eric Raymond's most famous quote: given enough eyeballs, all bugs are shallow. With a million users of NumPy, bugs don't have a very long life there. And this has the nice advantage that, by taking it for granted that NumPy is correct, you can mock it out without qualms, and your whole test will be correct. If, as with most programs, yours is I/O-bound, that's where mocking and faking and so on make a big difference, because a lot of that I/O may go, I don't know, to a magnetic disk; well, if it stays in memory instead, that will be faster. It may go over the network; if it stays local, that will be faster, and so on and so forth. Little by little, you can easily gain orders of magnitude by sufficient simulation. Thank you. How much longer do we have? So, one or two more questions.

It's lovely to hear that your teams produce good fakes. How common would you say this is? Because my experience with faking things is that the fakes do not exist elsewhere, and you burn a lot of time failing to fully understand the system. Would you say that it is becoming more common? You mentioned that you push your teams hard to produce fakes.

I want to ask about integration tests. In what detail would you include online, distributed services in your integration tests? It doesn't seem feasible with CI; but then you need to use mocks or some fakes for your online services.

I'm sorry, the acoustics are a little hard. If I understood correctly, you're talking about real-life software with some kind of real-time constraints: let's assume your DB is on another server somewhere, it requires an internet connection, and you want to run end-to-end integration tests on CI. Although I'm still not sure I've pinpointed the question, I believe that, generally, the more real-life constraints the real software has, the more the tests will be living in a simulated universe, a fantasy universe, where things can go well or badly in a simulated and controlled way. I normally get asked this about IoT, Internet of Things, applications, where indeed the big deal is: how do I deal with a million teeny gadgets all over the place? Well, you don't; well, you do in your code, I hope, but not in your tests. Even in the so-called end-to-end tests, are you going to have a million Roombas going around a huge room? Probably not.
There will be some level of simulation, inevitably. Otherwise, how do you test that my software controls the rocket putting men on Mars? How do I test that? Well, not end-to-end, because then you'd have to get them back from Mars. A real problem.