four o'clock instead of three o'clock. Yeah, that's better. Good morning everyone. This talk is about testing. Anyone not testing? Whoa, brave. The other track is kind of the "you don't care about efficiency" one, so you can go to that one. So Kieran is talking about testing, but he's actually taking it beyond unit testing, and we will hear a bit about testing interfaces on hardware devices and things like that. Kieran is working for Tait Electronics here in Christchurch? No, I actually work for Yellow Pages. Oh, Yellow Pages? Well, okay. I used to work for Tait. Oh, he used to work for Tait, so now Yellow Pages. Yellow Pages is lucky, then. So yeah, welcome Kieran, and all about testing.

Hello. So is this a talk about testing? Of course it is, but the point is it's not a talk about testing in the way that we as developers normally think about testing. It's not about unit testing and integration testing and what's good practice. It's about what happens when you have complicated interfaces, and the testing is more from a test analyst's point of view, a testing department's point of view, than a validation that what you just changed didn't break something. That's also in the name: we're using Python to test, not necessarily testing Python code. So what happens when your target isn't a piece of Python code? It's a bit of hardware. It's anything, really.

So what's the problem we're trying to solve? Originally, when I was thinking about this talk, I was considering it in the problem space that I know, which is hardware testing. I worked at Tait and Cisco testing hardware devices, but I read an interesting article recently about a software shop that essentially came up with the same stack as a solution to the problem. And really it comes down to a single question: what happens when your tests start taking too long? Obviously this question changes depending on your context. In the context of unit testing, it's probably minutes. If your unit tests are taking five minutes, you're going to stop running them, and you're going to have a problem. In the case of hardware, I think this question gets answered a lot quicker. I have a Nexus 5 here; if I was to perform a factory reset on it, does anyone know how long that would take? I actually don't know the answer, because I didn't want to do it. But I suspect it would be about 10 minutes before you had a usable system back. And from a testing point of view, you should probably be doing that every time, because you want consistency. But it's completely impractical, if you've got 500 tests, to spend 10 minutes resetting one device.

So why do tests take too long? In the classic unit testing case, I think you simply have a lot of them. In the article I was reading, they had thousands, tens of thousands, and it just gets to the point where it takes a long time. There's a second reason, which applies to the unit testing case too: if you have a system complicated enough to have thousands of unit tests, you're going to have a lot of external systems that you're interfacing with. And usually in this situation we cheat. We set up something with a fixed login that you hard-code into the code for testing. But you get to a point where, if you have a system which has to provide you with that resource, then you have to manage that resource, you have to configure it, you have to make sure it's in a known state. The second point is that ensuring consistency can take a lot of time.
And this is especially true in the hardware space. Probably the biggest problem with automated testing of hardware devices is that they're completely inconsistent.

So what's the solution? Obviously it's to run things in parallel; we want things to run faster. But what happens when we try to run something in parallel? I'm going to take you on a little bit of a journey, I guess, a bit of a story. We started with some unit tests that take too long. So what can we do about that? We can obviously spin up some Docker images, maybe, since we like Docker. We can shove the code on there, split it up somehow, maybe a suite per image, and we can run them, which is all nice. We can check the outputs, maybe, see if it worked. But we probably want to know the results of those unit tests; that's the point of running them. Did things pass, did things fail? And we probably also want the logging output, because we're good people, so we're providing logging and debugging of what we're trying to achieve. So we're going to add a custom test runner to what we're doing, because we want to be in control of what tests we run, what files get generated, and where we put them.

But if we stop approaching this as a developer and start approaching it as a tester, then we get to the point where the results are quite important, because they are the output of your work, essentially. If you find a bug, the business is going to come to you and ask: is this a regression? And you're going to have to say, I don't know, because I don't have a record of this particular test running on X version or X hardware at some time in the past. So results become a service. They're no longer just some files you pull off and look at, because you're not just validating your current changes, you're validating the entire solution that you're trying to test. You really want to record those results and be able to filter them based on platform and version and all those sorts of things.

But let's say I'm testing my phone and I get a test case that says: I need to fill up the 32 gigabytes of space on this phone. What happens when I fill it up? Does it give me a good error message? But there's a 16 gigabyte model of this phone. So you get to the point where suddenly your test case has to declare that it can only run on a 32 gigabyte model and not a 16 gigabyte model. So we suddenly have requirements before we can even schedule anything. It's no longer a simple case of just making some workers, putting your code base there and running a command. It's: put the code there, analyze the code base, figure out what your tests need, and then you have to have some sort of resource allocation system which can take in those requirements, match a resource to them, and hand it out. And then there's added complexity on top of that, because you get to the point where you're supporting services. This isn't just about a Nexus 5; I probably need a server that's plugged into it to be able to talk to it. And maybe I want two phones, because I want to make sure they can make a phone call between them. Then your entire test runner becomes very complicated, because it has to take in an arbitrary resource definition, decide what it is, make a driver for it, and make sure it's in a state that you care about and that is acceptable for your tests. So we're going to add a test framework to the list. And for me, this is basically a stack which would provide a good level of automation.
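To make that concrete, here is a minimal sketch of what declaring requirements on a test and matching them against a pool might look like. This is not the stack from the talk; the requires() decorator, ResourcePool, and the inventory format are all hypothetical names, just illustrating the shape of the problem.

```python
# Hypothetical sketch: requirements declared on the test, matched by a pool.
def requires(**needs):
    """Attach resource requirements to a test function for the scheduler."""
    def mark(test_func):
        test_func.requirements = needs
        return test_func
    return mark

@requires(product="nexus5", storage_gb=32)
def test_storage_full_error(device):
    device.fill_storage()
    assert "storage full" in device.last_error().lower()

class ResourcePool:
    """A toy allocator: hand out a matching device, or None to queue the test."""
    def __init__(self, inventory):
        # e.g. [{"product": "nexus5", "storage_gb": 16}, ...]
        self.inventory = inventory

    def allocate(self, needs):
        for device in list(self.inventory):
            if all(device.get(key) == value for key, value in needs.items()):
                self.inventory.remove(device)
                return device
        return None

pool = ResourcePool([{"product": "nexus5", "storage_gb": 16},
                     {"product": "nexus5", "storage_gb": 32}])
print(pool.allocate(test_storage_full_error.requirements))
# -> {'product': 'nexus5', 'storage_gb': 32}; the 16 GB model is never offered.
```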
I have made a couple of these stacks; one was a lot worse than the other, and the second one wasn't great either, I think. Every part of this stack can be satisfied by existing services in some cases, although I am surprised that in general I haven't found many generic solutions for these.

And I've certainly seen this belief that automation is free. I'm not sure if we see it from a developer's point of view, but certainly at a managerial level they see it: we're spending lots and lots of money every month regression testing our products, so we'll write a script, it'll be there forever, and we'll be able to reuse it. But I think as a developer you can appreciate that a stack that complicated is not free. There's a lot of maintenance, a lot of work involved in setting something like that up and maintaining it over the life of your product, especially as the developers on the other side are breaking things and changing things and making your life difficult.

So I want to talk about history. What do I mean by history here? I mean that greenfields is really not that common; you usually walk into a project that already exists, with a lot of legacy stuff lying around, and paying down that technical debt is a process thing. From what I've seen, these projects are generally driven by the testing departments, not the development departments, which is probably something that would be good to fix. And the problem with testing departments is they don't have developers. So you end up with testers getting thrown in the deep end: automate this, it can't be that hard. So they choose Python because it's easy. And it is easy to learn, as we've been hearing recently. But it is still very hard to make a complicated software system that works well in lots of situations. The number of times I've walked into a code base and seen a test case a good two or three hundred lines long which is an exact copy of another test case in the same file, times 50, with three or four lines changed in each one... It's really problematic when you get to the point where you want to change one thing about how the setup works, but the setup's distributed across 50 files in your code base.

The other point about testers being the people who have built this is that they're also the people who write your test cases a lot of the time. Which is actually the right place for them to be, because they are the domain experts. In general, your testers know the product the best as far as the customer is concerned, and they're the best people to decide whether a feature is acceptable for your customer or not. So this is obvious, but I still need to remind myself of it quite a lot when I'm trying to design something: just because something's cool doesn't necessarily mean it's what you should be doing. And often it's this argument about the right design decision, between abstracting everything out so you don't repeat yourself, and becoming so incredibly complicated that you cannot understand the code base, for the sake of a few lines.
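As an aside on those 50 copy-pasted test cases: the usual fix is to push the three or four lines that actually differ into parameters. A minimal sketch using pytest's parametrize; the talk doesn't name a tool, and the model names and the device_for fixture here are hypothetical:

```python
import pytest

@pytest.fixture
def device_for():
    # Hypothetical factory: allocate a device of the given model from the
    # pool and reset it to a known state.
    def factory(model):
        raise NotImplementedError(f"allocate {model} from the pool")
    return factory

# The 3-4 lines that used to differ between 50 copied test cases become data.
CASES = [
    ("nexus5-32GB", "factory_reset", "ok"),
    ("nexus5-16GB", "factory_reset", "ok"),
    ("nexus5-32GB", "fill_storage", "storage_full_error"),
]

@pytest.mark.parametrize("model,action,expected", CASES)
def test_device_action(device_for, model, action, expected):
    device = device_for(model)   # shared setup lives in one fixture, so a
    result = device.run(action)  # setup change is one edit, not 50 files
    assert result == expected
```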
So, how about some Python? We're here to talk about Python. I've got a few examples of solutions I've come up with for problems that I've hit. The first one is about requirements. These are annoying at best. It's simple in the case we were talking about before, where there are only two options. But for some reason hardware companies seem to be prolific in making lots of variations of things. For example, at Tait, and we've got some ex-Tait people here, they have five, six, seven, eight products probably, and each product would have five or six models in it. And from a tester's point of view, they want to say: I can run on any series of this product, or I need this particular model, or I just care about having this platform or this hardware variant, or... But from a technical point of view, being able to consolidate those into something you can throw at a resource management system and say, do you have one of these, is quite hard.

So, we're using Python, so we're using an object. The first representation is the one a tester will care about: Nexus 5, 32 gigabytes. That's their requirement, from my example before. But the system really cares about models, probably; you need some absolute level. So having something that can translate the first representation into the second is quite nice. Basically, this was just a JSON-backed tree structure, so it could enter at any point in the tree, and if it had enough information, it could expand outwards from that point. And obviously it's an object, so you can do cool things like: can you compare them? Can you OR them together? Because obviously the Nexus 6 came out, and it only has a 32 gigabyte and a 64 gigabyte model, so both of those are applicable to our test case in this example. But from a technical point of view, what we really care about is: it's any of those actual models, because that's what we can throw at our resource management system and say, do you have any of these things that you can give to me right now?
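A minimal sketch of that idea, assuming a hand-rolled hierarchy; the real thing was JSON-backed, and these model names are made up:

```python
# Made-up hardware hierarchy: product -> variant -> concrete models.
HIERARCHY = {
    "nexus5": {"32GB": ["N5-32"], "16GB": ["N5-16"]},
    "nexus6": {"32GB": ["N6-32"], "64GB": ["N6-64"]},
}

class Requirement:
    """A tester-friendly requirement, expanded to the absolute model level."""
    def __init__(self, models):
        self.models = frozenset(models)

    @classmethod
    def parse(cls, product, variant=None):
        # Enter the tree at any point and expand outwards from there.
        node = HIERARCHY[product]
        if variant is not None:
            return cls(node[variant])
        return cls(model for models in node.values() for model in models)

    def __or__(self, other):          # requirements can be ORed together...
        return Requirement(self.models | other.models)

    def __eq__(self, other):          # ...and compared
        return self.models == other.models

# "A 32 GB Nexus 5, or any Nexus 6" collapses to a set of concrete models,
# which is exactly what the resource management system is asked for:
need = Requirement.parse("nexus5", "32GB") | Requirement.parse("nexus6")
print(sorted(need.models))  # ['N5-32', 'N6-32', 'N6-64']
```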
Test frameworks. What do I mean by a test framework? I think we all know unittest as a test framework, and we all know nose as a runner. But you get to the point where you've got 10 different resources. At Cisco, we had a test case which was: can you load a video conferencing box up to its maximum port count? The maximum port count is 104 HD calls. It takes a lot of hardware and resources to have 104 HD calls up on a box in real time, and you've probably got 10 different things to orchestrate to do something which on paper is incredibly simple: take it to max port count, can you add one more call? From a testing point of view, it's a lot of pain. So you want a framework that really helps you and supports you to do what you would expect. I want to be able to write a setup method that configures my environment, and I want to be able to write a single test case which lets me test what I want to test, with everything available to me. And I think simplicity is a key point here as well, coming back to the point from before: we need it to be easy to understand, and we need it to be usable by a wide variety of people who are competent and not so competent in Python.

So, super. We all love super. It's a pain point, I think, but it's a pain point because multiple inheritance is a complex beast. And unfortunately, when you have lots of things which are very similar, multiple inheritance is quite a nice solution. But it's incredibly easy to forget to call super. I do it a lot more than I would probably admit, and I spend a lot more time debugging a missing super call than I'd like to admit. So what can we do about that? What if we turn our setup methods into decorators? Rather than having a setup chain of super calls which can go all over the show in a really non-obvious way, why don't we make it explicit that this is a setup method and that it needs to run at a certain point in the chain? When I implemented this, the setup sequence for a classic test case in the framework was probably 15 or 20 methods; there's a lot happening under the hood to set all those resources up for you. So in our base class we can have some nice descriptive names, rather than just "setup", which doesn't really mean anything to anyone. And then the user just says: this is the setup that I need to do for my test. And we can use the power of Python and the inspect module to load this class into memory and make a little function that'll tell you exactly what will be run for your setup, and where you can find each method, on what lines, if you want to know what it does. Having good method names here gives you a lot more visibility into what each method should be doing.

Debugging. Use a debugger, that's the usual go-to. But a lot of the time you're not debugging the code that you're writing, you're debugging the scenario you're trying to test. So: why didn't the 104th call connect? You don't know. It ran overnight, you got some logging, and it doesn't seem to tell you anything. What you really want is to be able to stop your test case at that point, so you can investigate your environment, try adding another call to the box, just play around with it, basically. There's a lot of value in that, because if you had to manually get the box up to that port count without your scripts, it's a good half an hour's work to collect all the resources up.

So IPython gives us a nice little embed concept; this is also true of the normal interpreter. If you call it from your Python code, you end up in an embedded IPython shell. And there's an argument on that call that will give you the depth, which is very nice: it basically lets you go back up the call stack to the point you called it from, so you have the context you were called in.

So let's have an example, just to prove that it works. I've got an object here, with a local variable to prove that I can actually see local variables. We've got an interrupt point, which in this demo always interrupts, but in the case of the framework, you can essentially call the test runner telling it where you want it to interrupt, and it will keep running when you're finished. So usually you would call it like this, and it would go off and do stuff, and you'd go have a coffee, and then suddenly you'd come back and you'd be in a Python shell. I added a nice little header that tells you exactly what your context is, because you can get a bit lost sometimes when you're doing this. But you can see I do have the local variable in my context, and I can do whatever I want; if this were the full framework, I could go and talk to one of my resources, resource.whatever. And when I quit out, it just goes back to doing what it was doing, everything gets cleaned up, and everything is as it was. And I can run that on a server; it's completely flexible.
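Pulling those three ideas together: decorator-registered setup methods instead of a super() chain, inspect-based introspection of the chain, and an embedded shell interrupt. Here is a minimal sketch under my own assumptions; it is not the talk's framework, all names are illustrative, and the stack_depth argument should be checked against your IPython version.

```python
import inspect

def setup_step(order):
    """Mark a method as a setup step, to run at an explicit point in the chain."""
    def mark(func):
        func._setup_order = order
        return func
    return mark

class HardwareTestCase:
    @setup_step(order=10)
    def allocate_resources(self):
        print("allocating devices from the pool")

    @setup_step(order=20)
    def restore_known_state(self):
        print("resetting devices to a known state")

    def _setup_chain(self):
        # Every decorated method, base classes and subclasses alike, in
        # declared order -- no super() calls to forget.
        steps = [m for _, m in inspect.getmembers(self, inspect.ismethod)
                 if hasattr(m, "_setup_order")]
        return sorted(steps, key=lambda m: m._setup_order)

    def describe_setup(self):
        """Tell the user exactly what will run, and where to find it."""
        for step in self._setup_chain():
            func = step.__func__
            _, line = inspect.getsourcelines(func)
            print(f"{func.__name__:<25} {inspect.getsourcefile(func)}:{line}")

    def run(self, interrupt=False):
        for step in self._setup_chain():
            step()
        if interrupt:
            # Drop into a shell right here; quitting resumes the run with the
            # environment untouched. stack_depth picks the frame whose locals
            # you see (the exact signature may vary across IPython versions).
            from IPython.terminal.embed import InteractiveShellEmbed
            InteractiveShellEmbed(banner1="interrupted: test paused")(stack_depth=1)
        self.test_body()

class MaxPortCountTest(HardwareTestCase):
    @setup_step(order=30)
    def load_calls(self):
        print("bringing the box up to 104 HD calls")

    def test_body(self):
        print("adding the 105th call and checking the error message")

MaxPortCountTest().describe_setup()
MaxPortCountTest().run()  # run(interrupt=True) pauses before the test body
```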
And yeah, that was all I had. So, any questions? I'll do the running.

In your entire stack, was there a place for testing the tests? Because they're not unit tests. You've got a whole test framework, and people say, oh, that's testing, so that's enough. But don't you also want a unit test framework for those tests, using those resources?

Yes, absolutely. For that stack that I put up, I would consider each part to be a completely separate service that should be tested in its own right. Most of the ones I've written were web frameworks: the results service, you post results to it; you post your requirements to the resource management system, and it gives you back something at some point. So yeah, you absolutely should, but I think for that case the normal unit testing frameworks that we have, like Django's test tools and unittest and nose and all those things, are what we know and understand, and we can easily reuse them.

Yeah, that is a problem. Basically, you've got lots of moving parts when you've got a system like this, because you've got lots of versions of everything. So generally, in my experience, you have a concept of the device under test being on your testing version and all supporting devices being on your known good versions. And you can flip that around, obviously: put everything on known good versions, put your infrastructure on its testing version, run your infrastructure against known good releases, and see if you get what you got before.

All right, more questions? Make me run. Hey, Kieran. Are you aware of any open source test frameworks available specifically for testing hardware?

No, which I am quite surprised about. I'd love to learn of any if I could find some. But I think in this space the requirements are so varied and so different across hardware that it's very hard to make something generic, because when you're making software you're always trading off against something, and what trade-offs you make completely determine what features your framework ends up with. And the problem with testing is you can't just turn around and say, no, I can't test that; most of the time you need to be able to say, actually, we can do that. So you have to design for your specific use case, and you end up with very specific frameworks. Generalizing them can be quite hard, but I'd love to see someone try.

Any more questions? Well, I have one, because I'm interested in the whole question of maintaining your tests and not having them as second-class citizens in your code. Do you have any tips to share around that, for example if you're not a sole tester but actually a team of testers, with multiple people touching your tests?

I actually just moved a package of tests back into the code base for a project I'm working on at the moment, a web-based one. These were what I would call service tests, essentially end-to-end tests of a large API framework. And the reason I did that was so you could change them in parallel, and the diff had the service tests in it. But I think it's a process thing, to try and maintain them as first-class citizens in that context. I mean, unit tests, yeah, I don't know. I do see what you mean.
There's this disconnect between testers, who want to write test cases, and developers, who just kind of want to validate something they're currently working on and aren't thinking about regression testing in that way. And moving to a place where your tests are, as you say, first-class citizens, with a lot of thought put into them, as opposed to just being "test exactly what I did", is definitely valuable, I think. But I can't say I have much experience with how to make that happen.

Excellent, any more questions? Any hands? No? Then join me in thanking Kieran for his talk.