All right, welcome to Journey to Testing: experiences in getting serious about testing in the Apache Traffic Server project. So there we go, intro slide. I've been a software professional for a little over 40 years. I'm a tech lead in the Edge group at Verizon Media, which is responsible for developing the CDN, which is Traffic Server based. I joined the Traffic Server community back in 2010, and I've been effectively a tech lead and architect for the community for most of that time.

All right, so what was the state of things when I started? When I joined the ATS community, testing was done by a set of hand rolled unit tests that used a little bit of infrastructure framework that did accounting and would generate messages if something failed. We had some internally supported tests called regression tests, which were built directly into the executable binary; you would run them by firing up the traffic server binary, giving it specific arguments, and saying run these regression tests. In the early days this was okay, but the testing was sporadic at best. We got a few new unit tests in during this time, but we really didn't get any new regression tests, because the regression test framework was obscure and not documented, and it was really hard to write tests for. In particular, if you had to run traffic through the traffic server, you had to compose very bizarre synthetic clients and servers, and effectively that prevented this from moving forward.

So as I said, initially that was kind of okay, but as always, everything gets more complex, and the community was growing at the time. We were moving from a relatively tight knit set of developers who all knew each other to a more diverse and broad community where there wasn't always direct personal contact between them. This created a lot of testing situations where, since there weren't really any new tests but we were adding new features by new people, we were putting untested stuff into production.
For instance, our HTTP/2 support has had a number of problems, and we're still struggling with that five years later. Obviously we've made a lot of fixes there, but that was not an experience we wanted to repeat. And there were also some doubts about how similar our regression tests, the internal ones that run inside with their synthetic setups, were to actual production use. You're testing the software with itself, and you can get into holes where the bugs cover themselves, so to speak.

So the community was concerned about this, and we said, well, obviously we need to do better testing. A couple of efforts were made to do that. The first one was a project called TSQA (Traffic Server Quality Assurance). It was built by James Peach, who was one of our star developers; it was basically a bash scripting system, back in 2013. And it had some issues. It had a performance issue in that every single test would build a brand new traffic server. Now, this is really nice because it meant we could test traffic server with different build options, with TLS or without and so on, but it was still really slow for any significant number of tests. It was also kind of fragile, in that as bash scripting, what it really let you do was, in effect, make notes of the manual testing you did, of things you set up to run traffic through traffic server. All of the infrastructure you needed to actually test, you had to hand write: shell commands, all the process management, setting up your background tasks, all of that inside the bash scripting. And if you needed different configuration files, you had to roll those by hand and use cat, essentially, to write out the configuration files that you needed. So this was not really very successful. It was a good attempt, but in the end, I went back and checked the source control log, and we ended up with seven tests total from this, which is not a roaring success.
This was known at the time, and after about a year, another developer named Thomas Jackson came in, and he tried to learn from this experience. He wrote a Python based infrastructure, one because he was good at Python, and two, well, it's a common language. This was intended to replace the TSQA stuff. Now, this was clearly an improvement over what we had before, because with Python you could build Python classes to make the infrastructure work for you, to fire up processes and adjust configuration files. And because it was Python, it was simple to go get a Python package that lets you create an internal HTTP server, and you could do internal HTTP clients or call curl to do this kind of testing, so it became much easier to get that stuff fired up. The internal HTTP server was set up in such a way that, based on the URL path, you could call different Python functions and have those Python functions generate the responses. So clearly better, clearly better, but still not really a success. We had some initial activity, and people wrote a few tests, and by few I mean, in the end, I went and checked this again, and a total of 12 tests were written using this system. So again, we did better, and it was a good sign, but still really not getting to the level of testing that we thought we needed.

So what to do? Well, testing's important, we need tests, and we decided we're just not gonna give up. We're gonna keep doing this, we're doing better, so let's try again. So in 2015, about a year later, when it was clear that this wasn't really taking off, my boss at the time, Josh Blatt, and I started a long-term effort to improve the testing. And the real goal of this, let's be clear, is not to do testing, that's not the point. The point is to improve the reliability of the software, and ultimately that's what we want to do. So we said, well, these were good efforts, why did they fail? They were done by bright people, talented people, and they didn't work. What went wrong?
And the number one lesson I pulled out of this was that adoption is key. The real thing that failed here was that James wrote tests for James's things and Thomas wrote tests for Thomas's things, but we really didn't get any cross fertilization or adoption beyond that. So we said, well, that's gotta be the number one target here. We've gotta have multiple people writing tests; it's gotta become a normal thing to write tests. Which means that any framework you build must not just test the system thoroughly, it must provide something that developers like, that developers want to use, that provides them some features, encourages them to use it, and lowers the bar to getting in. The thing we decided was that simple frameworks are easy to write, and they look like they do stuff, they look like they test, but they don't provide that level of infrastructure that developers need in order to actually be encouraged to write tests. And so we wanted to do that. And the big thing we decided here was that the framework must automate the developer tasks needed to build tests. As we talked about with the new TSQA, one of its big improvements was better process and configuration file management, since that meant it was easier and more automated to set up processes and adjust configuration files. That was a big step forward, and we wanted to say, let's do more of that, that was clearly better. Let's try it in a bigger way. And we think that was one of the critical elements to getting better adoption. So that's what we looked at.

So, third time's the charm. The question is, what can we do better? What do we want to do? Now, one of the things, when I talked about working out this plan, we decided we're gonna commit long term. We're not just gonna whip out something in a couple months and say, well, let's see if that works.
We were going to do a real design, make it a real project with priorities and actual devoted developer time, to get this working and to be serious about testing. And that was a big change. So we said, well, what do we want to do? What do we want to achieve? We decided there are two big things we wanna do: we wanna do unit testing and we wanna do full-up testing, and I'll talk about that a bit more in a bit.

We wanted network isolation. For us, this was a very critical feature, because we're testing a proxy, a proxy that we use for our own CDN. And really, it would be a bad thing if, during testing, requests that looked like production traffic leaked out and hit actual production servers. This would get us yelled at a lot. So that was a very important thing. And even if we avoided that, doing full-up testing of a traffic server can saturate the network. Literally, even on a 10 gig link, you can set up traffic servers to saturate that 10 gig link, and that can really annoy anybody else using that same network for their work. So we wanted to be able to say, if this is a problem, we can take the whole test system, put it in a lab with isolated lab machines, and just run it on that lab network without requiring any external connectivity. This way we don't break our production servers, and we don't annoy other people trying to use the network.

Another big goal was repeatability. So great, your test run says there's a failure. Well, how do you debug that? Being able to run the same test again with the same setup, obviously, is a critical part of that. We wanna be able to say, if developer A finds something broken, he can say to developer B, okay, this is your code, and here's the test I ran and what I did to get that breakage, so that you can put in your debugger breakpoints, you can put in logging messages, you can do other things, and depend on getting to that same point, that same state, in your test.
Now, obviously you can't always do this. There are always race conditions, timing things, cross thread stuff, but we wanted to get close. You wanna at least, as a baseline, generally be able to get to the same sort of state you had when the problem was detected.

Another issue we had, which is particular to us, and I'm not sure how general this is: we're testing a proxy. So we need to test not only things from the client point of view, where I send a request to the proxy and get an answer back, does that look right? It's critical to be able to test the stuff between the proxy and the upstream, because most of the work that the proxy actually does for us as a CDN is sanitizing and regularizing inputs, right? We wanna be able to say, well, we have TLS on the outside but we're gonna talk non-TLS to our upstreams, or we're gonna use a different key and certificate pair to go to upstreams, or we're gonna cleanse HTTP headers, or we're gonna reject certain requests that are malformed, or we're gonna do routing on the layer seven stuff. There's a lot of work the proxy does before we ever talk to the upstream, and it's obviously critical to be able to say, is the proxy actually doing that correctly? So that's a very different thing. It means that a lot of the standard testing tools, for web apps specifically, don't work for us, because they can't do the upstream testing.

We also wanted to be able to do hand rolled tests and production simulation. Obviously production simulation is a big thing for us, to be able to simulate production traffic without actually putting things into production. But to me, the hand rolled tests were a critical element, because again, as we said earlier, we want developers to use this. We want developers to actually look forward to, well, at least not dread, using this thing to help them with their own testing.
And so one of the things to be able to do that is to say, well, you wrote some new feature and you wanna test it. I want it to be sufficiently easy to hand roll a test in our system that you decide to do that, rather than doing it manually, or at least that you do it manually and then quickly translate it into the testing system. So the ability to hand roll these tests and stick them into the test base was critical to me. And finally, a key element was detailed diagnostics, and we'll talk about that in more detail in a bit.

So let's go back and explore a few of these in more detail. Unit and full-up testing. For a complex system like traffic server, there's a blurred line between when you do unit testing and when you do full-up testing, and it's really hard to draw that line cleanly. So we realized from the start that we were simply gonna have to be able to do both, and if they're independent systems, that's fine; they're really very different outlooks and viewpoints. As an example, I'll just talk about TextView. It's a class like string_view that does in-place string manipulation; the details aren't really important. But because it's a nice leaf library, unit testing works great on TextView. We can say, okay, we have all these input strings, we wanna do these manipulations, and we expect these results out. So it's very simple, very easy, and the best thing to do is unit testing on that. So that's great, and it'd be great if we could do that for everything. But because it's a complex system, we have things that sit on top of TextView, and things that sit on top of that, and things that sit on top of that. At some point, to actually get to the point where you can test this stuff, you have to link in large chunks. And one of the problems with some of the unit tests we'd had before is that they had very complex linkage requirements just to run the tests at all, because of the different things involved and how they work.
And you begin to wonder whether you're really testing something that looks anything like a production environment. Or if you are, why not just go to the production environment? So yeah, we needed to do the full-up testing as well. One of the key demarcations for me is when I need other processes to do the testing: at that point, it just gets too complex for unit testing, and I wanna do full-up testing.

And diagnostics. This was a big critical feature, because a lot of the previous efforts really didn't give you much information when things went wrong. I mean, I've seen stuff like "37 of 42 tests passed." Great, that's really useful. So one of the things we looked at from the very beginning was getting very detailed information about what the test was doing at the point of failure, both for repeatability and to understand what was actually happening: which test, what place in the test, what input we'd sent, what we'd seen coming out, what exactly was different about what came out versus what we expected. That was a critical element, and when in doubt, more detail is good. It's annoying to ignore excess data, but it's disastrous to not have the data you need. So we wanna err on the side of detailed diagnostics.

I'm not gonna talk much about mocks. I really just do not like mocks. I've tried them multiple times, and I've never had good luck with them. Maybe it's just a personality defect of mine. But really, we looked at this a bit and decided that the mocking type stuff is simply not for us. Now, some of the stuff we did, people might claim it's like a mock, but that's a matter of interpretation. We didn't use any mock frameworks and had no intention of doing so.

So that was the grand plan. How do we actually do this, right? For unit testing, we looked around at a few different things. As I said, we looked at mocks a bit.
Well, we decided on Catch, a unit test framework for C++. I'm not gonna go very in-depth on that; there's good documentation for it. All I'm gonna say is that the adoption of this was very rapid. Within a few months, this was taken up by the community. It's completely solid in use, and people will generally just write Catch-based tests for any sort of new libraries or little features they put in. In fact, we've had community members go out spontaneously and take those old hand-rolled unit tests that we had, and we've been converting them gradually over time into Catch-based unit tests. So there are very few of those left; almost all of them are now converted to Catch-based tests. This was a blinding success. This was great. If you're doing C++ unit testing, Catch worked for us awesomely.

Another thing we wrote was the traffic capture plugin, traffic_dump. This is based on some work we'd done internally for some security testing stuff; we put a simplified version out as an open-source project. This is actually in the traffic server repo; if you build traffic server, you get this plugin. What it does is capture sessions from a user agent point of view. You say, I'm gonna capture one out of N sessions, so it rolls the dice on every session startup from a user agent, and if it gets lucky, it says, I'm gonna record that session, record all the transactions in that session. And this gives you a rather accurate model of what your production traffic actually looks like. This all gets captured out to files. You can adjust the capture rate during runtime, if it's overloading your system, or, if you're not seeing much load and wanna get a more thorough capture, to get more data faster. As for the content, the original internal version captured the content; for the open-source one, we decided that really wasn't useful. It captures the sizes, so we can synthesize essentially random bytes for the content.
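The capture-rate behavior described here amounts to an independent dice roll at each session startup. A minimal sketch of that sampling decision, in Python rather than the plugin's actual C++ (the function name and interface are hypothetical, for illustration only):

```python
import random

def should_capture(sample_rate: int) -> bool:
    """Decide at session startup whether to record this session.

    A sample_rate of N means roughly one out of every N sessions is
    captured. Because each session rolls independently, the rate can be
    lowered or raised at runtime and takes effect immediately.
    """
    return random.randrange(sample_rate) == 0
```

Since the rolls are independent, turning the knob down to get "more data faster" never requires restarting anything, which matches the runtime-adjustable capture rate described above.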
In practice, it's extremely rare that you care about the content at all for testing the proxy, so sizes are adequate. So that's what we did there.

So let's talk about the testing framework we eventually decided on. This is what a full-up test setup looks like. We have a client on one side that drives traffic into the traffic server. We have a server on the other side that accepts requests from traffic server and sends responses back. So we have the full loop: into the proxy, out to the upstream, back from the upstream, back to the user agent. We have another process called the micro DNS, which handles the IP address resolution. We'll refer back to this as we go into detail about what these various pieces are. So remember this architecture diagram; this is what we're aiming for.

So, the micro DNS server. This is an extremely simple name server that we wrote, because it was just so trivial it was actually easier to write by hand than to use a library. All you do is give it a list of IP addresses, and whatever name traffic server asks for, it just picks the next address off the list and hands that back. This lets us easily direct all the proxy traffic to our server without having to make changes in traffic server, and particularly without making changes in the traffic server configuration. I was very concerned about hiding bugs here: if the fact that we're testing changed the way the proxy handles traffic, how it does mapping, how it does layer seven routing, that could cover up bugs, and we'd be doing things in a way that doesn't let us discover actual problems. So I wanted the rules in the configurations we use to be as close as possible to actual production configurations, and the micro DNS lets us do that. The only configuration change is to have it talk to this name server instead of the normal ones. Then all the IP addresses come back as the test server addresses, and it just connects to them.
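To make the micro DNS behavior concrete, here is a toy model of its resolution logic in Python. This is only a sketch of the address rotation, not the real tool, which speaks the actual DNS wire protocol; the class and method names are made up for illustration:

```python
from itertools import cycle

class MicroDNS:
    """Toy model: answer every name query from a fixed address list, round robin."""

    def __init__(self, addresses):
        # cycle() endlessly repeats the list, which gives round robin for free.
        self._next_address = cycle(addresses).__next__

    def resolve(self, name: str) -> str:
        # The queried name is deliberately ignored: whatever the proxy asks
        # for, it gets the next test server address. That way the proxy can
        # run with production-like configuration rules unchanged.
        return self._next_address()
```

Ignoring the name is the whole point: production remap and routing rules stay intact, while every connection lands on a test server.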
And this also lets us spread load: we put in the list, so if the test server is getting overloaded, we can put in several of them and round robin between them, and get better performance that way. So the micro DNS has really been a very handy little tool.

Another project that came out of this is called Proxy Verifier. You saw in the diagram we had a Verifier Client and a Verifier Server. What we wanted here was one configuration that drives both sides of the process, so we make sure they're always in sync. We call that the replay file. This is really a description of the traffic that we want to drive through traffic server, and various other things that we'll talk about in a minute. It records sessions, and inside each session a set of transactions; each transaction has the user agent request, the proxy request, the upstream response, and the proxy response. These are described in a way that is used either to send the data or to check the data the proxy sent out. Now, this was actually open sourced this year, so it's available for other people to use; there's a URL for the open-source project. And this is not tied to traffic server; it doesn't require traffic server at all. It's written in C++, and any time you want traffic to pass through something, it's very handy. In practice, I frequently use just the server part to set up very specialized web servers. It makes it very easy to say, okay, when you get this request in, I want you to send this response with these fields. When you're testing the precise behavior of your proxy, or some other sort of web application, it can be really challenging to get an httpd or a lighttpd set up in a way that it responds the way you want. With the Verifier Server, I can just specify, okay, on this request, here are the headers I want you to send back.

So the way this works is we have the Verifier Client, the Verifier Server, and a replay file. As I said, we have the four elements in there.
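As a rough illustration, a replay file describing one transaction with its four elements might look something like this. This is a hand-sketched approximation of the open-source Proxy Verifier format, not an exact schema; the field names, the `uuid` matching key, and especially the verification directive syntax are assumptions that should be checked against the Proxy Verifier documentation:

```yaml
sessions:
- transactions:
  - client-request:        # sent by the Verifier Client to the proxy
      method: GET
      url: /some/path
      headers:
        fields:
        - [ Host, example.com ]
        - [ uuid, txn-0001 ]   # distinct per transaction, so the
                               # Verifier Server can find this record
    proxy-request:         # what the Verifier Server expects from the proxy
      headers:
        fields:
        - [ X-Client-Secret, { as: absent } ]  # verify sanitization removed it
    server-response:       # sent back by the Verifier Server
      status: 200
      headers:
        fields:
        - [ Content-Type, text/html ]
    proxy-response:        # what the Verifier Client expects from the proxy
      status: 200
```

The one-file layout is the point: the client's sends, the server's sends, and both sides' checks come from the same description, so they can't drift out of sync.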
So it starts off with the Verifier Client finding a transaction, looking at the user agent request in it, and sending that to traffic server. That arrives at traffic server. Traffic server then processes it as part of the test system and sends it off to an upstream server, which the micro DNS has pointed at the Verifier Server. The server then uses data inside the proxy request to figure out which transaction is actually being processed. There are a variety of ways to do that; we made this very flexible. By default, it uses the URL. You can also specify a particular header field, so a lot of the tests actually say, we're going to attach a header called UID and use that to indicate which transaction is being used. You simply generate a distinct UID for every transaction, put the header in there, and then the Verifier Server can look that up. You can look up other headers, or you can use the host in the request if you want to do that. So we made that really quite flexible; you can do whatever you want.

Then the Verifier Server does the lookup. It uses the proxy request data not only to do the lookup, but you can also set verification options on there to say, okay, I want you to make sure header X has value Y, I want you to make sure header X is there, or make sure header X isn't there. If we're testing sanitizing, we can say, well, I want to make sure these headers from a user agent never make it through to the upstream; let's validate those headers were in fact removed. So it does that. Once it's done the verification, it goes into the same record and says, okay, here's the upstream response, I'll send that back. That goes back to traffic server. Traffic server processes it as if an upstream had actually responded that way and then sends its response back to the Verifier Client, which knows which transaction it's on, so it looks at its proxy response record as well.
It checks against that proxy response and makes sure that, in fact, the things that are supposed to be there are, the things that aren't supposed to be there aren't, and the values are what they're supposed to be. You can check status codes and say, well, I expect that to be a 403 or a 502 or a 200 or a 204 or whatever, and make sure that we're getting the status and the headers and whatnot that we want back. And this is really very nice, because if you actually do this kind of testing, there's a lot of variance in the headers you get back, and writing regular expressions or doing generic matching to make sure you get just the headers you want, with the right values, is frankly kind of painful. One of the things this does for us is that you say in the replay file, well, here are the headers I expect back. I want you to verify this one and this one only, and I want you to check these for equality or inequality. It makes it very easy to check exactly what you want without a lot of additional complexity. So we've been very happy with that.

So let's talk about the overall framework. Great, we have all the pieces, we have the architecture, and we have the process layout that we want and the processes to run. That's great, but what we don't want to do is make the developers, every time they write a test, do all the setup of all these processes to get everything running. That is a serious pain, it's easy to get wrong, and it's really pointless to have to duplicate it every time. So we went out looking for a framework and said, well, what framework can we use to make all of this work, that's fairly easy for developers to use? Serendipitously, about the time we were getting serious about this, another developer who used to work at Intel, named Jason Kenney, joined the team, and he'd actually done quite a bit of QA and testing at Intel.
He had written this Python framework called the gold testing framework, which we call AuTest because Au is gold. We said, well, great, this looks like it's close enough to do what we want. We have the developer in house, he can do any extensions that we want, he can get us bootstrapped onto using it, and he can explore it for us. So to a large extent we decided on this because we had the in-house guy; frankly, the other frameworks that we looked at were roughly equivalent. There wasn't a strong impetus for one or the other, which is why having the developer in house tipped us this way.

What AuTest provides for us is test management. When we describe a test, we say how the test is supposed to work. It then does the process management for us to set up all the processes, runs the test client processes as needed, and looks at the results. It provides very detailed diagnostics, which is really nice. Jason was on the same page as I was about how testing systems are supposed to work, so AuTest was already providing very detailed diagnostics about what it was doing, why it was doing it, and what happened when it did it. So we went with this, and this is the top level full-up testing framework. When we do full-up testing, we fire up AuTest. It looks at the test files that we have, and then it goes off, sets up all the processes, and runs the tests.

The way it works is you have test files; as always, these are Python code. Each file creates one or more tests, although in general we tend to stick to one test per file, though there are some that have more than one test in them. Each test has an independent process setup: it sets up the processes it wants and then does test runs on it. So each test is divided into runs, saying, when I set up the test, I set up this set of processes and these configurations for traffic server.
I run a bunch of commands, each of those being a run. Each time I run those, AuTest will check the output and verify whether that run was successful or not, and generate diagnostics if it wasn't. One of the nice things Jason put in AuTest is that if everything works perfectly, it doesn't tell you anything, which is awesome. It just says, hey, everything worked, and that's great. If something goes wrong, however, then it provides a lot of detail about what exactly didn't go right.

All the logic in the test files is descriptive, not imperative. Running a test file doesn't run the test; the test files really generate data structures. And then, after all the data structures are done, AuTest goes through and actually executes those data structures against traffic server. Now, one thing I wanna point out here is that AuTest by itself is incapable of testing traffic server. It's designed to be specialized for a particular testing target, which is a job that we did. This is primarily Jason's work: he wrote extensions for traffic server that convert AuTest from a generic framework into a traffic server specific testing framework. And this is all part, again, of the developer automation stuff. When you want to set up a traffic server in your AuTest, there's simply a Python command that says, hey, set up a traffic server for me. It goes and does all the hard work of updating the AuTest data structures to set things up in a way that, when they run, a traffic server comes up. There's a specialized one for micro DNS. So you say, okay, I want a traffic server, and I want a micro DNS, and I want a Verifier Client. And then those extensions, in conjunction with AuTest, arrange for all these processes to fire up for you. And it tests: did the process come up? Did it crash? Is it working? Is it ready to run? It does all that work for you.
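The descriptive-not-imperative split can be illustrated with a toy sketch. This is not AuTest's actual API (in the real ATS test suite the one-liners look something like `Test.MakeATSProcess(...)`); all the names below are invented. It only shows the two-phase idea: the test file builds data structures, and a separate framework phase executes them later:

```python
class TestRun:
    """One run: a command plus the output we expect from it."""
    def __init__(self, command, expected_output):
        self.command = command
        self.expected_output = expected_output

class TestDescription:
    """Built by the test file; nothing executes while it's being built."""
    def __init__(self, name):
        self.name = name
        self.processes = []   # e.g. a traffic server, a micro DNS, ...
        self.runs = []

    def make_process(self, kind):
        # Stand-in for the real extensions, which record how to launch and
        # health-check a traffic server, micro DNS, verifier client, etc.
        self.processes.append(kind)

    def add_run(self, command, expected_output):
        self.runs.append(TestRun(command, expected_output))

def execute(test, runner):
    """Framework phase: walk the description and check each run's output."""
    failures = []
    for run in test.runs:
        output = runner(run.command)   # stand-in for real process management
        if output != run.expected_output:
            failures.append((run.command, output, run.expected_output))
    return failures

# The "test file" part: purely descriptive, nothing has run yet.
test = TestDescription("smoke")
test.make_process("traffic-server")
test.make_process("micro-dns")
test.add_run(command="fetch /", expected_output="200 OK")

# The framework then executes the description.
failures = execute(test, runner=lambda cmd: "200 OK")
```

Keeping the description separate from execution is what makes the repeatability goal practical: the same data structures can be re-executed, printed, or diffed for diagnostics without anyone re-typing setup commands.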
And this is a big, big deal in terms of getting people to actually write tests, because now you can have a traffic server as a one liner in your AuTest file. It also has extensions to do all the configuration files. You can say, okay, the configuration has all these tunable knobs. Here's the knob I want to set, here's the value I want to turn that knob to; go in and update the configuration file for me. And it does all that work, which is really, really nice. This is a key part of the ease of use. That's one of the key themes here, and really what this whole talk is about: techniques for making this easy for developers to use.

So, what are the results of all this work? As you can see from the dates, this has been a multi-year effort. Basically from 2015 to about 2017 we did a lot of the infrastructure work; probably about a year to get the basic infrastructure, to 2016 or so, I guess I should correct myself. The last two or three years is when Jason has been working heavily on the testing framework and getting the extensions up, and it's been a year or two getting all the AuTest stuff up and running. Fortunately, even though developer time wasn't full time for anybody, we did have quite a bit of developer time, and we've had the community working on this, building tests. So actually, I'm pretty happy with this. We've got 178 test files now, I just checked last week after our summit, which is more than an order of magnitude more than either of the earlier efforts got. We're still adding AuTests. We have tests written by a good number of people in the community.
I would say probably at least half the community has actually worked on or written a test, which is just a fantastic rate of adoption compared to previous efforts. We've got community agreement: we've committed to using AuTest, and its utility has been demonstrated, to the point that people are saying, well, although it's got problems and needs fixing up, we're gonna do that. We're gonna keep with this system, because we can see the potential in it; we can get this to where we really want to be.

The opinion of myself, and certainly of other senior developers, is that we've had much more stability in our releases, particularly with ATS 9, compared to some of the earlier releases that really drove this process into being. In terms of internal adoption inside Verizon Media, the upgrade process has been just vastly smoother than the last release upgrade we did, which frankly was, well, not a disaster, but really, really unpleasant. This one has still been difficult, but just so much better than previously. Very happy with that. According to Jason, our code test coverage numbers have gone from 12% to 58% with these tests. So that's great. Obviously we're still moving forward and writing new tests, but overall, I think this has clearly been a big win. It's just so much better than it was, and we have a clear path to move forward and continue to improve.

We're able to actually test complex scenarios. Internally in Verizon Media, we have this proprietary plugin called CARP, which basically shards the cache across multiple instances of traffic server. So if you have a pod with, say, 16 traffic servers in it, we can split the cache 16 ways, and we get 16 times the cache size of any individual traffic server, which is great. What the CARP plugin does is, when a request comes in, it looks at the request and decides which host owns that particular content.
Then, if it's local, it handles it locally; otherwise it routes it over to some other instance. This is a plugin with lots of subtle behavior — as you can imagine, doing real-time, cross-thread, cross-process coordination is always challenging, especially with machines coming up and down. One of the things we were able to do with AuTest that we couldn't do before was actually test some of these complex scenarios, with multiple Traffic Servers running and traffic flowing through several of them at the same time. That was great. Previously, without AuTest, the only alternative would have been to test in production and hope — hope we could debug fast enough in production. The offline testing has discovered numerous problems. In particular, when we've seen problems in production, we've been able to bring them in-house, replicate them in the in-house testing, and fix them there, rather than testing the fixes on production boxes. That's been really great. We're also very active in production simulation, and that's going well — we're finding bugs with that too, especially using ASan, the address sanitizer. That's hard to run in production because, while it's not a huge hit, it is a pretty significant performance hit. But if we run it on our simulation boxes, no customer complains about things being slower, even though it is actually running slower. That's been very handy for finding a lot of bugs, and we've been pretty happy with it. So now we get to conclusions: what have we learned from this whole process? One thing I learned is that sticks don't work very well when you want people to write tests. You can do about 10% stick, but you need about 90% carrot. You really have to have something that is at least not dreadful, and hopefully even attractive, for people to use.
You simply can't beat people, particularly in an open source project, into writing tests. What you'll do is drive developers away; they'll just decide it's not worth it. They'll get upset, and it'll cause stress in the community. We've seen that before. We've done much better with the current testing system because people find it useful for their own testing. Certainly I found it that way; I'll talk about that in a bit. Yeah, there's two of those slides. So overall, if you want one lesson, one thing I've learned: it's about ease of use. The testing system has to be relatively easy to use, and it has to be simple to set up new tests. And of course it's never finished, right? I don't want to say we've built this perfect system and everything is great and wonderful. No — we've got a system that's enormously better than it was before. We're doing far more testing and getting much higher reliability out of our production releases, but it's still not where we want to be. So we're still moving forward, and at least we now have a clear path: here are the things we can fix in our testing system that will give us better and more thorough testing. We've gotten much better and we're continuing to improve, which is really what you want to aim for — that's what you can realistically achieve. Every little bit helps. Again, ease of use: don't just look for the big splash, the big picture; even little things add up over time. Don't be afraid to take those small wins and say, let's do this, let's do that. We just had one simple change, which I'll talk about in a bit, that took half a day to write, and it's really made a significant improvement in how usable things are. And so you want to be able to write tests.
If you're driving one of these projects, you want to write tests, you want to talk to the community, you want to find the big pain points and work on those first. That leads to the next point, which is premature optimization. There's a lot of chicken-and-egg here: you want to get a testing system out and get it used, but you don't want it to be really painful to use and have people dislike it. So how do you do that? If you write something really simple, then, as we saw with TSQA, you have a hard time reaching that takeoff point where people start improving it. But if you go too far, you might lock yourself into things that just aren't going to work. Honestly, this is a hard choice to make. I like to think I made good choices here, based on the results, but it certainly wasn't easy — it wasn't a case of saying, obviously we'll do this and then that, and we'll stop here. A lot of thought went into it, and what I would say is: just accept that. This isn't something where you just say, well, here's what we want to do. You need to think very carefully about what you can do to reach that critical bootstrap level without locking yourself in too much. Flexibility in a framework — one of the things we've liked about AuTest is that it's sufficiently flexible that if we see problems, we can adjust things. But you've got to start with some minimal level of ease, and you're going to have to accept devoting some serious design and development time to getting there — to get over that first hump and get people to start adopting it, and then you can move forward. As I noted, find out what is really annoying people about the testing system. Talk to the community, right? We're open source. We're supposed to talk to each other.
We're supposed to discuss these things. And if you're driving this, you have to accept that people are going to have some brutal and vicious complaints about your testing system. That's just the way it is. Accept that, roll with it, and ask: what is really the problem here? What can I do to get a different set of complaints? Because you're not going to get rid of the complaints, but at least you can move forward on them. And don't let the perfect be the enemy of the good. This is something I've struggled with personally during this whole project: doing things that aren't quite right, that aren't quite perfect — I don't want to say shoddy, but not really what you would want to do. Again, there's a trade-off: it's better to have something decent two months from now than something really awesome a year from now. Because maybe the awesome thing isn't going to turn out to be awesome, right? It's a balance point: you want to make something good, decent, and usable, but don't delay it because of some perfect image in your mind you're striving toward. Better to get stuff out there, get it in the community, get some feedback on it. And this is the hardest thing — you're not going to like it, but there it is: lead by example. If you're driving one of these testing efforts, you're going to have to write tests. And if you find that you simply can't do it, well, you have to ask: why can't I? What is wrong with my framework that even I, the guy in charge, can't write tests? So don't get frustrated — get informed, get creative. Ask how you can overcome these problems so that at least you are willing to write tests in this framework. Because I guarantee, if you're not willing to, nobody else is going to be.
Another thing you need to accept — we discovered this, though I should have known it from the beginning; the most painful lessons are usually the ones that look obvious in hindsight — is that you're not going to have a lot of people writing brand new tests. That's simply the way it works. You say, but gosh, Alan, you've got almost 200 tests here — where did those come from? Well, what happens, and this is obvious in hindsight, is that people copy tests. They say, I need a test for X. What they won't do is start from scratch; they'll look around at existing tests and see if they can find something that looks very similar to what they want to do. Then they'll copy that test and tweak it a bit. That's simply the way it works, and you're going to have to accept it. So one of the things you'll need to do is write tests from scratch yourself, and when you do, think of them as seed tests: initial colonizations, initial efforts to get stuff out there that people are going to copy. Keep that in mind and take some extra care with your tests; if you don't, it will come back to haunt you. I know this is asking a lot if you're doing this kind of work, but again, it sets an example for the community. We're open source — leaders simply have to do better, or you're not going to inspire your team; you can't force them to do stuff. And the more examples you can get, the more new tests you can get, the closer you come to what I think we've achieved here: a positive feedback loop where other people are writing tests, other people are branching off tests. We do have a few dedicated people who do in fact write brand new tests, and that is awesome, and we want to encourage it. But we have to accept that most people are simply going to be cloning or updating existing tests, and that needs to be easy.
And the best way to make that easy is to actually have those tests there for them to modify. So you need to achieve that critical mass of ease of use to get started, and really, if you don't do it, no one else is going to. Accept that going forward. All right. There are a few things I want to mention — if you're building one of these systems, things to think about putting in early. We've gotten by without them, but we've really missed them. One of the things I mentioned earlier was readiness checks on the processes — that wasn't well done. It worked okay, but not well enough, and we'd wind up putting in arbitrary timing delays to make sure things were up and ready. That has just led to flakiness in the tests themselves, and that's a real, real problem: the tests get flaky and people just say, oh, it's probably just the test that's going wrong. If people don't have confidence in the testing system, they'll just disregard it. So that's really a critical bit. We made a simple improvement where we can watch files for certain text phrases, and we tweaked Traffic Server itself to emit these at various key stages — so we can watch a file, and when the text shows up, we know this component is ready to go. It took one of our developers half a day to put that together — it's just not that complex in Python — and it's really making a big difference: the tests are cleaner and more reliable. So, these simple things. And again, we learned this because we got together — we just had a summit recently and had a long, frank, and for me painful discussion about what's really going wrong; this is what came out of it. The point is that for that pain, what I get is the ability to bring the community on board and say: if we make these changes, here are the things that will fix this up and let us move forward.
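The readiness check described above is conceptually just a poll loop on a file. Here's a minimal sketch of the idea in Python — `wait_for_text` is a hypothetical name, not the actual helper in the Traffic Server tree:

```python
# Sketch: replace fixed sleep delays with a poll for a readiness phrase
# that the server writes to a file at a key startup stage.
import time
from pathlib import Path

def wait_for_text(path, phrase, timeout=10.0, interval=0.1):
    """Poll `path` until `phrase` appears in it. Return True on success,
    False if `timeout` seconds elapse first."""
    deadline = time.monotonic() + timeout
    p = Path(path)
    while time.monotonic() < deadline:
        if p.exists() and phrase in p.read_text(errors="replace"):
            return True
        time.sleep(interval)
    return False
```

The win over `sleep(5)` is that the test proceeds the moment the server is actually ready, and fails loudly with a clear cause when it never becomes ready, instead of flaking.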
Another thing I think we're lacking, that we'd like to have, is a grouping mechanism. The AuTest framework has the ability to run particular tests — you can do regex filtering on the test names to run only certain tests. That's okay, but we don't really have a good naming convention. We probably should go and fix that, honestly, but we haven't. Still, it would be nice — and this is one of the things Catch does really well — to be able to put tags on your tests and then run the tests carrying a given tag. That would really help, so you could say: this is a smoke test. It doesn't test very much; it's just to make sure basic functionality is working — for instance, Traffic Server actually fires up, plugins actually load, remap configs actually load — which, surprisingly, has been really quite useful. It's easy to get some linking wrong, and this lets you catch that early on. So that would be really nice, and again it all speaks to ease of use: as a developer, I can run just a very basic set of tests locally, I can run tests that are specific to me, I can run the regular basic tests — and we can have much more thorough release-candidate tests that you simply can't run on PRs because they're just really huge and expensive. If you want to grind for six hours with production simulation traffic, you're not going to do that on every PR that comes into your open source project, but it's worth doing on a release candidate. So really, that's about it — that's all I wanted to talk about. I left a little time for Q&A if anyone has questions. All right, here we go, we've got a question. Oh, literature on this? Oh, golly, no. [The questioner] wants to know if there's any good literature on testing — honestly, no, I can't offer any recommendations there. We didn't really look at a lot.
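As a sketch of what Catch-style tagging buys you over name regexes, here's a toy Python runner with tags — all names here are hypothetical, purely to illustrate the grouping idea, not anything in AuTest or Catch:

```python
# Toy illustration of tag-based test grouping (Catch-style).
# Tests register under tags; the runner selects by tag, not by name regex.
REGISTRY = []

def test(*tags):
    """Decorator: register a test function under the given tags."""
    def wrap(fn):
        REGISTRY.append((fn, frozenset(tags)))
        return fn
    return wrap

def run(tag=None):
    """Run every registered test, or only those carrying `tag`.
    Returns the names of the tests that ran."""
    ran = []
    for fn, tags in REGISTRY:
        if tag is None or tag in tags:
            fn()
            ran.append(fn.__name__)
    return ran

@test("smoke")
def server_starts():
    pass  # e.g. assert the process came up and plugins loaded

@test("smoke", "remap")
def remap_config_loads():
    pass  # e.g. assert the remap config parsed

@test("release")
def six_hour_soak():
    pass  # too expensive to run on every PR
```

With this shape, `run("smoke")` is the cheap per-PR gate and `run("release")` is the expensive release-candidate sweep, without any reliance on test file naming conventions.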
What we ended up doing was really looking more at testing systems and reading up on those, because we had the design first. We said, this is the design we want, these are the particular needs we have, and then we went out and looked at testing systems and asked: which ones can do the things we need and give us the details we want? So we spent most of our time doing design and then researching actual testing systems out there. That's one of the things that led us to Catch — it did the stuff we needed, we adopted it, and it worked great. I'm not sure how much the literature would help. What I've focused on more in this talk is the flow of the open source process and the thematic elements; I didn't get a lot into the technical details, and I'm not sure how useful those would be, frankly, for other people looking at this. So that's what I would say there. All right. I guess I was fairly thorough in my talk — I don't see a lot of other questions. Again, almost all of this is open source: all the Traffic Server stuff, everything we've talked about — Proxy Verifier, AuTest, all of these are open source projects. For the Traffic Server pieces, you just go to the Traffic Server repo; everything is there, including all the AuTest extensions. All right, five minutes left. So I guess I'll just riff for my last five minutes on the other testing. I thought I had this slide in here, but I'm not seeing it — oh, there we go. How did I skip that? So I'm going to talk a little about Transaction Box and testing. I'm not going to get into detail on Transaction Box itself — it's a very complex plugin, an exploration of using YAML to do large amounts of configuration for Traffic Server. It's open source; you can look it up if you want, though I doubt you care, frankly. Oh, I heard something in the talk. Yes.
All right — sorry, sorry, technical glitch. Anyway, what's interesting here is that I ended up adopting AuTest myself. Lead by example: yes, I can actually use AuTest — I did it for Traffic Server. And when I had to test my own, basically personal, project — a very complex piece of software that interacts with Traffic Server — I ended up going with AuTest and using the same infrastructure we built for Traffic Server. In no small part, of course, because it's a Traffic Server plugin: you test it in the same environment as Traffic Server, and that's fine. The plugin itself doesn't really lend itself to unit testing, so I'm not really using unit testing on it; I have some separate libraries I built, and I unit test those using Catch, of course. We've experimented with AuTest extensions for Proxy Verifier and learned some things here, eating my own dog food, in terms of ease of use and how to automate stuff. I think we got much better here, and we'll be rolling some of that back into Traffic Server. Again, the themes I want you to take away from this little piece: incremental improvement, ease of use — see where the pain points are, see what you're doing manually all the time, and try to automate those. And flexibility — you have to keep that, because you don't want to get too specialized. There's a lot of balance here; it's part of being an architect, a designer, doing this work. I really love Proxy Verifier — especially, as I said, having the server side and being able to exercise very precise control over the responses; that's been really handy. So really, that's all I have. Yes, what is it? Oh, let's see, anything else I can give you 30 seconds on? Yeah, so the replay file for this is actually YAML — it can be JSON if you like — and Traffic Dump basically outputs the same format.
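Since the replay files are plain YAML/JSON, post-processing them is straightforward. As a hedged sketch — the field names below are hypothetical stand-ins, not the actual Traffic Dump or Proxy Verifier schema — here's the kind of sanitizing pass you might run over a dumped session:

```python
# Hypothetical sketch: scrub Cookie header values in a replay structure,
# replacing session cookies with random text. Field names ("sessions",
# "transactions", "client-request", ...) are illustrative assumptions.
import secrets

def sanitize_cookies(replay: dict) -> dict:
    """Replace every Cookie header value with a random token, in place."""
    for session in replay.get("sessions", []):
        for txn in session.get("transactions", []):
            headers = txn.get("client-request", {}).get("headers", [])
            for hdr in headers:  # each header is a [name, value] pair
                if hdr[0].lower() == "cookie":
                    hdr[1] = secrets.token_hex(8)  # random stand-in text
    return replay
```

A real sanitizer would cover every PII-bearing field your traffic carries (cookies, auth headers, client IPs, query parameters), which is exactly why it ends up site-specific.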
One thing if you do use Traffic Dump: you want to be a little careful. We have post-processing sanitizers, because the dumped traffic was live production traffic and there are personally identifying information issues there. So we have a sanitizer process that takes the output of Traffic Dump and sanitizes it so it can be kept long-term inside our testing framework, after all of that has been purged. For instance, we rewrite cookies: if there's a session cookie, we convert it to random text instead of the actual session cookie, that kind of thing. If you're doing production simulation this way, you're going to want your own sanitizer, and that's site-specific — there's nothing I can tell you in general beyond: find your personally identifying information and clean it out. Again, because it's JSON, it's very easy to do. As for Proxy Verifier, it's pretty performant — we've had it up to about 36,000 RPS for a single client-server pair, and we generally do about three to five thousand RPS on most of our servers. So we can easily simulate production traffic with this setup at production-level speeds. We did have to rewrite it in C++; we tried Python but simply could not get that level of performance — we never really got above about 1,500 RPS in the Python version. So we got more than an order of magnitude improvement by going to C++, and that was a key thing. So, right — I was hoping for more questions, but that's it, I think. Thank you, thank you all for attending. It's been a wonderful experience, and hopefully I'll be back at a future OSS event.