I'm Nicole van der Hove, and I'm not actually Dutch by blood. I'm Dutch by nationality and residence, though, which is what counts. I'm a developer advocate at Grafana Labs, and today I'm going to talk about emergent load testing, one of my favorite topics. I've been a performance tester for over 12 years, so today I'm going to tell you what emergence is, and then I'm going to raise some questions and maybe offer a few answers about how we could apply it to our testing, especially load testing. I say maybe because, while I do have code for some of these things, it's always difficult with a mixed crowd, so consider this presentation a little more conceptual. It's more of a testing approach than a fully fleshed-out framework, because I don't believe many of those can be applied to everyone anyway.

So this is my definition of emergence: emergence is the evolution of a whole beyond its parts in unexpected ways. We're going to go over this with examples, but in general it's called an evolution because it is iterative. We're familiar with that as engineers, I'm sure, and it's also important to note that these changes come about after generations. It's not something that just happens. We can't always plan for emergence, but we can do a few things to encourage it to occur. The whole refers to the entire system, which could be an actual application, or it could be nature, as we'll see on the next slide. The parts refer to individuals or components within the system. And the changes that happen are going to be unexpected, because just like in nature, things sometimes happen. We can plan for some things, but other, unexpected things always come up, and as engineers we just need to be able to cope with them.

So let's look at an example in nature. This is such a cool picture (it's not my picture). Ants are really fascinating creatures, because if you take an individual ant, it has a very simplistic brain. They're insects. They're driven to do certain things and that's it; they're not really thinking about the philosophical implications. And it's actually a bit misleading to think of an ant queen, because that word doesn't mean for ants what it does for us. Ants don't have leaders. The ant queen is just another ant with a different role. When we think of a queen, we think of some sort of leader, whether that's an actual ruler or a political figurehead; it's still some sort of leader. Ants don't have that. In an ant colony, nobody is actually deciding what the colony is going to do. So how do they figure stuff out? Somehow they manage to do things like build bridges with their bodies. I think that if we humans were told to do something like this, we would struggle, and that's with our supposedly superior intellect. So ants as a colony somehow manage to display a characteristic that they don't display as individual ants.

It's a bit difficult to pin down exactly what causes this to occur, but there are some things we can point to that facilitate this weird phenomenon called emergence. First, emergence requires a system of organized complexity. A super simple system, or a super simple question, just needs a super simple answer, so nothing really needs to emerge. That's just logic, followed through to its natural conclusion.
It also has to be organized, though, because if it were completely random there would be no patterns to point to; it would just be anarchy. Second, emergence requires diversity. Diversity in terms of sample size: there has to be a significant number of ants before you can say that a characteristic they display can be abstracted to the whole. But also diversity in approach. In an ant colony there is a queen ant (okay, bees can be part of this as well), worker ants, and a lot of other types of ants that all have specializations. They don't all have to agree on what they're doing; they're just doing their jobs. We have to try to capitalize on that too.

Emergence also requires network communication. There's no ant shouting orders to the other ants so that they can make this ant bridge happen. Each one is constantly communicating in different ways, through pheromones and so on, but they're not communicating with everyone. They're communicating with the neighbors right next to them, and those neighbors are communicating with their neighbors, so it spiders out, even though they don't have anything as sophisticated as our communication networks.

Emergence also requires feedback. It has to go both ways. An ant tells the rest of the colony where food is, but it's also getting signals back about where ants have already been and where it shouldn't go again.

There's also agency. This is the difficult one, because ants do have an element of agency. Each one gets to decide what to do on its own. It is not a carbon copy; it's not a machine. There is some play in their responses.

And lastly, emergence requires serendipity: the possibility and the capacity to be imperfect. You might think, why would we even plan for something to be imperfect? Well, in the case of testing, why would we aim for a perfect test when our systems aren't perfect? In fact, it might have the opposite effect if our tests are super clean and only test for the expected. With the ants, they might not all react in the same way, and that imperfection can facilitate emergence.

So, time to apply this to software. This image was made, appropriately, by Midjourney. What if software could be evolved and not written? We always think of software as something we write. What would that even look like? What would emergent code be like? You might already be thinking: isn't that just artificial intelligence? My answer is no, and here's why. There are two reasons. AI is supposed to be explainable, or at least good AI is supposed to be explainable. Meaning: even if you jump to the result, like this image here, and you don't know what it looked at to get there, and it's a surprising result, you could still, if you wanted to, look at the code and see why it made the decisions it did. There's a decision tree, there are inputs. Every step is a logical progression. That's not how it works with emergence. Look at evolution, like in the Galapagos Islands, where there's this batfish. It's a fish that's already called a batfish, so it was already starting from not very much. But there's a leap in evolution it made where it suddenly evolved red lips. Why would a fish at the bottom of the ocean need red lips? We don't really know.
That's what I mean when I say emergence isn't explainable the way AI is. The other thing is that when we say AI these days, it's so vague. We call everything AI that's even remotely clever. Sometimes it's just a particularly cool bit of code and it's like, oh, that's AI, because it's smart. That's not actually what AI is. And that's also why we can't say that all AI is emergent, though some AI can be emergent.

Imperative code is probably the kind of code we're most familiar with; it's mostly what we use today. It is sequential and instructional. It's a series of steps: just follow A to Zed, or Z if you're American. You go: if this, then that, else the other. It's all very logical, and computers just go through it the way it was written. It's also very precise. There's no room for ambiguity. The computer doesn't think, oh, well, I don't really feel like doing that. It just has to do it. So for the Trekkies out there, it's like the Borg. The Borg are an incredibly complex and very, very powerful enemy that nearly wiped out the Federation many times. But they do have a weakness, and that's the Queen, because the Queen is where it all comes from. They might be complex machines, but they are still just drones. That's why you see the Federation going into their actual spaceships and being able to wreak havoc there: the drones aren't really thinking in that way, they're just following their instructions. They're still controlled by a central mind. This isn't emergent. Not that it's bad, but it's not emergent.

The next kind of code that already exists is declarative. We're seeing this more and more in our environments. It is goal-oriented: declarative code describes what the end result is going to be, and we're not really saying how we're going to get there. It has a defined output; there's a state that you set at the end, and that's the intended result. Yet it has limited instructions, maybe on purpose, because we don't necessarily know what needs to happen to get to the end result. The way I think of this is like Zelda: Breath of the Wild. There's a puzzle, and you know generally how the puzzle should be completed, but there are actually a lot of things you can try out, a lot of different solutions. So this is a bit more open-ended, and it's closer to emergence, but it's still not quite there. Because remember that emergence is about new and unexpected behaviors, and this is still not new or unexpected.

So what kind of code could facilitate that? Here's where I need you to take a leap of faith with me and suspend your disbelief, because I'm proposing something I'm calling generative code, because you need a name for something when you propose it. Generative code is given a sufficiently large sample of input and then autonomously produces a large range of output. It's a bit more vague. There is no set end goal. There is a starting point, but the output is more like a range. This actually isn't new; it isn't my idea. For example, the actor model in computer science describes an actor that can make decisions on its own and formulate its own responses, but can also talk to other actors. So generative code provides input right up front, like: here's where you are, this is the situation you're in. And then it determines a range of affordances.
Affordance is a term I'm borrowing from design. When designers talk about an affordance, they mean the ways in which we can interact with something. Take a chair. An affordance of a chair is that it can be sat on, but if you ask my nephew, he looks at a chair, tilts it on its side, and it's a pretend train. That is an affordance of a chair: not the one we were expecting, not the one we planned for, but an affordance of the chair nonetheless. So it is broader than requirements; it's broader than SLOs. And generative code is open-ended because, weirdly, there's no intended end result. You just intend that there is one, but what exactly that is, we don't know. That also makes it a bit more realistic.

So if imperative code is like the Borg and declarative code is like Breath of the Wild, then I'm going to say generative code is more like a tabletop role-playing game. I play Dungeons and Dragons several times a week; I just came from a game last night. The cool thing about it is this: there's always that part in a video game where you think, I just want to climb that mountain. And you climb it and, oh, there's nothing here. There's no code here, no graphics here. You just hit a limit and that's it. Role-playing games aren't like that, because they're based on improv. So you have a boss there, and it's not like you just have to fight it. Sure, the boss could end up dead. You could end up dead. You could charm the monster. You could charm some other dopey knight to come and fight the monster for you. You could distract it and then sneak away with the treasure. It's so open-ended that the possibilities are really exciting.

So imperative code explains what the goal is and how to get there. Declarative code sets the what, but not the how. And generative code doesn't set either the what or the how. Instead, I think it says where: these are the boundaries of play, these are the things we can play with, these are the affordances.

To some degree, our code and our systems are already displaying emergent behavior. Things happen in production that we would say are explainable, but not expected, and there's still some sort of internal logic. But I consider myself first and foremost a tester, not so much a developer or a platform engineer. So what I think about when I look at this is: how am I going to test something that's so emergent, all of this black-box stuff? How do I pin it down enough to be able to test it? And the problem is that when you look at emergent systems and you look at how we're testing now, testing hasn't caught up at all. Testing is still imperative, almost 100%. Our systems are increasingly emergent, like this piece of art. When you zoom in, you might think it's just little bits of black and red paint, and when we test it, we just ask: what is the ratio of black to red paint? That's kind of missing the point. It's useful, and maybe it's one of the things we want to check, but it's lacking that je ne sais quoi, the soul that isn't really translatable. We're still treating tests like they need to be precise, instructional, sequential. So how could we encourage our tests, and because of my interests, our load tests, to be emergent? Here's my plan.
First, we could define the affordances of an application: the things that we can and want to test, and also the things that our testing tool is able to test, because that is a limitation as well. Second, we could follow in the footsteps of the ants and assign certain types of tests roles. Third, we could give these tests, these ants, some sort of feedback based on their performance. Not a goal, because then we're back to imperative, but some sort of feedback so that they know where to go, so they have a direction to head in. And lastly, we need to figure out how to get these tests to evolve.

You've probably seen this list of -ilities, or one like it, so I won't go through all of them here. In design, I said, an affordance is a way that we can interact with an object. I think these are the ways that a test can interact with our system. They are different goals; sometimes they complement each other, sometimes they oppose each other. Sometimes they're sacrifices you make at the expense of speed, for example. So when I think about the ways in which a load test interacts with a system, these are the things I come up with. In the next slides I'm going to get into the technical part and show you how we can test each of these affordances, and then put it all together.

Okay, disclaimer here: I am from Grafana, and you may or may not know that Grafana Labs acquired k6, so I have to put that bias out there. But actually, the stuff I'm showing you is so experimental that it can only be done with the free and open source version, so you don't need anything to use this. The tool is called k6 (k6.io). It is free and open source. It's written in Go, so it has all the performance benefits of being written in Go, but the scripting is done in JavaScript, because JavaScript is pretty much ubiquitous across lots of personas and has the convenience of an interpreted language. It's also got an open source extension ecosystem that anybody can contribute to; some of the things I'm showing you were made by the k6 team, and some were first created or are maintained by community developers. And it plays well with others: k6 is super composable, super modular, and the fact that it's part of the Grafana suite of projects means it also works really well with that observability stack. But if you're using another tool, everything I'm going to mention, because again it's an approach, will work with that tool too.

Here's an example of a very simple k6 load test, at the protocol level. It makes a single HTTP request to an endpoint, and when it gets the response back, it checks whether the HTTP response code is a 200. That's a one-step test, and you run it with just k6 run test.js on your terminal. That's not the breadth of what k6 can do, though. You could have different tools testing different affordances, but why? You should use as few tools as possible.

This is another affordance: browser testing. The previous slide showed a protocol-level test, which means it's sending the actual requests. I showed HTTP, but it could be gRPC or something else. This one is a script that interacts with the application through the browser, so you're not making the underlying requests. You're scripting the clicking and the typing and hitting the radio button, and an actual browser instance opens up. (Before we get to how you run that, here is a minimal sketch of the simple protocol-level script I just described.)
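A minimal sketch of that one-step test, assuming a placeholder endpoint and arbitrary load settings:

```javascript
import http from 'k6/http';
import { check } from 'k6';

// A handful of virtual users for a short duration; tune these for your own system.
export const options = {
  vus: 10,
  duration: '30s',
};

export default function () {
  // Single HTTP request to a placeholder endpoint.
  const res = http.get('https://test-api.example.com/');

  // The one-step check: did we get a 200 back?
  check(res, {
    'status is 200': (r) => r.status === 200,
  });
}
```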
As for the browser script: it's pretty similar, except that it's all based on the DOM of the page, and there's still a check. The way you run it is with a flag: K6_BROWSER_ENABLED=true. Browser tests in k6 give you metrics that add to what you get from a protocol-level test. If you're only testing at the protocol level, you're missing out on the actual user experience: how long does something take to render on the page, or above the fold, and how long before the user can interact with something? Those things are important parts of your application and user experience too.

k6 can also send test results to Prometheus via remote write, just by adding an output flag, so you can get the protocol-level and browser-level metrics going to the same place. That's especially handy if you're already using Prometheus. To take things further, if you've already instrumented your system, the k6 tracing module lets you instrument the HTTP headers as well, so the traces start from the test level itself.

This is another one; there are only a few more. This is the xk6-kubernetes extension. It's a community extension, and it lets you interact with a Kubernetes cluster from within the test script. This is where it starts to get cool, because now we have a test that doesn't just take what it's given. It's a test that can actually spin up its own environment, modify it, tear it down, and do whatever it wants with it, based on things it received from that same application. Imagine the possibilities of a test that is that smart.

In a similar vein, and in the realm of chaos, the xk6-disruptor is also an extension, and it interacts with Kubernetes clusters to inject HTTP failures. That could be delaying traffic, or it could be returning an error code or something in the body. And again, having a test that can do this means it doesn't actually have to reproduce the underlying error; it can just fire off the fault and see what happens.

And then for the communication part I was talking about with the ants and their network communication, you could use something like the k6 Redis extension, with Redis as a data store, to put not just load testing results but certain things about how the test is going somewhere central, a repository that all the tests are able to access. That's another thing that would facilitate emergence.

So here are the affordances I was talking about. Now we've got browser testing, monitoring, tracing, scaling, disrupting, and communicating. These are all available; this is the starting line for any load test. How do we put them together? I think the standard way, as a tester, is through something like scenarios. This is how it's done in k6: there are three scenarios here running concurrently. The checkout HTTP scenario is the protocol-level one, the checkout browser scenario is the browser one, and the disruptor is the xk6-disruptor. Each of them can have different stages and test parameters, different numbers of virtual users, and they can all run at certain times. But this still isn't really generative or emergent; this is still pretty basic.

This is where it gets more theoretical. I think the main thing we're missing is agency, and agency requires choice. Our tests don't really have choices when we're scripting everything out for them. (As a reference point, a rough sketch of that three-scenario setup follows below.)
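Here is that rough sketch. It's a hypothetical combination, not a recipe: the scenario names, stages, endpoints, service names, and fault parameters are placeholders, the browser function is left as a stub because the browser module API has changed between k6 versions, and the disruptor scenario needs a k6 binary built with the xk6-disruptor extension.

```javascript
import http from 'k6/http';
import { check, sleep } from 'k6';
import { ServiceDisruptor } from 'k6/x/disruptor'; // requires the xk6-disruptor extension

export const options = {
  scenarios: {
    // Protocol-level checkout flow, ramping virtual users up and back down.
    checkout_http: {
      executor: 'ramping-vus',
      exec: 'checkoutHttp',
      startVUs: 0,
      stages: [
        { duration: '2m', target: 50 },
        { duration: '5m', target: 50 },
        { duration: '1m', target: 0 },
      ],
    },
    // Browser-level checkout flow with a couple of users for the whole run.
    checkout_browser: {
      executor: 'constant-vus',
      exec: 'checkoutBrowser',
      vus: 2,
      duration: '8m',
    },
    // One iteration that injects HTTP faults while the other scenarios run.
    disrupt: {
      executor: 'shared-iterations',
      exec: 'disrupt',
      iterations: 1,
    },
  },
};

export function checkoutHttp() {
  const res = http.get('https://shop.example.com/checkout'); // placeholder endpoint
  check(res, { 'status is 200': (r) => r.status === 200 });
  sleep(1);
}

export function checkoutBrowser() {
  // Browser-level steps (page.goto, clicks, typing) would go here; omitted because
  // the browser module import and API differ across k6 versions.
}

export function disrupt() {
  // Target a Kubernetes service and add latency and errors to a fraction of requests.
  // Service name, namespace, and fault values are placeholders; exact field formats
  // vary slightly between xk6-disruptor releases.
  const disruptor = new ServiceDisruptor('checkout-service', 'default');
  disruptor.injectHTTPFaults({ averageDelay: '200ms', errorRate: 0.1, errorCode: 503 }, '5m');
}
```

Each scenario gets its own exec function, so the protocol ant, the browser ant, and the chaos ant can all run concurrently against the same system.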
So I think it's about having, on one side, feedback: what are the things that a test can actually know about the state of the world, the state of the application? Things like response time, response code, and so on. There are so many of these, and they're going to be very application specific, but the first step is identifying them so that you know what you can play with. On the other side are the affordances: what are the things that the test can actually do? That's why I had to go through all those script examples, so I could show you that yes, load tests really can interact with the application in these ways.

The key thing is not to draw a correlation. You can't say "when response time increases, decrease the number of virtual users", because then you're putting what you want onto the test and you're not letting it decide for itself. That is a way to test our own expectations, and there's a time and a place for that kind of testing. But when you're purposely doing something more explorative like this, I think it's better to leave it open for the test to make its own connections and try it all out. After all, after a few generations and a few executions of the test, if your premise is right, if your theory is right, it's going to end up there anyway.

The next part is assigning roles. Think of a group of virtual users, or instances of this load test, as a type of ant, like a queen ant or a worker ant. We could give our tests roles the way those ants get them. Some might prioritize elasticity or scalability, some might prioritize speed, and then we explore how much these overlap, or maybe they don't overlap at all, and that's useful information too. You can think of this as an ant army. Personally, I like that idea and that metaphor more than the idea of a simian army, because a simian army is really random, chaotic, and sometimes not in a realistic way. I would prefer this considered, collective intelligence of ants all working toward the same thing, each trying to do what it wants to do, where what comes next is something more considered than a monkey with a wrench in a data center. So I like that analogy better.

We would still need some way to give feedback, though. We can't just let them loose, because then what do they do? What does that tell us about our application? What if they do something else entirely? So we have to give them some sort of score, and that's why I think we should also think about assigning each ant, each type of test, a score dependent on its role. For example, response time would matter more for the speed ant, that kind of thing. It's important to note that the scores aren't for us to judge them; they're for the ants to judge themselves, because they're not playing on the same field, they have different roles.

And then I thought we could also allow these tests to evolve. I showed you earlier how k6 can write data out, so what's stopping us from having something at the end of the test that writes out a factor, or that score, somewhere else? Then we could let each ant assign a variable weighting to each of the affordances. It could decide for itself: I tried this, this time, as a speed ant; I really paid attention to the load. Did that actually work? Let's check the score. And then at the end of the cycle, it can vary that weight as well.
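As a sketch of what that last step could look like in k6, assuming an entirely made-up scoring and weighting scheme: the weights arrive through an environment variable, the built-in handleSummary callback computes a score from the aggregated metrics at the end of the run, and the adjusted weights are written out for the next generation to pick up (a Redis store, as mentioned earlier, could play the same role).

```javascript
import http from 'k6/http';
import { check } from 'k6';

// Hypothetical affordance weights for this "speed ant"; they could just as well
// come from Redis or from a file written by a previous generation.
const weights = JSON.parse(__ENV.WEIGHTS || '{"load": 0.5, "disruption": 0.1}');

export const options = {
  // The "load" weight decides how hard this ant pushes; the scale is arbitrary.
  vus: Math.max(1, Math.round(50 * weights.load)),
  duration: '1m',
};

export default function () {
  const res = http.get('https://test-api.example.com/'); // placeholder target
  check(res, { 'status is 200': (r) => r.status === 200 });
}

// handleSummary runs once at the end of the test with the aggregated metrics.
export function handleSummary(data) {
  const p95 = data.metrics.http_req_duration.values['p(95)'];

  // A speed ant scores itself mostly on latency; other roles would score differently.
  const score = p95 <= 500 ? 1 : 500 / p95;

  // Nudge each weight based on the score, with a little randomness thrown in:
  // the deliberate imperfection that leaves room for emergence.
  const next = {};
  for (const key in weights) {
    next[key] = weights[key] * (0.5 + score / 2) * (0.9 + 0.2 * Math.random());
  }

  // Write the score and the next generation's weights out for the next run to read.
  return { 'next-weights.json': JSON.stringify({ score: score, weights: next }, null, 2) };
}
```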
This is imprecise on purpose, because every application is going to be different. Some other things to consider, and I'll go through these quickly. Should SLOs be provided as input? I thought about this, and I'm not sure. In some cases I think it could work; in others I think it might be too restrictive. Should it be role dependent? I'm inclined to say yes: maybe not all of them should get all the SLOs, or the same input. Then I thought about giving each one its own initial evolution cycle, different runs of the test where they go through this, get the score, and assign the weightings on their own first, before they play together. That's an option too. There are other affordances we could add: we haven't touched on accessibility at all, for example. Is that important? Maybe there should be another role for that, an accessibility ant or something. And then the question of how to correlate your scores across generations: the answer is you don't, because evolution doesn't work that way. You don't cut it off and say, this is the part I want to keep; you just take the last generation.

Okay, so we've ticked all the boxes for emergence here, and I've already gone through them. But just because we have all this doesn't mean we're going to get emergence. And then, how will we even know? What exactly are we waiting for? What is supposed to emerge? What is the point of this testing? We might be tempted to say that the reason to test is to induce failure, or to find out what it takes to make something fail. I think it's much more realistic to just assume it's going to fail. What emerges instead is thresholds of operation. That's really the point of all of this. Emergent load testing happens to be really good at mapping out the boundaries of what your application can handle, in ways you didn't perceive or predict beforehand.

There are some disadvantages I want to mention. It takes time; we're talking about several evolutions, and it's very loosey-goosey. That's why it shouldn't be something you replace all of your existing test suites with. It's also unpredictable, because you don't know if you're going to get the same thing twice. It's not repeatable. And it's unreliable, in the sense that you don't get consistency in results, because you're trying to get the tests to evolve, and nature doesn't work that way.

So when does emergence work best? When a system is increasingly complex. The more complex a system is, the more resistant it is to this imperative style of testing, and that's when emergent load testing really shines. The system still has to display internal logic; it can't just be complete chaos. It works when there's sufficient time, and when exploration is the goal of what you're trying to do. So when you have a very specific defect from production and you want to reproduce it, this is not the approach to use. This is the one you use when you're exploring.

To summarize: emergence is the evolution of the whole beyond its parts in unexpected ways. We've talked about imperative code and declarative code, and how generative code could be the name for a new kind of code that sets not the what or the how, but the where. And then we talked about how tests haven't kept up with any of this at all. We are still doing imperative testing for the most part.
We might be declaratively deploying some tests, but for the most part it's still imperative. There's a big gap here. To encourage emergence in load testing, here are some of the things you can do: define the affordances of the application, meaning how a load test can interact with it; assign roles; rate each test; and then allow imperfect evolution. The imperfect part is important. And what emerges is what you learn about the tolerances of your application. Also, emergence isn't a silver bullet. It's not the be-all and end-all. It's potentially one more arrow in our quiver.

Okay, that was it. These slides are all available on my site; I just published them as the latest blog post, and all of the references and code are linked there, including the k6 material. This was inspired by the book by Steven Johnson, Emergence: The Connected Lives of Ants, Brains, Cities, and Software. Thank you, everybody, for listening. I think we also have time for questions. Do we have any questions? Yes?

[Audience] Thanks for that; my mind is thoroughly blown. In a good way, I hope. So here's the question. There's the saying: ask for forgiveness, not permission, right? So when things have agency and motivation, not all affordances are given. Sometimes there are insects in my house that I don't want to be there. They have an affordance. I may not like it, but they have one, and I've discovered it by the fact that they are there. In the process of this, did you discover affordances that you didn't realize you'd given your generative agents? That's the first question. And the second one: how does this compete with fuzzing? I don't need agency; I can just throw randomness at a thing. In what way is this better?

Yeah, I definitely did, especially when you get into the chaos stuff. We're setting the where, right? So you can still apply the idea from chaos engineering of starting small and then increasing the blast radius, and you could do it that way. With a lot of the disruptor stuff, it was like: oh, I was trying to tune that, and now it's gone. There is a lot of that as well. So I don't have a concrete answer for you, because you kind of just have to wing it and slowly increase the blast radius, I guess. And I think there's still room for fuzzing, and still a cause for it. I think this is more an addition than a subtraction, if that makes sense. But I'm glad your mind was blown. Any other questions? There's a question over there, but you can also ask me on... actually, don't ask me on Twitter, I don't do Twitter anymore. I'm on Mastodon, apparently.

[Audience] Thank you for the session, that was very good. You were explaining that there's a difference between AI and emergence. Can AI be trained to do emergence?

I think it can be trained to encourage it. I think there are ways to do AI that facilitate emergence a bit more. Quality of input is a big thing: you can't feed it rubbish, and you also have to have enough diverse things in there. You know, we see AI produce horrible fake news that's incredibly racist; that's because its input wasn't good enough. So I think that is an example of AI, but not of emergence.
I think there is good AI that can be trained to do it. Yeah. All right, I think that was it. Everybody feel free to come up to me and ask me stuff if you see me around. Otherwise, see you on social media, and thanks for listening.