All right, hello, everyone. Thanks for joining. I'm going to be talking today about benchmarking clouds. The session title is technically "Hybrid Cloud Benchmarking and Operations on AWS, Azure, and OpenStack," but that didn't fit very well on my title slide. So ultimately, what we're talking about today is benchmarking clouds: what that looks like, and what it actually means.

First, hello. I'm Marco Ceppi, I work for Canonical, and truly, I do love benchmarking. Before I started at Canonical, I worked at a number of different companies in the Washington, DC area. One was a web hosting company doing traditional hosting — shared servers, virtual private servers, dedicated servers, that typical platform — where I was a tier-three system administrator. I also worked in the IT department of a relatively large news organization; they were transitioning from print media to online media, and I helped them with that transition and with bringing up their entire web architecture. And then I've worked for dozens of startups, some that succeeded, many that failed, and I had a lot of fun doing that as well. But the one thing common to every job I've had in the IT industry is that all of these companies suffered from the problem of measuring and reacting to performance, and very few of them actually made it a priority in their organization.

So I want to show you all something that I've been helping many people at Canonical work on. This is a benchmark GUI — a UI that's running right now — and I'm going to launch a few benchmarks and talk about what's happening here. I'll run mongo-perf: 10 megabyte file sizes with 16 threads, doing read and write performance tests. That looks about right, so I'll go ahead and launch it. And while I'm doing that, I'm going to launch a few more benchmarks. Let's go over here and launch another mongo-perf benchmark — why not? — 16 threads, again with 10 megabyte file sizes, reading and writing. All right, let's launch those.

What I'm doing is benchmarking services in a cloud. If you notice the top right here, I have this running in AWS's ap-northeast-1 region, and I've got this one over here running in Azure's US West, and I'm just running these benchmarks against, in this case, MongoDB. I've also got some runs from earlier that I want to show you, to illustrate what this means for benchmarking. If you look at the left here, you can see a previous run I did about two hours ago — I guess it's been a long morning so far. You can see the parameters I ran, things we've just seen: 16 threads, 10 megabyte file sizes, reading and writing. I've also got a set of results from this benchmark. It ran for about three minutes, with a maximum of 5,070,000 operations per second and an average of 4,858,000. It performed quite a lot of operations — I'm not even going to pretend to try to read that number — it did 177 iterations total, and there's the minimum operations per second as well. I can also see over here what is actually quite a complex topology; I'll talk about what that is in a second. And finally, I get all this hardware information — essentially discovery of what the machine was running behind the covers.
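For readers following along in text: a run like the one just launched through the GUI can also be expressed as a Juju action from the command line. A minimal sketch using Juju 1.x-era action syntax; the `perf` action name and its parameter names are illustrative placeholders, not confirmed charm details:

```bash
# Launch a mongo-perf style run against the first MongoDB unit.
# Action and parameter names are hypothetical; the CLI verbs are Juju 1.x.
juju action do mongodb/0 perf threads=16 filesize=10MB mode=readwrite

# Juju prints an action ID; use it to retrieve results once the run finishes.
juju action fetch <action-id>
```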
And additionally, if I really wanted to, I can go and poke at some graphs. Let's see what the CPU in user space was doing — not very busy at all. What was the system doing? Nothing much either. Not a lot going on in this benchmark. Let me see what the disks were doing — the write performance on that. That makes sense. Not sure why those numbers aren't coming up. I also have these benchmarks — oh, still running now, but they'll complete in a few minutes.

But I want to talk a bit about what we just saw, because the way we do performance work and benchmarking is changing. It's changing, I think, for the better, but it's changing because of the way technology is evolving. We're no longer running single services. We're no longer running, and caring about, individual machines. We're moving to things like microservices, we're moving to scale-out architectures, we're moving in a way that starts to overshadow and trump the previous generations of technology. Before, you used to get a machine: you unboxed it, you racked it, you burned in the disks. You ran benchmarks — fio, SYSmark, a suite of Phoronix Test Suite runs — and you basically sat there and measured everything inside and out. When I was working for that web hosting provider, we would unbox Dell R710s and PowerEdge 2950s every other week in the data center. We'd rack them, we'd benchmark them. I knew exactly how each one of those benchmarks performed on each one of those servers, and to me, that was enough. If it was just as performant as all the rest of the same model, there weren't any defects. I'd rack it, and then we'd either throw a shared server on it, or virtual private servers — this was the pre-cloud era — or we'd give it to a customer to run as a dedicated server. But for me, the benchmarking stopped there. The machine responded the way I expected it to, and things moved forward.

Moving through different jobs, when I started working at U.S. News & World Report, we had this new architecture we were designing, and as a system administrator there I was responsible for certain components. The developers managed all the little applications they wrote — their front-page readers, their news aggregators, their blogging services — while I was responsible for the architecture in general, but mainly the big back-end components: the database servers. We ran a huge Postgres cluster, and we also ran things like Cassandra and other systems that served as back-end services for those web developers. And while I was there, I would run benchmarks against Postgres. I knew exactly how well my Postgres performed; I tuned and tweaked every last knob, and the same with the memcached servers. I made sure I benchmarked our baseline Apache and Nginx servers, I benchmarked our caching servers, I benchmarked each thing individually. I made sure everything was tuned until I was satisfied with the benchmarks I was running, and I was happy with that. To me, that was good. And then startups were just a whole mess: we were running as fast as possible, tripping over our own features, trying to get code landed — never mind measuring how fast it was, or whether it was performant at all.
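To make that unbox-rack-burn-in workflow concrete, here's roughly what the burn-in step looked like — a minimal sketch using two of the tools mentioned above, with standard fio and sysbench flags rather than anything taken from the talk:

```bash
# Burn in a disk: sustained 4k random writes for ten minutes (standard fio flags).
fio --name=burnin --ioengine=libaio --rw=randwrite --bs=4k \
    --size=4G --numjobs=4 --runtime=600 --time_based --group_reporting

# Baseline the CPU with sysbench's prime-number test (classic 0.4.x syntax).
sysbench --test=cpu --cpu-max-prime=20000 run
```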
What I started realizing when I joined Canonical — and one of the reasons I was so excited to join — is that this evolution is actually a bigger problem for the benchmarking and performance community. It's not so much how the machine is running, or how a single component is running; it's really about how the entire solution works in tandem. That's becoming the next wave as we move to these scale-out architectures, as we start diving into more complex problems with architectures that spread machines and services across not just multiple machines but multiple regions, multiple data centers. Benchmarking performance is no longer down to how the machine responds, or even the application or component level. It's about how the entire solution works: benchmarking that solution, and tuning it, to make sure it's humming perfectly from top to bottom, with no bottlenecks causing problems.

So this, more or less, is what we have today — traditional benchmarking. We have things for machines: you can benchmark a machine, you can burn it in — the things I spoke about earlier, pi benchmarks, sysbench, fio for disks. We also have the idea of benchmarking a component: mongo-perf, like I ran a few moments ago; stressing a Cassandra cluster; hitting a little Apache server with Siege or ApacheBench; using the MySQL test suite to drive traffic into MySQL and generate load that way; or pgbench, the same thing for Postgres. All of these test at the component level. They don't really exercise the full stack.

So the problem we have here is benchmarking solutions. Here's a solution — a very simple one, but one most people are familiar with. We have a Django web app connected to memcached as a key-value store, with a Postgres database backing it, and a Squid proxy in front acting as a load-balancing cache. You'd be surprised how many innovating startups in the DC area are running basically this, over and over again. The one key thing that changes is the web app itself: that's written by the developers, it's their unique take on solving a problem. And it was much the same at my previous companies. Now, these solutions aren't really simple — I take that back — but at a high level, as a logical model, this one looks very simple: it's Squid, it's a web app, it's a basic three-tier structure. When we start moving to things like big data solutions, machine learning, OpenStack — you've all seen OpenStack deployed; it's not just a couple of pieces connected in a tier, it's actually a very large problem. But take this as the solution we have.

Previously, you'd do something like this: I'll just attach pgbench to Postgres and benchmark it. Then I can tune and tweak my Postgres to be as performant as possible, given the amount of work I'm pushing into it — given the amount of work I assume my web app is going to generate, the traffic I'm going to have. But this is again just benchmarking the component, that single piece. It doesn't take into consideration the rest of the architecture.
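That component-level approach looks something like this — a minimal pgbench sketch with standard flags; the database name is a placeholder:

```bash
# Build a pgbench test database at scale factor 50 (roughly 750 MB of tables).
pgbench -i -s 50 benchdb

# Drive 16 concurrent clients for three minutes and report transactions/sec.
pgbench -c 16 -j 4 -T 180 benchdb
```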
A coworker of mine, Chris, used to work for one of the top three social media websites out there — you can use your imagination as to which one. He worked as an engineer maintaining their in-house distributed file system, and he spent a lot of time, I imagine, setting it up, tuning, tweaking, and benchmarking. He recalled the story to me: he had a storage pool of hundreds of petabytes of data. "I've tweaked and tuned my storage pool so it's responsive. I've benchmarked it over and over again. I know the write speeds, the throughput; I've got dashboards monitoring it. I own this entire pool, and it is performant to the last end." And then he told me about getting paged late one night because the write throughput to his cluster had just tanked. Everything he'd done was right: he'd benchmarked it, he'd tuned it, he'd prepared for the demand he expected on his workload. But a developer had built a new application for the site, and it wrote data in very small chunks, where the cluster was tuned for larger files. That constant small-chunk writing completely tanked the storage pool's performance and nearly ground it to a halt. The problem is that he'd benchmarked the pool as he saw it — the component itself — and it was tuned properly. But if you don't take into account the rest of the solution, the rest of the pieces plugging into it, and actually benchmark through the entire stack from top to bottom, you won't get an accurate picture of the cluster's performance.

So to do a more solution-oriented benchmark, we could use something like Siege. Siege in and of itself just attacks a web page with traffic; it's a load generator. But in this diagram, by pushing Siege's traffic through Squid, and having Squid pass it down to the web app, which in turn exercises memcached and Postgres, I can exercise the entire stack through and through. I could also ask: what does this look like without Squid attached? What if I Siege the web app directly — what happens then? How much performance is my cache actually providing? You can start configuring and tuning and changing this to test different scenarios and ideas you may have. Is the caching really helping, or not? Do I need more web heads to respond to load? Is memcached really performing there? Together you get this idea of benchmarking different cross-sections of your platform — and because your platform is deployed on a cloud, or actually is the cloud itself, its architecture can get quite involved.

With this, you also come to realize: I now know how my cloud is running, I've benchmarked it, I've got throughput. But just knowing the average transactions per second from Siege — how many people can concurrently view the site, what the average response time is — is only one half of the performance picture. It's the throughput, but it's not what the machines were doing at the time. This is where metric collection comes in, because benchmarking is more than just the number produced at the end of a run; it's how everything in your environment responded, how the entire solution was working.
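As a sketch, the two cross-section runs described above differ only in where the traffic enters; these are standard siege flags, and the hostnames are placeholders:

```bash
# Full-stack run: traffic enters through Squid and flows down through the
# web app into memcached and Postgres.
siege -c 255 -t 3M http://squid.example.com/

# Cache-bypass run: hit the web app directly to see what Squid is buying you.
siege -c 255 -t 3M http://webapp.example.com:8000/
```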
You may run Siege against Squid and get an amazing throughput rate. But when you start looking at what the machines running Squid were doing, what the web app machines were doing, what Postgres was doing, you may find that Squid itself is taking the brunt of the load — its CPU is going through the roof and is just about to max out, while everything else is coasting. So now I know that, yes, I can handle a thousand people a second, but any more than that and I may start experiencing degraded performance. So we can attach a collector. In this case it's collectd, but it could be collectd or any number of other tools; the point is to find out what your machines were doing during that time. If you then push Siege even further, put more power behind it, you may find that Squid falls down and the load never even transfers further down the stack into your application.

Another interesting question is: how does your stack respond, under load, to things going wrong? What if you have a thousand hits a second coming to your website and suddenly a web head goes away, or one of your caching servers goes offline — or the entire caching tier, which would be detrimental? What happens when Postgres goes through an active-passive failover? What happens when one of the memcached nodes goes away? Introducing chaos, and measuring it, is a valuable way to determine the performance of your application on your cloud, because at the end of the day clouds are ephemeral. Your instances are there, but they're virtualized — they're not physically yours, and they can disappear at any moment. An entire host node could go down, taking out half your application; a region could go dark. We all remember what happened to a lot of popular websites a couple of years ago when they had everything running in one region, in one zone, and suddenly a data center goes dark, or a region goes dark, or a rack goes dark. On the slide here is the Greek symbol for chaos, but you can think of it as Netflix's Chaos Monkey: the idea that Netflix prepares for disaster by turning off services at will, in production, just to make sure the application keeps running properly. What if we could simulate that with production-style workloads — push a lot of traffic at the app at once, and simultaneously measure what happens during things like active failover, when units go away? Measuring that is another important facet of benchmarking clouds.
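A crude way to combine the two ideas — load plus chaos — is to inject a failure mid-run and watch what the collector records. A minimal sketch under the Siege-and-Squid topology from the slides; the hostname and unit name are placeholders:

```bash
# Start a ten-minute load run in the background.
siege -c 500 -t 10M http://squid.example.com/ &

# Two minutes in, kill one web head; the collectd metrics then show how
# the surviving units absorb the traffic.
sleep 120
juju remove-unit webapp/1

wait   # let the siege run finish and print its summary
```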
So the final piece: what we've been looking at is a logical model. It shows you the services and components, how they connect, and the workflow that runs through them. But the last facet of benchmarking is what that architecture physically looks like. In this case we have, again, our solution — Squid, web app, memcached, Postgres — and the two benchmarking services. Physically, that might be represented as four servers: Siege and pgbench on one, slightly larger server; Squid, memcached, and Postgres on another; the web app and Postgres on a third; and another web app and another Postgres on the fourth. That's one way to realize the model we saw earlier, by spreading services across machines — much like Adam demonstrated previously with the OpenStack installer in Landscape Autopilot, which installs services, attempts to spread them out as much as possible, and continues that spreading pattern as more resources become available.

Many cloud architectures are more than one service running on more than one machine, across potentially more than one cloud. So how the architecture looks matters whenever you consider the results of a benchmark. Changing any one of these things — your workload-generator machine becoming more powerful, throwing more CPUs, memory, or disk at it, moving it onto a faster network — will affect the outcome of your benchmark. Measuring what the architecture looks like is vital to knowing how to reproduce your benchmarks, and then how to interpret the data when it comes back. You could also start adding more web heads, adding more components. As the architecture changes, your logical model hasn't changed at all — it's still the same components, still interconnected — but the scale and the way they're distributed has changed, and that affects your benchmark results.

So what does all this mean for cloud benchmarking? What does it come down to when you distill it? Cloud benchmarking is everything we discussed before. It's machine-level benchmarking: how does the machine you're running on perform? It's the component: how is it tuned, how does it perform? How does the solution as a whole perform, and what's happening at the time of execution? And then: what is the actual architecture of the solution I've deployed? How is it spread? What do the machines look like? What's the profile? Where are they physically located relative to each other? What networks are they on? What resources are behind them — are they SSDs, are they spinning rust? What does it all comprise? Everything we've discussed is what I'd summarize as what it means to benchmark a cloud. If someone comes to you with cloud benchmarking results, the results alone aren't enough to know what's actually going on. Without the rest of this information — the machines you're running on, the profile, the hardware, the architecture — that number doesn't mean much at all.

And really, what this boils down to is that cloud benchmarking is really hard. Benchmarking a cloud, benchmarking services, benchmarking applications on top of the OpenStack clouds you may have deployed — or, if you're a consumer of OpenStack, benchmarking your workloads on top of that cloud, or how your workloads behave across multiple clouds — all of this is really hard to do in a repeatable fashion.
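Pinning down that architectural profile is something you can do explicitly in Juju terms, so a rerun asks for the same class of hardware instead of whatever the cloud hands you. A minimal sketch in Juju 1.x-era syntax; the services are the ones from the example topology, and the machine number is a placeholder:

```bash
# Request a machine profile in cloud-agnostic terms rather than a
# provider-specific instance type.
juju deploy postgresql --constraints "cpu-cores=2 mem=7G root-disk=50G"

# Co-locate services to model the physical layout from the slide,
# e.g. put memcached on the same machine as Squid (machine 1 here).
juju deploy memcached --to 1
```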
So how can we solve this? This is what really drew me to Canonical when I joined. What really had me sold was what Juju does: it can model these levels of complexity without losing insight into what's going on behind the scenes. I can simplify my workloads and still have the observability I expect and need in order to derive all of these data points. I can still know the profile and the architecture, down to the machine and the hardware I'm running on, and where all the components are placed; I have the logical representation of the model and how the pieces work with each other; and then I can easily attach and add components, swap things out, collect metrics, feed all this data in, and reliably and repeatedly run benchmarks.

I read a lot of white papers, and one of the most disappointing things I see, especially in papers on performance, is when they only operate at a very high level. Whether it's a company publishing the paper or an independent third party, they'll start out by saying: here's what we did, we have X, Y, and Z software, we benchmarked these 12 things together, here's a bar chart of the data — you should use the product. And it always drives me nuts, because it doesn't give me enough detail. If I can't reproduce the benchmark, the data means nothing to me; it's just out there for people to consume, hoping somebody latches onto it. When I see white papers that give the summary of what they saw and then really dive down — here's the exact configuration, here are the setups we used, the conditions the benchmarks ran under — to me, that's a really decent white paper.

What this comes down to, for me, is that in benchmarking, if it's not repeatable, it doesn't count. If you can't run a benchmark again, and again, and again, easily, to produce results you can compare, it's not really a benchmark. All you've produced is a bar graph, in my opinion. That's what I really love about Juju, and what really drew me in: because of the way Juju models services and encapsulates resources, I can set up a benchmark and reproduce a white paper. Here's what they did; I can deploy those same components, attach things for metric collection and benchmarking, and then use things like actions in Juju — which let operators encapsulate the tasks you run over and over, like clearing caches or the routine functions you'd normally SSH in for — instead of having to SSH in and run the benchmark by hand. For those of you who've run benchmarks, you know that you run one benchmark, then you run a second, different benchmark, and the output is always different. There is no standardized output format for benchmarking. You either get a log file you have to parse, or tabular output that may or may not be in some normalized form. Benchmarking tools are wide and varied, but there's no standard way to simply say: I ran this benchmark, here are the data points back. By encapsulating those benchmarks in Juju, I can run them and get results back in a repeatable, reliable, sane, parsed, machine-parsable but also human-readable format — which I find very exciting and very important.
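That standardized-output point is worth making concrete. When a benchmark is wrapped as a Juju action, results come back as structured YAML instead of a tool-specific log. A minimal sketch — the action name, parameters, and result keys are illustrative, and the output shown in comments is an assumed shape, not captured output:

```bash
# Kick off the encapsulated benchmark; Juju queues it and prints an action ID.
juju action do siege/0 siege concurrency=255 time=3M

# Fetch the results as YAML once the run completes; the output would look
# roughly like the commented shape below (assumed, for illustration).
juju action fetch <action-id>
#   results:
#     transaction-rate: "4858.21"
#     availability: "100.00"
#   status: completed
```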
So I'm going to run a few demos, and then I'll leave some time for questions. I've set up a bunch of environments, including two that I ran earlier. I've got Azure West US — my benchmark there seems to have finished; it ran for about three minutes and five seconds. Let's take a look at what's going on. It'll show me the topology. Over here on the right-hand side — sorry, the left-hand side — I have that same topology: Siege connected to a Node.js web server, running in Azure's US West region. Let me show you what this app does. It's super exciting: it simply counts every time you hit the server. It logs each hit in MongoDB — you can see it connected there behind the web app — so every time I hit the server, it says it tracked a hit from me. And to show you it's not just a static page: if I go to /hits, these are the last few hits that came through. That was me testing it a couple of seconds ago. That's all this web app does. It's not very exciting, but it is a Node.js web app, and it could be anything — any workload you have could be modeled here.

For example, on another cloud I've got an entire Hadoop stack running. It's the big data stack, taking log data from all the deployed services, feeding it through Flume onto HDFS, with Spark and Zeppelin attached so you can run your map and reduce jobs against it. I've also got — I think in the same deployment; I've got a lot going on in this cloud — that same app there, and Cassandra deployed somewhere else. Up here, probably. Yes: this is Cassandra running on GCE, as well as that exact same big data model from before — YARN, compute slaves, Flume, Spark, and Zeppelin, all connected and running together.

So the first thing Juju provides is the ability to model these components: I can model the services, how they deploy, how they relate to each other. The second thing it gives you is the ability to observe into that stack — not only to see where things are placed, but to do manual placement as well, that architectural spread: all my machines, and where everything is spread across them. Wow — Wi-Fi. I should have plugged in. I'm really tempting the demo gods by running demos across three different clouds over Wi-Fi. The DevOps way. Well, that's refreshing. I'm going to go over to this one here — let's see if this one's loaded; yes. So I've got a bunch of machines allocated — you can see there are 27 total. Oh my gosh, my bill is going to be really expensive. But the important thing is that I can see each service and how it's mapped. I can use this to do placement of units, the co-location of services like we showed in the previous slide. And I can also see the hardware characteristics in an abstract, generic way, which is really important for repeatability. It's not showing me that this is, I don't know, an m3.medium; it's showing me that this is a roughly 1 GHz CPU with 1.7 GB of RAM and an 8 GB root disk, probably with some supplementary disks attached as well. And because things are expressed in this generic fashion — for instance, two CPU cores with 7.5 GB of RAM for our Spark unit — I could take those same machine specs and deploy this across different clouds, whether it's bare metal boxes, your data center, the OpenStack cloud you have running, or any other cloud Juju connects to, which is quite a lot of different configurations. So this allows me to spread and test my load.
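What's on screen maps directly onto the command line. A minimal sketch of modeling the demo stack and then observing the generic hardware behind it — the Node.js app and Siege charm names are hypothetical placeholders; mongodb is a standard charm-store name of the era:

```bash
# Model the services and how they relate to each other.
juju deploy mongodb
juju deploy node-app      # hypothetical charm for the hit-counter app
juju deploy siege         # hypothetical benchmark charm
juju add-relation node-app mongodb
juju add-relation siege node-app

# Observe the architecture: status reports each machine's hardware in
# generic, cloud-agnostic terms (arch, cores, memory, root disk).
juju status
```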
A lot of people today are running hybrid workloads: workloads on bare metal, workloads in OpenStack, but also federated out and running in a public cloud somewhere. And because Juju spans multiple clouds — it can connect and deploy against a public cloud as well as your private cloud or your bare metal — it becomes really easy to take these benchmarks, export the model that represents a benchmark, import it into another cloud, and rerun it without much manual intervention (there's a sketch of what that looks like below). With a couple of commands you can stand up this entire stack here, which I stood up this morning — still refreshing. This entire Hadoop big data stack I grabbed from the charm store, and again, everything I'm showing you is free, open-source software. I grabbed it, deployed it against a couple of clouds with a couple of commands, and then started running benchmarks.

So, for example — let's see, this is ap-northeast. I've got a MongoDB in here, up at the top, so I'll run another benchmark against MongoDB with the same parameters as before. The idea behind repeatability is that every parameter I run a job with is recorded for the lifetime of that job — but I can also tweak parameters. You get repeatability, but you can also expand and build on it: change which services you're benchmarking, change the architecture behind them, change the parameters you run with. Maybe I don't want a 10 megabyte file; maybe most of my files are 100 megabytes. I can launch that, and then, using this UI — it'll take a few seconds to launch — I can compare what actually went on between two runs.

This one's an interesting comparison because, honestly, these two were run with the same parameters, on the same hardware, on the same machine, just a few minutes apart. I can see the differences between each run, and that lets me start diving a little deeper. For the most part everything is identical, with minor exceptions: one run did one extra iteration versus the other. That lets me dig into what was happening on that machine at the time. The first run I selected didn't get as many iterations, so maybe something was going on with the hardware — I can check the load of the machine.
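Before digging into that comparison, here's the export-and-rerun idea from a moment ago made concrete. A minimal sketch under stated assumptions: the bundle uses the era's "services" format, the charm names and revisions are placeholders, the environment names come from your own environments.yaml, and juju-deployer is one period tool for deploying a bundle file:

```bash
# Capture the benchmark's model as a bundle (hand-written here; charm
# names are illustrative).
cat > benchmark-bundle.yaml <<'EOF'
services:
  mongodb:
    charm: cs:trusty/mongodb
    num_units: 1
  node-app:
    charm: cs:trusty/node-app   # hypothetical hit-counter app charm
    num_units: 1
  siege:
    charm: cs:trusty/siege      # hypothetical benchmark charm
    num_units: 1
relations:
  - [node-app, mongodb]
  - [siege, node-app]
EOF

# Rerun the same model, unchanged, on two different clouds.
juju switch aws   && juju-deployer -c benchmark-bundle.yaml
juju switch azure && juju-deployer -c benchmark-bundle.yaml
```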
So let's see what the load actually was — let's do this as a comparison between these two runs. For the first run, on the left, I want the short-term load of the machine, versus the short-term load of the MongoDB machine on the right-hand side. The one on the left had more load and fewer iterations run — suddenly I can start correlating things that may have been going on. I can ask what the disk activity was during that time period, for both sides of the equation. I can start doing real comparisons with real metrics collected from the machine at the time of execution, and start deriving what may have changed between the two runs — which in this case were not far apart at all, maybe an hour or two. It may boil down to a noisy neighbor on my hypervisor, on my node, next to my unit. There may be other things going on that I'm not aware of, but this lets me start dialing in on the differences between these two runs. And of course, when the new run completes, I can see the actual difference from writing larger file sizes, and what that means for my cluster. When I start running larger file sizes, do I see degraded performance, better performance, the same performance? All those comparisons are there.

So we started writing this thing — we've called it many names internally, and we just keep using it, because benchmarking is part of every day at Canonical. We're always checking the performance of the services we run; we deploy lots of public web services and lots of internal services too. We wanted a way to measure that and gauge performance over time, so that before we put something into production we can assert, validate, and check: will this stand up the way I expect? Will it handle the load I'm expecting to throw at it? Do I need better hardware? And we decided to open source it as a whole: the benchmark GUI and all the services ancillary to it. In the model, that's these two grayed-out pieces here — they're not the interesting part, just two little components: this benchmark UI, which is the web UI we've been using, and this little collector, which essentially collects from each of the agents to gather those low-level metrics, along with some additional tools to gather and profile the machines at the time of execution. These run and provide the results you see here.

So this is GCE — I was running cassandra-stress on this one earlier. I got 71,000 operations a second, and it took about five minutes to run. I can run a similar job as well: say, show me what it takes to run a million operations through my GCE cluster. Launch that.
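For reference, the Cassandra run here is the stock cassandra-stress tool that ships with Cassandra. A minimal sketch of a million-operation write run, using the post-2.1 cassandra-stress syntax; the node address is a placeholder:

```bash
# Push one million writes through the cluster from 50 client threads.
cassandra-stress write n=1000000 -rate threads=50 -node 10.0.0.5
```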
While that's launching, I'll show you this other benchmark I ran. Where'd you go? There you are. This is the topology I was showing you in Azure — sorry to jump back and forth, but I have a lot of fun stuff to show. This was 16 threads, 10 megabyte file sizes, the standard stuff I've been running. What's also great: over here is the logical model, showing how the services are connected and deployed, and over here is the actual scale and spread of things. There was one unit of MongoDB, one unit of Siege, and one unit of my web app — a one-to-one-to-one mapping. But what if I scale services and re-benchmark? Let's make MongoDB a replica set: I'll just throw two more units at it. If you watch the machines view, Juju will provision more machines, put the services on them, and show me the generic hardware characteristics for them over here as well. There we go — it's running through that for me now. They're showing up: machines 19 and 18 are now on here, with MongoDB running on them. I could instead have put them in LXC containers and modeled density — how my app responds when things are densely packed — much like the Autopilot did earlier, spreading things around. (The scale-out step I just did in the GUI boils down to a couple of commands; there's a sketch below.)

If you were present earlier for James and Tycho's talk about measuring LXC performance versus KVM performance — where those numbers came from, all those numbers, all those pretty graphs without the raw data shown behind them — all of that was generated using this, the benchmark GUI. They were running this exact model here: this big data cluster was the big data benchmark they ran. That's where the numbers came from, that's how they got the data and were able to make the comparison. All they did behind the scenes was switch the cloud: they stood up the exact same cloud — using Juju, actually — flipped a bit to use Nova's LXC driver as the hypervisor instead of KVM, and ran the workloads on top.

So, coming down here, I'll go ahead and wrap up. Before I do: does anyone have any benchmarks they want to see run? I have quite a few here — MongoDB; Big Data, which is our Hadoop TeraSort benchmarks; and Siege and Cassandra. Any preferences? I think I've run most of them now, but I'm happy to run any additional ones. Yeah, cool, that's fine. I just wanted to make sure I got signed in before the time ran out.

So, back to my slides. What this gives you not only satisfies all the criteria I've outlined for benchmarking clouds, it also lets you quickly validate hunches. We had a pretty big hunch that LXC, as a pure machine container, would give better performance because you don't have the virtualization overhead that KVM or any other full hypervisor has — and we validated that hunch using these tools. It really lets you say: I'm pretty sure this is the best way to run my application; let me assert and validate that. Finally, this is benchmarking wherever you are. Juju runs against all of these clouds and many more not listed. As for things you can benchmark: most of the services available in the charm store can be benchmarked with Juju. There are somewhere over 130 curated charms available to deploy, and the majority of them have benchmarks available right now. So you can deploy services in any of these clouds and get started benchmarking — again, all free and open-source software. We also have things like Rally: if you stand up an OpenStack cloud using our reference architecture with Autopilot, you can plug in Rally and run it against your cloud to ask, how is my hardware holding up? How is this configuration — the one I chose, or Autopilot chose — doing for performance? Finally, I invite you all to come check out what we're doing. I'm really enthused about benchmarking; I'm really excited to see what people are doing and the problems they're trying to tackle, and to see if we can help make those problems easier to solve as architectures and technologies progress.
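The scale-out and density moves from the demo correspond to a couple of Juju commands — a minimal sketch in Juju 1.x syntax; the unit count and machine number are placeholders:

```bash
# Grow MongoDB into a three-unit replica set on freshly provisioned machines.
juju add-unit mongodb -n 2

# Or model density instead: pack an extra unit into an LXC container on an
# existing machine (machine 3 here).
juju add-unit mongodb --to lxc:3
```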
jujucharms.com; benchmarking.juju.solutions; and the cloud-benchmarks GitHub repo — I haven't published that yet, but this is all being open sourced this week, and we'll publish it to the GitHub repo there. Finally, #juju on irc.freenode.net — I'm Marco Ceppi there. I'm happy to answer any questions now or later. But do you have any questions? Yes.

Sure — it depends entirely on the solution. The question is: how do I simulate workload? And that's very solution-dependent. For web workloads, we use things like Blitz, Siege, ApacheBench, or wrk, a newer load generator; those generate web traffic. Depending on the service — for instance, with Siege — you can upload a test plan saying these are the URLs I wish to hit as part of my benchmark, and that's all encapsulated in Juju (there's a sketch of such a test plan below). For the big data stuff, it's TeraSort and TeraGen currently; there are more benchmarks we're building to model the entire ingestion data flow — pushing data in at one end through Flume, seeing it come out the other side, and running reduces on it. Cassandra Stress is just for the Cassandra node itself; that model is just Cassandra, but if you have a service attached to Cassandra, stressing the workload through that service into Cassandra would be the way to do it. Did you have a question about a particular solution in my — big load? Okay, so you're looking for — sorry — okay, sure. The tool you use to generate load, to see whether your solution stands up and responds properly under heavy load, is very dependent on the solution you're deploying. In our charm store, for most of the solutions available, there's an accompanying workload generator that lets you drive load into that service.

What about Windows machines? That's a great question. We do have a few Windows charms. Juju — I didn't put a slide up, but we've talked about it a lot already — is cross-platform; we can deploy workloads on Ubuntu, CentOS, or Windows. We have a few Windows charms, things like SQL Server and Active Directory. I don't know if we have a benchmark workload generator for Active Directory, but I know we have a benchmark very similar to the MySQL test suite to push load into SQL Server. Usually you have SQL Server with a web application or something on top of it, and you drive work through the top. The majority of workloads people are publishing on clouds today are mostly web-derived, so by generating load through the web interface, you exercise the solution down to the bottom — you just generate a test plan for that workload: hey, Siege, these are the URLs you need to hit; post this data here to generate that stressing effect. That's a great question.

If you go next — yeah. Ah, those are two excellent questions, actually: where's the list of benchmarks for the services, and how do I contribute a benchmark? The list will be at that second URL there. Since we're still in the process of open sourcing all this stuff and getting it out the door, it's not there yet, but on that site, benchmarking.juju.solutions, we'll be highlighting all of the benchmarks and workloads you can run, and highlighting solutions that people are running.
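Returning to the Siege test-plan point: the "list of URLs" is literally a text file, one target per line, with an optional POST body — standard Siege usage; the hostname and paths are placeholders:

```bash
# A Siege test plan: plain URLs plus a POST line with form data.
cat > urls.txt <<'EOF'
http://app.example.com/
http://app.example.com/hits
http://app.example.com/track POST user=demo
EOF

# Replay the plan with 100 concurrent users for five minutes.
siege -f urls.txt -c 100 -t 5M
```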
We actually have a site that I didn't show — I'm doing really well with this stuff — called cloudbenchmarks.org. It's part blog — us testing these workloads and benchmarking them — but it's also a place to actually upload results. So from here I can just say: yes, I want to publish this to cloudbenchmarks.org. Of course I do. That takes everything you're seeing here and lets you publish it so people can look at your results. You don't have to — it's entirely opt-in. We have some blog posts there, and a list of submissions people have made to our archive so far — it takes a few seconds to load — including the one I ran a few minutes ago, the Node app one. My time zones are a little weird, so it says "in four hours," but that was right about now. So that's where they are; we'll be hosting all of this at benchmarking.juju.solutions.

To contribute: all of this stuff is either in a charm or is a charm itself, and all of our charms are open-source software, so you can go and contribute to them. The code involved is usually a very minimal, small delta — you take your expert knowledge of how to run a particular benchmark, encapsulate it in shell code or Python, and submit it as a merge proposal to that charm's repo. The majority of these are maybe 20 or 30 lines long; they just encapsulate and distill what it takes to run that benchmark as code, and we provide the framework that makes it repeatable within Juju (there's a sketch of one at the end of this transcript). Fantastic questions, though. Any other questions?

Great — I'm a little over, so thank you all for your time. Again, here are the details — mind the monkey in the background, always watching, ready to inflict chaos. Feel free to reach out to me at any time; I'm happy to answer any and all questions about benchmarking and performance with Juju and charms. Thank you. And then finally, of course, we have surveys.
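To give a flavor of those 20-to-30-line contributions, here's a minimal sketch of a charm benchmark action wrapping Siege. action-get, action-set, and config-get are standard Juju hook tools, but the file path, the target-url config option, the parameter names, and the result keys are all assumptions for illustration, not taken from the actual charms:

```bash
#!/bin/bash
# actions/siege -- hypothetical benchmark action for a Siege charm.
set -eu

# Pull run parameters declared in the charm's actions.yaml.
concurrency=$(action-get concurrency)
duration=$(action-get time)
target=$(config-get target-url)   # assumed charm config option

# Run the benchmark quietly; siege prints its summary on stderr.
summary=$(siege -c "$concurrency" -t "$duration" -q "$target" 2>&1)

# Distill the numbers and report them back as structured action results.
rate=$(echo "$summary" | awk '/Transaction rate/ {print $3}')
avail=$(echo "$summary" | awk '/Availability/ {print $2}')
action-set results.transaction-rate="$rate" results.availability="$avail"
```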