All right, well, I guess I'm going to get started. There may be some more people straggling in, but I know the conference is on a fairly tight schedule, so I don't want to keep anybody from their next presentations. My name is Mark Seger, and I'm with HP Cloud Services. My area of focus is Swift performance: I've been doing a lot of work monitoring Swift, measuring Swift, and trying to find ways to improve its performance, and that's what this talk is about.

I've always found it useful to set the stage and talk about what problem we're trying to solve, because everybody looks at performance from a different perspective. The problem I'm trying to solve is the need for a consistent set of mechanisms and controlled experiments for measuring performance. A lot of times, people run their performance tests a little differently each time, or they come up with different results, and it's hard to tell from test to test what's happening. As you're testing, you'll also find weird behaviors and want to modify the tests you're running, and depending on how you've instrumented the environment, you may or may not be able to do that. So I want to make it really easy to run tests and to configure them. If I'm going to run multiple sets of tests, I want an easy way to compare the results. I've heard people say, "things are running slow; ooh, now they're running fast." Well, how do you know they're running slow? How do you know they're running fast? Without a consistent baseline you don't really have a way of comparing.

And in the case of Swift, for those who haven't used it or have only used it peripherally: when you run these tests, you might write tens of thousands of objects, and at the end of your test you have all these containers full of stuff, and you want an easy way to delete them so that you're back to a clean state. It's easy to fall into the trap of "Swift has infinite storage capacity, sort of, so I'll just run my tests": every new set of tests creates a new container and writes a bunch of new objects into it, and when you're done you may have tens of thousands, even millions, of objects lying around. That may be okay, but it's nice to be neat and clean up when you're done.

That covers the measurement side, but from a benchmarking perspective you also want to be able to run your tests at scale. Say you run 10 clients writing a bunch of data, then 50 clients, then 100 or 500, and you watch what happens as things scale out. The other thing that's really important when you're running a lot of parallel clients is making sure the tests all start at exactly the same point in time.
It can't be one of those situations where you have eight clients and you start them one after another, because they didn't all start at the same time: the first couple of clients get the benefit of exclusive access to the environment, the later ones have to share, the results don't work out, and it's not repeatable, which gets back to the earlier point.

So what I built is what, for lack of a better name, I'm calling my getput suite. We actually had a tool called putget, and I wanted to build my own, and I couldn't exactly call it putget even though you do puts before you do gets, so I called it getput. It consists of three or four main scripts. At the bottom level is the tool called getput, and it's the one that does all the work: it reads objects, writes objects, deletes objects, and measures the performance of those operations, all on a single node. It's a nice, simple tool and it runs standalone, and if we have time I'll do a little demo later that will hopefully work out better than the demo earlier today; I feel bad, because you test these things and they work perfectly, then you get on stage and they don't. The message I'm trying to get across in this talk is that it's not so much about the tools as about the methodology. I know a lot of people have their own environments and do their own testing, and if you can take away a few ideas from what I share, then it's all goodness.

As I said, getput runs on a single machine. If you want to run your tests on multiple machines, you need a way to start the tests at the same time, stop them, clean up, et cetera. For that I wrote another tool called gpmaster. You tell gpmaster, "I want to run these tests on these eight machines, and I want them all to start at this time." gpmaster SSHes in, fires up a bunch of copies of getput, tells them what time to start running, and when they finish it cleans up after them. But it turns out I wound up with so many switches, probably 15 or 20 of them, that I needed another tool to act as a macro on top of it. So I wrote gpsuite. You tell gpsuite, "I want to run a full set of tests," or a small set, or these particular kinds of tests, and you literally give it one or two switches; it then calls gpmaster a number of times to run all the different tests. And then I got to the point where even gpsuite was getting a little crazy, so I wrote a bash script on top of that which calls gpsuite multiple times to do my testing.
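To give a flavor of the synchronized-start mechanism, here is a minimal sketch of the idea (not gpmaster's actual code; the hostnames are placeholders and it assumes the clients' clocks are kept in sync, say by NTP):

```python
# Minimal sketch of a synchronized start across nodes, in the spirit of
# gpmaster (not its actual implementation). Assumes NTP-synced clocks.
import subprocess
import time

NODES = ["client1", "client2", "client3"]    # placeholder hostnames
start = time.time() + 30                     # everyone starts 30s from now

# Each worker sleeps until the shared wall-clock instant; a real worker
# would then exec the benchmark instead of just printing.
remote = (f"python3 -c 'import time; "
          f"time.sleep(max(0, {start} - time.time())); "
          f"print(\"go\")'")

procs = [subprocess.Popen(["ssh", node, remote]) for node in NODES]
for p in procs:
    p.wait()                                 # collect results, then clean up
```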
But getting back to the workhorse, getput: the important takeaway is that it's built on top of the Swift client library, because one of the things I've found with testing is that it's all about the test stack. I've heard people say, Swift is Swift: you can drive it from curl, from Perl, from Python, from Java. Every single one of those will produce different results, I promise you. Some are closer than others, but they're not all going to give you the same numbers. A great example: for those of you who've played with the swift command-line tool, you can upload an object, download an object, whatever. If you upload an object with the swift tool and upload the same object with curl, the numbers aren't quite the same. Do it with Java and it's not quite the same either. It makes you crazy, because you start finding out that this one uses version X of the HTTP library and that one uses version Y, this one uses one SSL cipher and that one uses another, and it all affects the performance. So whatever you're doing, you need one consistent stack. That's what I've done: I've standardized everything on the Swift client library, so at least I have a consistent set of numbers within the tests I'm running.

In standalone mode, getput really only needs about half a dozen switches. You tell it: I want to write objects with this name to the container with that name, here's how many I want, here's the size I want, and here's the kind of test I want to run: gets, puts, deletes. That's relatively straightforward. Then there are some additional switches that are usually needed when you're running from multiple nodes. For example, if you want to run the tool on three different nodes and have each node write into its own container, then each node has to have a way to identify which container to write to, and it all builds together. gpmaster, as I already said, coordinates the running of all of these, so I'll move through this a little quicker. And gpsuite, like I said, is my macro. For example, if I say "gpsuite full", it runs a 1K put, a 10K put, a 100K put, a 1MB put, and a 10MB put, and cycles through all of that for one thread, two threads, and four threads: one single command that might run for four hours depending on what you're doing. And when it's done, it cleans everything up, so all the containers you created are gone and you have a nice clean environment.
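As an illustration of what standardizing on one stack looks like, here's a minimal sketch of driving puts through python-swiftclient and timing every operation individually, which is conceptually what getput does (the auth URL, credentials, and container/object names are placeholders):

```python
# Minimal sketch: drive PUTs through python-swiftclient so every test
# shares the same HTTP/SSL stack, timing each operation individually.
# The auth URL, credentials, and names are placeholders.
import time
from swiftclient.client import Connection

conn = Connection(authurl="http://swift.example.com/auth/v1.0",
                  user="account:user", key="secret", auth_version="1")

conn.put_container("gp-test")
payload = b"x" * 10 * 1024                   # one 10K object
latencies = []
for i in range(100):
    t0 = time.time()
    conn.put_object("gp-test", "obj-%04d" % i, contents=payload)
    latencies.append(time.time() - t0)

print("avg %.3fs  min %.3fs  max %.3fs" %
      (sum(latencies) / len(latencies), min(latencies), max(latencies)))
```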
Now I want to say a few words about output. Can people see that okay? When I first started running all this, I basically wanted to see what time the test started and finished, how long it took, and what my throughput was: X megabytes per second. For the most part, that's all I worried about. But as I started analyzing the results, I found that really wasn't enough information. So I added these last two columns over here, which show latency. Now I can see that I did a whole bunch of puts and they had an average latency of 0.08 seconds per operation. But does that really mean a whole lot? What if a lot of them were much higher than eight hundredths and a lot of them were lower? So I included my latency range.

And as you can see on that first line, the first test had an average latency of 0.08, but it ranged from 0.02 to 0.22, and that's a pretty broad spread. So I said, you know, I need to know more than what the average is telling me; it really isn't giving an accurate picture. So I added what I call my distribution. Nope, that's the slide after this one, sorry about that. On this one, I started tracking object sizes and how much CPU was getting used. And then on the final one, over here, I started saying: geez, I had a latency that ranged from 0.02 to 0.36, but look what happened to my distribution. Some operations were under a tenth of a second, some under two tenths, some under three tenths. And now you start seeing a picture that says: my megabytes per second, or even my IOPS, came out to a fixed number, but if I dig a little deeper I'm really seeing this spread. It's not a uniform load. I'm not saying that's good or bad, I'm just saying that's reality, and it's not until you start digging down at those lower levels that you really see what's going on.
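That distribution is cheap to produce once you've kept the individual per-operation latencies; here's a minimal sketch (the bucket edges are arbitrary, not necessarily the ones on the slide):

```python
# Minimal sketch of the latency-distribution idea: bucket per-operation
# latencies into a histogram. The bucket edges here are arbitrary.
def latency_report(latencies):
    edges = [0.1, 0.2, 0.3, 0.5, 1.0, 2.0, 5.0]      # "less than" edges, seconds
    counts = [0] * (len(edges) + 1)
    for lat in latencies:
        for i, edge in enumerate(edges):
            if lat < edge:
                counts[i] += 1
                break
        else:
            counts[-1] += 1                           # anything >= 5s
    avg = sum(latencies) / len(latencies)
    print("avg %.3f  min %.3f  max %.3f" % (avg, min(latencies), max(latencies)))
    for edge, n in zip(edges, counts):
        print("  < %.1fs : %d" % (edge, n))
    print("  >= %.1fs: %d" % (edges[-1], counts[-1]))
```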
Having done a whole lot of this testing over a long time, I've made a couple of observations. One is that Swift's scaling across multiple machines is really excellent, at least in the testing I've done. If you get, say, X megabytes per second on one machine, then running eight clients or 50 clients gets you 8X or 50X, or close to it, assuming of course that your Swift back end has the capacity to handle that load. It scales really well. But the other kind of testing I started doing was to look at an individual client and spin up multiple processes on it: instead of running one thread on 50 clients, run 50 threads on one client. And now things get really interesting, because I found that small objects actually scale pretty well; you can run 20, 30, 40 threads on a single client and do quite well with objects of 1K or 10K. But if you start running with bigger objects, things get weird, because you start hitting the bottlenecks I mentioned a dozen slides back. What you find is: damn, I'm running out of CPU, or I'm running out of network. Those are two key resources you can consume really quickly. And you don't normally think of running out of CPU, because everybody knows uploads and downloads are network bound, not CPU bound. Au contraire. When you dig into where all your CPU is going, two real evils crop up: compression and encryption. You find you're spending so much time compressing that there's no time left to send the data anywhere, and even if you can get it to stop compressing, you're still spending an awful lot of time encrypting. The Swift client library as it stands doesn't let you disable compression or encryption, so we hacked up some of that code ourselves.

With those hacks we found you can get about a 2X boost in performance by disabling compression or disabling encryption. So my recommendation, if there are any developers here or people who know developers, is that I would love to see the capability in the Swift client library to disable compression and/or encryption, or at least to change the cipher algorithm, because different ciphers also use up very different amounts of CPU.

Again, I'm not sure everyone will be able to see this; I didn't have a vision of being in a room this big. Let me see if I can zoom in and make this puppy bigger... maybe that's not going to help at all. Never mind, let's go back. What I'm showing you here: the tool on the bottom is collectl, which is something I wrote that lets you monitor the system, including the network, and the tool on top is showing what getput is doing. If you take a look, it's pretty obvious that the 17 megabytes per second that getput is measuring is more or less what you see going out over the network, which is on the order of 19 megabytes a second; there's a little startup and shutdown where it's slower, and it averages out to 17. That's the kind of thing you normally see when you're doing uploads and downloads.

But here's an interesting experiment. Most of my tests were done with non-compressible objects; in other words, I generate a binary string that won't compress, which is why an upload shows up at 19 megabytes per second on the wire. In this case, though, I'm taking a 100 megabyte object, and this little switch I circled at the top, "otype s", means make all the characters in the string the same (I forget exactly what the s stands for). So it becomes a very highly compressible string. What you see is that my throughput went from 17 megabytes per second to almost 50. And the reason is that if you look at the network, I'm pushing less than a megabyte a second over the wire to hit that 50 megabytes a second; that's the middle pane over there. Basically, that 100 megabyte object is compressing down to something really small and going over the network in small pieces.

The bottom pane shows what's going on on the proxy server. For people who aren't familiar with that level of detail: with Swift, you've got this proxy sitting in front of a bunch of object servers; objects come into the proxy server and it hands them off to three different object servers. So as you can see, the data is coming into the proxy server at less than a megabyte a second, which is the thing I have circled on the left, but it's going out at almost 200 megabytes a second, because the proxy is decompressing those objects and sending them out in three different directions. These numbers are approximately three times the 50-megabyte-per-second application data rate we saw at the beginning. So that's how all the numbers hang together.
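If you want to reproduce the compressible-versus-incompressible comparison, the two payload styles are easy to generate; a small sketch (size reduced from the demo's 100 MB to keep it quick):

```python
# Minimal sketch of the two payload styles in the demo: a same-character
# string (compresses to almost nothing) versus random bytes (essentially
# incompressible). 10 MB here instead of the demo's 100 MB, for speed.
import os
import zlib

size = 10 * 1024 * 1024
compressible = b"s" * size              # every character the same
incompressible = os.urandom(size)       # random binary string

for name, data in (("same-char", compressible), ("random", incompressible)):
    ratio = len(zlib.compress(data, 1)) / len(data)
    print("%-9s compresses to %6.2f%% of original size" % (name, ratio * 100))
```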
Okay, let's change gears for a minute: I want to talk about latency. It's not a particularly exciting topic, but one of the things I discovered is that if you measure the transfer time of each individual get and put, which is what I'm calling latency, and you look at some latency numbers, like these guys at the bottom, initially it's: gee, there's definitely something wrong with number five. I'm glad I know number five had a high latency, but the other five weren't too bad; some were as low as 0.59 and some as high as 0.67. Maybe the 0.67 is a little high, but it's not too bad. And that's how a lot of people measure object storage performance: average latency. My point is that if all you look at is the average latency, you may conclude you only had a problem with the fifth transfer.

But what if we dig a little deeper and compare the latency to the IOPS? Now we start seeing that, geez, that fifth one really was bad: at 0.83 it only managed 12 IOPS. But you know, the sixth one wasn't so hot either: 0.67, at about 15 IOPS. The point is that latency all by itself doesn't necessarily map to reality until you compare it to IOPS. But even IOPS isn't necessarily enough. What happens if we toss in the latency range? Now we see: hey, wait a minute, all of them had a low end of 0.02, but that last guy actually went up to almost four seconds. He had a spike almost four seconds long, and you had no way of knowing it just by looking at the average. That's my point: you really need to look at the range of these latencies.

But, as they say in the infomercials, wait, there's more, because even knowing that the latency range on the bottom line was 0.02 to 4 seconds still isn't telling you the whole story. If we dive deeper (and unfortunately the fonts get smaller the deeper you dive), we're now looking at latency distributions. This is a histogram. Let me remember the data: this was one test that did 30 or 40 thousand operations in total. If you look at the bottom line, out of almost 5,000 IOs, 4,000 were under a tenth of a second of latency, but almost 500 were between 0.1 and 0.2 seconds. And look at this, holy cow: some of them are up in the half-second, one-second, two-second, three-second range, and it's all of that taken together that contributes to your overall IO rate. Same thing on that fifth line, where it gets even worse: two of them were over five seconds long. So again, it's more than just IOPS, more than just average latency, more than megabytes per second; it's really about getting down to the individual details of all the IOs. And I just realized I'm having a terrible time keeping track of time here, so I hope I don't run too late.
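The relationship underneath those numbers is roughly IOPS ≈ threads / average latency, and the slide's figures line up if you assume about ten concurrent threads (the thread count is my assumption, not something stated on the slide):

```python
# Back-of-the-envelope check of the latency/IOPS relationship: with N
# concurrent threads, IOPS is roughly N / average_latency. Assuming
# ~10 threads, which makes the slide's numbers line up.
threads = 10
for avg_latency in (0.83, 0.67, 0.59):
    print("avg %.2fs -> ~%.0f IOPS" % (avg_latency, threads / avg_latency))
# avg 0.83s -> ~12 IOPS; avg 0.67s -> ~15 IOPS; avg 0.59s -> ~17 IOPS
```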
So I want to give you a couple of examples of findings that came from drilling into this data. The first question I'll put out: is a latency of 0.04 good or bad? I was looking at 1K, 10K, and 100K gets, and what I found was that the IOPS of my 10K gets were actually worse than the IOPS of my 100K gets. Normally, with smaller objects you can do a lot more operations per second than with big ones. So why could I do more 100K gets than 10K gets? Looking into it, I realized there was a latency issue, and after a lot of digging I discovered that a 7,887-byte get was almost four times faster than a 7,888-byte get. And it's like, no, I must be hallucinating or something. But it's absolutely 100% reproducible, and it occurs for objects up to and including 22,469 bytes. There's something mystical about that range. You might have a guess; it's a good guess, but it's the wrong guess.

I got some help from some of my colleagues and we started diving in, looking at straces and TCP dumps, until you could actually see it. By the way, if you looked at the detailed transaction logs within Swift, Swift was returning the data at the expected rate. But in the TCP dump you could see a delay of about three hundredths of a second between sending out the data and the other side ACKing it. And I was like, what the hell is going on? Well, I don't know if anybody here has ever used netperf, but the author of netperf works in my group. So I said, Rick, what's going on here? He dug in and said: ah, Nagle. And I said: huh? I don't know how many people are familiar with the Nagle algorithm; I'd heard of it, but I didn't really know much about it. It interacts with delayed ACKs: when a lot of data is coming at you, the receiver holds off on ACKing so you don't get too much traffic going in the opposite direction. And the way we run our proxy servers, we have this thing in front called a Pound server, which gives us multiple SSL terminations on different ports. The bottom line is that the connection between the Pound server and the proxy server goes over a loopback connection, a loopback connection has a different maximum packet size than a remote connection, and Nagle kicks in and inserts a delay. So we need to go in and diddle with the eventlet library, and maybe get some fixes into eventlet so it can deal with this situation. But the point is, it's all about latency: if we hadn't been looking at latency numbers, if we'd only been looking at megabytes per second, we'd never even have noticed this.
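For reference, the standard knob for defeating Nagle on a given socket is the TCP_NODELAY option; this sketch shows the generic mechanism only, not the specific eventlet fix being discussed:

```python
# TCP_NODELAY disables the Nagle algorithm on a socket, so small writes
# go out immediately instead of waiting on outstanding ACKs. This is the
# generic mechanism, not the eventlet patch discussed in the talk.
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
s.connect(("127.0.0.1", 8080))       # e.g. a loopback hop like pound -> proxy
s.sendall(b"PING")                   # sent without a Nagle-induced delay
```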
Another example: I noticed there were a lot, "a lot" being a relative term, really a few percent, of half-second latencies. If you look at that latency distribution I showed you before, you'd see a whole bunch of latencies under a tenth of a second, a couple between one and two tenths, a couple between two and three tenths, and then another 30 or 40 or 50 of them right at half a second. Talking to some of our developers, it turns out there's a timeout inside Swift: when a request comes into the proxy server, the proxy tries to connect to an object server, and there's a half-second connection timeout. If the object server doesn't respond, the proxy says, okay, I'll try another object server. The bottom line is you lose half a second, and half a second is a long time when you're doing small-object IOs: given that small objects have latencies on the order of 0.02, 0.03, 0.04 seconds, you're losing the opportunity to do 30 or 40 IOs because of that one timeout. So one thing that might be worthwhile at the development level is the notion of being able to say: geez, I tried to talk to this node and he wasn't responding; maybe I shouldn't try to talk to him again for a while. Maybe he's rebooting, who knows, but give him a chance to respond at another time.

It turns out there's another timeout, and I'm actually not sure whether it's Swift or Linux, that periodically shows up and is six seconds long. We were able to trace back where it was happening, and it turned out one of the object servers had a bad disk. The proxy server didn't know for six seconds that this error had occurred, so it hung around for six seconds delaying your upload or download, and then went ahead and did it. You could see this transaction that took six seconds, yet if you looked at the individual object servers, this one responded in 0.02, that one in 0.03, and the third timed out six seconds later and handed the work off to somebody else. So to my way of thinking, that's another opportunity for tuning inside Swift: maybe give it the ability to say, hey, two out of three puts succeeded, I'm going to return now and deal with the third one behind the scenes.

As far as latency goes, there are a variety of other areas to look into, because I've seen operations with latencies of 10, 20, 30 seconds. It doesn't happen all that frequently, but it does happen occasionally, and on my list of things to do is to dig in and find out exactly what's going on when an operation takes 30 seconds to complete. One of the things we did, again another hack to the Swift client library, was to make a put return the transaction ID associated with that put. Then you can take that transaction ID, dig into the log files, and start seeing when the request came into the proxy server, when it got dispersed to the three object servers, and which object server, or servers, was the culprit for the long delay, working your way down the stack. So again, if there are any Swift developers here: we had to hack the Swift client so it could return transaction IDs, and it would be really nice if that were just part of the native API. I believe we're in the process of trying to get something into Swift through the normal process, but it hasn't landed yet.
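Swift does stamp every response with an X-Trans-Id header, so you can get at the transaction ID today if you drop below the client library to raw HTTP; a minimal sketch (the host, account path, and token are placeholders):

```python
# Minimal sketch of grabbing Swift's X-Trans-Id without patching the
# client library, by doing the PUT over raw HTTP. The host, account
# path, and auth token below are placeholders.
import http.client

conn = http.client.HTTPConnection("swift.example.com", 8080)
conn.request("PUT", "/v1/AUTH_test/gp-test/obj-0001",
             body=b"hello",
             headers={"X-Auth-Token": "AUTH_tk_placeholder"})
resp = conn.getresponse()
resp.read()
# The transaction ID can then be grepped for in the proxy and object
# server logs to trace exactly where the time went.
print(resp.status, resp.getheader("X-Trans-Id"))
```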
Just as an aside, I wound up writing a little utility to help look at what Swift is doing, and to inspect and delete my objects, because if a test craps out in the middle, I'll have a whole bunch of containers with a whole bunch of objects in them. I don't know how many people here have used the swift client tool: you can say "swift list" and it shows all your containers, or "swift list <container>" and it shows all the objects in that container. Personally, it was driving me crazy, because all it told me was that I had 38 containers, with no idea how many objects were in them; or I could list a container, discover it had 10,000 objects, and wait forever for the listing. So I wrote my own little utility and added the ability to say: when I list containers, tell me how many objects are in each one and how much storage it's using, that sort of thing.

Then I discovered that deletes are slow. If a test craps out in the middle, you might have a container with 50,000 objects in it, and deleting 50,000 objects with the swift delete command can take five or ten minutes. Well, meanwhile, I'd been doing so much with my getput tool that I'd gotten really good at using the Swift API. So I said, fine, I'll write my own parallel delete capability. Now, when I use my little swift command tool and say "delete this container", it fires up 50 threads and does all the deletes in parallel, and I can do on the order of 300 to 400 deletes a second, which means you can actually delete a large container within your lifetime.

Here's an example of the output. I made it look sort of like the ls command in Linux: you say ls and it lists how many containers I've got, how many objects are in each container, and how many megabytes each container is using. It's also handy that I can see what time each container was created, and then it gives the container's name. I added a few other options too, so I can list the containers that start with a given prefix, or delete all the containers that share a common prefix, that sort of thing.
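The parallel-delete trick is only a few lines with the same client library; a minimal sketch with 50 workers, as in the talk (make_connection() is a hypothetical helper that builds a swiftclient Connection, one per thread, since connections aren't meant to be shared across threads):

```python
# Minimal sketch of the parallel-delete idea: list everything in a
# container, then delete with a pool of 50 worker threads.
# make_connection() is a hypothetical helper returning a fresh
# swiftclient Connection; each thread gets its own.
from concurrent.futures import ThreadPoolExecutor
import threading

local = threading.local()

def delete_one(container, obj):
    if not hasattr(local, "conn"):
        local.conn = make_connection()        # hypothetical helper
    local.conn.delete_object(container, obj)

def delete_container(container):
    conn = make_connection()
    _, objs = conn.get_container(container, full_listing=True)
    with ThreadPoolExecutor(max_workers=50) as pool:
        for o in objs:
            pool.submit(delete_one, container, o["name"])
    conn.delete_container(container)          # container must be empty first
```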
So that's the main bulk of my presentation, if anybody has any questions. [Answering a question about making the tools available:] Sure, I'm starting to get into that process. For better or worse, HP is a big company, and we've got these teams of legal eagles who make you jump through a few hoops, but I'm going to try to jump through them. Now, when you asked whether the material is going to be available, did you mean the presentation itself? To be honest, this is my first OpenStack conference, so I don't really know how things work here. Is there a central repository where presentations are made available to attendees? I honestly don't know. Okay, okay, cool. And does anybody know what happened to the schedule because of the keynote? How much more time do we have, do I have a couple more minutes? Because if so, people might want to see a demo of how some of this stuff works.

So what I did was bring up a couple of windows here, and... oh, I bet I got screwed by that projector. Well, that's okay, we can dance on our feet. At the very bottom, and I think this is going to fit, here's collectl running, and it's going to tell us what's going on. At the top, let's start out real simple. Here's the getput tool, and I'm telling it: create a container called c, create objects called o, write one object, and make it 100 megabytes. And we're going to run the put test, then the get test, then the delete test. Now, we're going to get into trouble down at the bottom because the headers aren't repeating often enough, so let's stick in the header-repeat switch so it prints a new header every five lines and you can at least see what's going on.

What I want you to do is point your eyes at this column here, which is kilobytes going out over the network; and since you have two eyes, each one can look at a different end of the window, so point the other one at the CPU load. I'm going to write a 100 megabyte object, and what you're hopefully going to see is a lot of CPU utilization. We're using 12%, and you'll have to take my word that this is an eight-core box, so 12% overall is essentially 100% utilization of one of those CPUs. You're also seeing, and it's a fast test so it's hard to tell, let me write five objects instead of one to make it longer, network traffic on the order of 27 megabytes a second, while we consume almost 100% of that one CPU to do it. The test finishes fairly quickly, and then we start seeing traffic inbound: now we're doing the reads, and you can see the read rates coming in.

The other thing worth noting, and it's hard to see at the top because the output is wider than the window and the projector diddles with your settings, so I'm going to bump the font size and hope it still fits... actually, it might all fit. So you've seen the read and write rates. As I said, we were using our patch that doesn't compress the data. Now I'll add a switch that says compress the data, and what you see is that we're still using 100% of one CPU, because it's a single-threaded operation, but look what happened to the network rate: we're only getting 12, 14, 15 megabytes a second, because that CPU is spending all its time compressing these objects before sending them out over the wire. So here's the challenge, and I think it's purely an end-user decision depending on your application: everybody knows binary data doesn't compress very well. If you've got data that doesn't compress well, why spend all your time compressing it? You should be able to disable that. On the other hand, if you have highly compressible data, yeah, maybe you do want to compress it.
So we're in the middle of doing our reads, and they should be pretty close to done. Oh, but they're going to take a while, see, because we're only sending small compressed chunks over the wire, so the test takes longer to run. All right, it finally finished. The next thing I want to show you, and this gets kind of interesting: let's do the exact same test, except this time with 1K objects. And let's not play with compression anymore, because that's getting boring. I'll do 10 IOs, but now let's spin up 20 processes. What you see is that I'm only using 17% of the CPUs, and I actually did quite a bit of IO. Now let me spin up 50 processes, and I'll need more than 10 IOs, so let's do 50. The point is that I'm able to run 50 threads reading and writing 1K objects with no problem, using something like 10% of eight cores.

But here's the test I really wanted to show you: let's up the size from 1K to 1MB, and watch what happens. I've got these one-megabyte objects, and I'm going to have to Ctrl-C this down below. What I wanted you to see, those one-megabyte objects, somewhere down here, kind of scrolled off the screen... yeah, right here, you see that guy? He's using 50% of the CPU. So if you're only looking at the CPU, you could fool yourself and say: I've got a spare 50% of the CPU, I'm in fat city. But then you come over here and say: wait a minute, I'm sending out 92 megabytes a second; you're approaching saturation of the network link. So my point is that there's this double-edged thing between network bandwidth and CPU load. And this is a case where, if these were compressible objects, which they're not, you would have been able to send more over the network, because you wouldn't have hit that network bottleneck.

So as an experiment, and I haven't tried this before, so we're in uncharted territory: let me say make all the characters the same, turn compression on, and fire this guy up. I'm going to guess we're going to use a lot more CPU now. Oops, invalid type; it's not ctype, it's otype. Too many switches. Okay, that went really fast, and the reason is I only had 20 processes; let's make it 50 processes and 50 objects. What I think, what I hope we're going to see is... see, we're hitting 100% CPU load now. These are one-megabyte, highly compressible objects (and I keep making sure I don't step off the stage), and you'll see we're barely touching the network: it's only a megabyte or two a second. So this is a demonstration that compression can be your friend: if you have highly compressible data, you can run a lot more threads and get a lot more out over the network, but now we hit that CPU load, I'm sorry, the CPU limit. So this could be another one of those situations where either you shouldn't be doing this, or you need to double-bond your networks, or whatever.
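To put rough numbers on that network/CPU tradeoff: the 100-megabytes-a-second and 25-megabytes-a-second-per-thread figures below come from the talk, and the one-core-per-compressing-thread figure is roughly what the demo suggested:

```python
# Back-of-the-envelope version of the network/CPU tradeoff, using rough
# figures from the talk: ~100 MB/s of usable network, ~25 MB/s per
# uncompressed thread, and ~1 full core per compressing thread.
network_mbps = 100.0        # usable link bandwidth, MB/s
per_thread_mbps = 25.0      # what one thread pushes uncompressed
cores = 8                   # cores available on the client

network_limit = network_mbps / per_thread_mbps    # -> 4 threads
cpu_limit = cores / 1.0                           # 1 core per thread -> 8
print("network-limited ceiling: %d threads" % network_limit)
print("cpu-limited ceiling:     %d threads" % cpu_limit)
# Whichever ceiling you hit first is your ceiling; no magic in between.
```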
But again, the real takeaway I want to leave you with is that there's a real price to pay for compression and encryption, one that can have a big negative impact on how much work a client can do; or rather, the client can do a lot of work, but maybe only with a few threads. The thing I didn't show you, but you can do the math yourself (it's the same arithmetic as the sketch above), is what a lot of people tend to forget: if you've got a network that can move 100 megabytes a second and each thread can do 25 megabytes a second, you can't usefully run more than four threads at that object size. At the same time, if one thread consumes almost 100% of a core compressing and encrypting those objects, you're going to need at least four processors. And there's really no magic in between. So that's the big thing I wanted to leave you with, along with the earlier point about latency and the impact it can have. If there's nothing else, I guess it's getting close to time to move on, so I want to thank y'all.