So, out of curiosity — not a full room, half, maybe a quarter of the room — how many people here have used Ceph before? Almost everybody. Okay, cool. How many of you are still using Ceph today in production? About half of you, okay. And how many of you have spent time benchmarking Ceph in the past? Three, four, five people. Okay. And how many of you enjoyed benchmarking Ceph? One person. Okay, cool.

So today I'm going to be talking to you about Sibench, which is a new way to benchmark Ceph. It's an open-source project written in Go. I'm going to give you a bit of background about the project and why we did it, but first, let me talk a bit about why benchmarking is painful.

It's painful for a number of reasons. You need to run tests many times. You need to run tests for a minimum length of time for the results to mean anything. There are lots and lots of variables, especially with Ceph. When you make changes to a workload, you need to change only one of these variables at a time as you go through the tests, so you can figure out and diagnose which changes are having what impact. And then when you're using those results, even if you've only changed one variable at a time, there's a good chance that more than one thing has changed and you just didn't realize, so that's always fun.

With Ceph specifically, benchmarking becomes even more complicated. The first reason is that Ceph has many interfaces, so we're not interacting with Ceph in just one way, we're interacting with it in many ways: you have block, file, object, or you can talk directly to the RADOS API. With Ceph you have massively varying workloads — the different Ceph protocols all have different workload characteristics — so you need to take different approaches and measure each one independently, and figure out, okay, for RBD this actually works really well, but for S3 this is terrible, or whatever.

Another issue is that distributed systems need distributed benchmarks. Because Ceph's performance scales linearly as you add OSDs to the cluster, you also need to scale your benchmarking tooling in a similar way, to make sure you're not limited by your benchmarking architecture. The drivers or worker nodes that you're using to benchmark — or client nodes, as some people in benchmarking call them — must not be the bottleneck; you need enough of them to fill up the Ceph pipe.

Another issue is that workloads can be invisible to you. Many people operating Ceph are running it as a service, and when they go and ask their customers, hey, what are you using Ceph for, their customers don't tell them. They just say, we want an object store, or we want a file store, and in many cases the customers themselves don't know what they're going to be doing with it. A lot of these people operate infrastructure as service providers, so the eventual need is opaque, and you only figure it out over time. That's a bit of a challenge. And then another issue with Ceph, which is always fun, is that the background work in Ceph can also get in the way.
Ceph is doing all this stuff in the background — deletes, scrubbing, self-healing checks — and you have to be aware of those things as well when benchmarking Ceph, to make sure that when you look at changes you've made, you haven't inadvertently introduced another issue.

First, some background: I work for a company called SoftIron, and we use Ceph as a core part of our product. Our last product, HyperDrive, which was an appliance for Ceph, was deployed in dozens of sites across a very large customer base, with loads of different workloads, and we needed a way to figure out, when we architect a new HyperDrive cluster, what that's going to look like for a customer. Today we offer a similar product, a full end-to-end cloud, and we have the same problem: a customer is going to be using Ceph in a certain way, so how do we estimate what type of hardware to provide them? Are we providing NVMe, SSD, hard disk, and in some cases what combination? Figuring this out was a critical blocker for our business.

So we started with a tool called COSBench. Anybody heard of COSBench? Ah, the same guy that loves benchmarking. Oh no, a few more people, okay. It was originally created by Intel, it's open source and written in Java, and for a while COSBench was the only tool you could benchmark Ceph with. It was targeted at S3 and object storage, but it also had some limited support for RADOS, and it was originally designed to benchmark other object stores too, so you could compare, say, your Ceph deployment to an S3 deployment or to Google Cloud or something like that.

We used this for a while, and we didn't really want to reinvent the wheel, but there are some problems with COSBench. The first problem is that the JNI is expensive. If you want to use COSBench to measure performance for anything other than S3 — Amazon has a pure Java implementation of the S3 client, but for everything else you have to traverse the Java Native Interface, and that's very expensive — it completely defeats the point of benchmarking with COSBench. On top of that there were a number of other problems. It didn't really have a maintainer and hadn't seen any code contributions for years; the Intel guys abandoned it at some point, I guess they either moved on with their lives or stopped using it. It also used this thing called OSGi, a horrid bundle-based Java framework, and it was very much built as a monolithic application. COSBench was originally targeting lots of different object protocols, and that structure may have made sense for that, but it was very fragile and didn't really work for us. There were other issues too: the manual workflow, no proper build or install system. We spent a bunch of time with COSBench trying to figure out how to make it better — we tried to build it with Maven, package it, document it, use it in a sensible way — but at some point we abandoned the effort because it was more work than it was worth. So we ended up writing our own benchmarking tool, and that's Sibench.

So what were the goals of Sibench? We wanted a tool that is simple and lightweight: easy to read, easy to run, easy to debug, so you know what's going on.
We wanted it to be linearly scalable in the same way that Ceph is linearly scalable — we didn't want it to get in the way, or to end up benchmarking Sibench itself; we wanted to benchmark Ceph. We wanted to benchmark all the different Ceph protocols, so this is a tool designed from the ground up for Ceph. It had to be efficient and low-level: we wanted something that lets us call out to C libraries like librados without performance implications. And we wanted it to show similar performance to fio, because fio is an industry standard for benchmarking and is pretty low-level itself — fio is great, frankly — so we wanted to be able to look at some fio numbers and some Sibench numbers side by side and see something that made sense. Those were the main goals. The final thing is that we wanted a framework that gives us control over the data we use to run the benchmarks — we want to control what data we're generating, not just read from /dev/random or something like that.

So what's the architecture? It's written in Go, so it's almost free to call out to C. It's both a daemon and a CLI tool: you use it like a command line tool, but it's also running on your driver nodes as a daemon. It handles auth, so it takes the Ceph keys and the S3 keys as arguments when necessary and passes them to the monitors or the gateways as needed. It's multi-threaded — easy, because it's written in Go — and by default every Sibench driver spins up a thread per CPU core on the worker node; you can control how many threads you want and play with that as another variable when benchmarking, which can be useful for specific workloads. It reports both bits and bytes: networking people seem to love measuring things in bits, storage people in bytes — no one cares, divide by eight, whatever. It also has control of ramp time, as many benchmarking tools do: you specify a ramp-up time and a ramp-down time, and it won't measure, say, the first three seconds or the last three seconds. And finally, Sibench focuses only on the benchmarking itself. It doesn't do all the setup, talking to the monitors, capturing and collating the data — we figured we wanted something that just did the benchmarking, and we ended up writing another tool, which I'll talk about in a second, called Benchmaster, that helps you orchestrate your benchmarks, run sweeps, run multiple things. Sibench is just about running one workload and doing it well.

So what does the architecture of Sibench look like? On the right here you've got a Ceph cluster, and you've got librados, librbd, libcephfs, RADOS Gateway, and then librbd and libcephfs a second time to represent the native mounts. From a Sibench worker you can talk to RADOS directly, use RBD images, use a mounted file system, or use RADOS Gateway — and the last two paths are native, so either a native file mount or a native block device. That means you could theoretically use Sibench to benchmark storage that isn't Ceph; you can benchmark anything. For librados, you just provide a Ceph pool, a Ceph key and a monitor address; for RBD you provide the same things and it handles the RBD images. And with RADOS Gateway — this is something that COSBench did very well — you don't have to worry about load balancing HTTP requests or setting up HAProxy or something similar in order to benchmark S3, because every worker can talk to a different RADOS Gateway server. There's some inherent built-in load balancing just from the fact that you have loads of workers and you can give them loads of endpoints, and it spreads across them for you. So that's pretty cool.
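To give a feel for why the Go-to-C path is cheap enough for this, here's a minimal sketch of what a worker's librados write path could look like using the go-ceph bindings. To be clear, this is my illustration, not Sibench's actual code: the pool name and object naming are made up, and it reads the local ceph.conf instead of taking the key and monitor as arguments the way Sibench does.

```go
package main

import (
	"fmt"
	"log"

	"github.com/ceph/go-ceph/rados" // cgo bindings over librados
)

func main() {
	// Connect using the local ceph.conf for monitor addresses and keys.
	conn, err := rados.NewConn()
	if err != nil {
		log.Fatal(err)
	}
	if err := conn.ReadDefaultConfigFile(); err != nil {
		log.Fatal(err)
	}
	if err := conn.Connect(); err != nil {
		log.Fatal(err)
	}
	defer conn.Shutdown()

	// "sibench-test" is a hypothetical pool name.
	ioctx, err := conn.OpenIOContext("sibench-test")
	if err != nil {
		log.Fatal(err)
	}
	defer ioctx.Destroy()

	// Write one fixed-size object: the inner step of a write benchmark loop.
	payload := make([]byte, 1<<20) // 1 MiB object
	if err := ioctx.Write(fmt.Sprintf("bench-obj-%d", 0), payload, 0); err != nil {
		log.Fatal(err)
	}
}
```

Each call goes straight through cgo into librados, so the per-operation overhead stays small compared to the JNI hop that COSBench has to make for non-S3 protocols.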
So what's some of the other stuff you can currently do? You can do bandwidth limiting. Some customers have requirements like "we want a 100 millisecond response time while doing 30 gigabytes a second of traffic", so this is a good way of getting latency numbers at a specific bandwidth — when you max out the bandwidth, that's not a very indicative measure of latency, so it's a good way of stopping workers from maxing out the pipe, and you get a really good view of what latency implications a workload is going to have. It has a slice generator: by default it generates random data, but that's not very useful when you're trying to measure things like compression or deduplication, so you can put that random data into a buffer and reuse the same data for future workloads in order to see the impact. You can do read/write mixes, so it's not just a read workload or a write workload; you can also do, say, a 30/70 or 50/50 split at the same time, because most real workloads are combined — there's a sketch of how limiting and mixing fit together just below. Sibench also has support for individual statistics, so you can write out all the stats from a worker and then go in and do statistical analysis and a whole bunch of other investigation to figure out what actually happened, if the numbers don't make sense. This ends up being a really, really big file, so it's only worth doing if you're really confused. And finally, Sibench doesn't delete anything — it doesn't clean up after itself by default — because deletes in Ceph have a huge impact on performance and they're also silent; it's hard to tell when they're happening. So we don't delete by default, but you can turn it on, and that gives you a representative view if your workload does deletes.
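Here's a hedged sketch of how a bandwidth cap and a read/write mix can combine in a single worker loop — an illustration of the two features above under my own assumptions, not Sibench's implementation; the 100 MiB/s limit, the 70/30 split and the stub operations are all made up:

```go
package main

import (
	"context"
	"math/rand"
	"time"

	"golang.org/x/time/rate"
)

const objectSize = 1 << 20 // 1 MiB per operation

// doRead and doWrite stand in for real protocol backends (RADOS, RBD, S3...).
func doRead()  { time.Sleep(time.Millisecond) }
func doWrite() { time.Sleep(2 * time.Millisecond) }

func main() {
	// Cap this worker at 100 MiB/s with a one-object burst, so latency is
	// measured at a fixed bandwidth instead of with the pipe maxed out.
	limiter := rate.NewLimiter(rate.Limit(100<<20), objectSize)

	readFraction := 0.7 // a 70/30 read/write mix
	deadline := time.Now().Add(5 * time.Second)

	for time.Now().Before(deadline) {
		// Block until the token bucket allows another object's worth of bytes.
		if err := limiter.WaitN(context.Background(), objectSize); err != nil {
			return
		}
		if rand.Float64() < readFraction {
			doRead()
		} else {
			doWrite()
		}
	}
}
```

The token bucket makes the worker pace itself, so the latencies you record are the latencies at that bandwidth, not at saturation.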
So, Benchmaster. Benchmaster is a small wrapper for Sibench — and for COSBench as well, because we had a migration process moving from one to the other. It's for running a series of benchmarks rather than a single one, so it lets us provide a set of options to sweep over: I can say, go and run a workload at 1K object size, at 4K, at 16K, and then come back and give me all the answers. And finally, it writes to Google Sheets, so with Benchmaster I can generate a Google Sheet and all my workload data ends up in there. That's a very easy way to draw graphs or use the output collected over time, so it's much more organized.

So I want to show you what a Sibench workload looks like. Hopefully the conference internet will allow me to do this — can you still hear me? — because I'm actually just doing it on a remote machine. Did it appear somewhere? There it is. Cool. So you can see I have this Ceph cluster, just a small cluster that we have in one of our Berlin labs. It's actually backing an OpenStack cluster, so I could theoretically make lots of people's lives miserable if I do the wrong thing. That's fun. It has 36 OSDs across just three nodes. You can see they're all hard disk, and they're all up; it's looking fairly healthy. There's some data on the cluster, not too much, about three terabytes used. So that should all look pretty familiar.

So, what's Sibench? First, I want to show you that Sibench is actually running. For this demo we're not really interested in the actual numbers — it's a very small test cluster, so it's not about hitting really big numbers — this is just about showing you how it works. Here, Sibench is running as a daemon, and that's how it works. Then we also have the command line tool, which has a fairly big help menu. Basically, for any one of the Sibench commands, I can provide it a list of servers, and those servers will be the worker nodes. You can run the command from any of the worker nodes, it doesn't really matter, as long as the daemon is there listening: the daemon has an API that the CLI talks to, and the benchmark gets sent out to all of them. Other things to point out — you can see the different sections. It has a man page as well, which is always nice: some information about benchmarking with Sibench specifically, some guidance, and more detail on every command and option for the tool. There's also a website which will give you detail on how to download it, package it and so on. Is that better? Not big enough? Okay, I hear that a lot.

Right, so next we're going to do a basic benchmark. I've got some historical commands, some of which worked, some of which didn't, so I'll run the ones that hopefully do. Here I'm going to do a RADOS run with ramp-up one and ramp-down one, so it's just going to be a five-second workload, and we're not going to measure the first or last second. I'm providing it the Ceph key — I'm just including the command to print the key — and I'm giving it the monitor at the end. Let's see what happens. Very simple: we've got a write stage, a prepare stage and a read stage. The prepare stage was actually skipped because I didn't clean up the data, so it could just go back and read the written data. You can see that the reads were faster than the writes. Very simple.

The next thing I'm going to do is use Benchmaster to create a sheet — I've actually done this before. This is the Benchmaster help page, and you can see here I have "sheet create": I give it a sheet name and an email address, and it will create a Google Sheet and share it with me. So let's try that. Benchmaster — oh, can't spell — sheet create devconf, and I'll send it to myself. I just got an email; open it up, and that's a very small spreadsheet. Oops, that's a very big spreadsheet. That's all right, cool. There you can see the spreadsheet. Then let me go away and run a benchmark, trying to juggle seeing things on this screen and that screen. So now I'm going to run a benchmark, and the sheet I'm going to provide is the one we've just created, devconf zz. You'll see here that I've added a new option, which is the object size, and I've added 128K, 512K, and 1 meg, so we can see the different speeds at these different sizes. And I'm going to make this a five-second runtime as well, so that it doesn't take absolutely ages, because it's going to do three runs, right? So I'm going to leave that to work for a bit.
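While that runs, here's a rough sketch of the shape of that CLI-to-daemon fan-out I described — purely illustrative and not Sibench's real API: the port, the /run endpoint and the Job fields are all hypothetical.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// Job is a hypothetical benchmark description sent to each worker daemon.
type Job struct {
	Protocol   string `json:"protocol"`   // e.g. "rados"
	ObjectSize int    `json:"objectSize"` // bytes
	RunTime    int    `json:"runTime"`    // seconds
	RampUp     int    `json:"rampUp"`     // seconds excluded at the start
	RampDown   int    `json:"rampDown"`   // seconds excluded at the end
}

func main() {
	// The servers passed on the command line become the worker nodes.
	servers := []string{"worker1:5150", "worker2:5150"} // hypothetical port
	job := Job{Protocol: "rados", ObjectSize: 1 << 20, RunTime: 5, RampUp: 1, RampDown: 1}

	body, _ := json.Marshal(job)
	for _, s := range servers {
		// Every daemon receives the same job and drives the cluster in parallel.
		resp, err := http.Post("http://"+s+"/run", "application/json", bytes.NewReader(body))
		if err != nil {
			fmt.Println(s, "unreachable:", err)
			continue
		}
		resp.Body.Close()
		fmt.Println(s, "accepted:", resp.Status)
	}
}
```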
As it does that, let me go through the next slides, and we can come back and review it in a second, if I can figure out how to manage this. So, some ideas for the future with Sibench.

We thought about doing a workload generator. Something we realized over time is that it's quite hard to map a customer workload, or a set of user workloads, to a benchmark that actually represents it. There's also the invisible workload problem: we had service providers saying, we don't know what people are running on our system, sorry, just give us something that works for everybody. And it's like, well, that's not really how benchmarks — or computers — work. So we wanted something that could sit there, like a generator, and say: okay, here's what the workload has looked like on average over the last month, and here's a Sibench benchmark you can run to test other potential clusters against that workload. That was something quite cool that we haven't got around to doing yet.

We also wanted to do sweeps over OSD counts, which is quite cool. I actually wrote a script that did this, but I never patched it into Benchmaster. The idea is, if you want to prove to yourself that the gear you're deploying really does scale linearly in performance, you can remove a whole bunch of the OSDs from your Ceph cluster and then add them back in gradually, node by node. Say you have 20 nodes: scale it back to three nodes, so you can still do basic triple replication, run a benchmark; add in the fourth node, run a benchmark; add in the fifth node, run a benchmark; and watch. If it's a linear graph, you're happy, because it's scaling linearly as promised by Sage and the other gods. If not, you're in trouble, right? I actually did this many times and it worked really well, I just never got it into Benchmaster. That would have been nice. There's a rough sketch of that sweep loop below.

Then meta operations: there's a whole bunch of stuff in Ceph that isn't just reads and writes, and it would be cool to look at how you measure the impact of, say, a snapshot, or other operations. There was also the idea of Kubernetes support: Kubernetes has a fairly mature CSI driver for Ceph — it works really well, both file system and block — so it would be cool to have Benchmaster go away, spin up a Kubernetes cluster, spin up a bunch of pods, map the storage class or the persistent volumes into the pods, run the benchmark, spin it down, and spit out the results, making that a repeatable process. That was something we never really got round to either. And if you have any ideas, leave an issue — and merge requests are welcome too; it would be cool to see what ideas you have.
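That OSD-count sweep could look something like the loop below — a minimal sketch under my own assumptions, since this never made it into Benchmaster: both addNode and runBenchmark are hypothetical stubs standing in for re-adding one node's OSDs (and waiting for rebalancing to settle) and for kicking off a Sibench run.

```go
package main

import "fmt"

// addNode stands in for re-adding one node's OSDs to the cluster and waiting
// for rebalancing to settle; runBenchmark stands in for one Sibench run that
// returns the measured bandwidth. Both are hypothetical stubs.
func addNode(n int)                    { fmt.Println("re-adding OSDs on node", n) }
func runBenchmark() (mbPerSec float64) { return 0 /* would come from the workers */ }

func main() {
	const totalNodes = 20
	results := map[int]float64{}

	// Start at three nodes (the minimum for sensible 3x replication),
	// then grow the cluster one node at a time, benchmarking each step.
	for n := 3; n <= totalNodes; n++ {
		if n > 3 {
			addNode(n)
		}
		results[n] = runBenchmark()
		fmt.Printf("%d nodes: %.0f MB/s\n", n, results[n])
	}
	// Plot node count against bandwidth: a straight line means the cluster
	// is scaling linearly, as promised.
}
```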
So while I've been talking — hopefully, if I can find my cursor again — we can see that this benchmark has completed. Remember, I ran three benchmarks with Benchmaster, so it went and told Sibench three times, go and do this, go and do that, go and do this, and then spat out the results into the spreadsheet. And there we go, we have three results. You can see they were run with Sibench, the different sizes, the time. We can see that the write bandwidth differed massively between the three object sizes. Same with the read bandwidth — interestingly, it seems to have gone down between 128K and 512K, which seems kind of odd to me, but that's probably because I only did a five-second run or something. And yeah, that's the benchmark.

So, any questions? The question is, can you take an fio configuration file and use it with Sibench? There isn't support for that, but I actually do really want it as well, because doing everything on the command line is very annoying. That would be a very trivial change to add. So yeah, that's a very good question, and I think we should do that — another idea for the future, albeit a minor one.

It does, it does. Oh yeah, the question was, is it also possible to count IOPS? Over here you can see there is a latency value, which translates to IOPS if you do the arithmetic — it's basically IOPS, just expressed as latency in milliseconds. There's a quick sketch of that conversion at the end. Cool. Any other questions? Bavar?

Well — so, CBT. The question was, have you finally given up on CBT? I never gave up on CBT. CBT is a wrapper for lots of tools, so in a way it's like the Benchmaster thing that we did. I actually gave this talk at one of the Ceph Days a few months ago, and the guy that wrote CBT was in the room, and he said, hey, we should talk about integrating Sibench into CBT. I think that would also be a very cool thing to do. But I haven't played enough with CBT to be able to say — you know, it has the value of comparing with fio, I know it has support for RADOS bench, it has support for COSBench, and I think it has support for another one of the Go-based benchmarking tools. So I think that's a worthwhile thing to look at, but it wasn't useful for us, for our purposes; we wrote Sibench very much for what we needed. But that's a good question. Cool. Any more questions? Well, thank you very much.
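As a footnote on that IOPS question, here's the arithmetic as a tiny sketch — assuming one outstanding operation per thread (queue depth 1), which is an assumption of mine rather than something stated in the talk:

```go
package main

import "fmt"

// iopsFromLatency estimates IOPS from a mean per-operation latency in
// milliseconds, assuming each thread keeps exactly one operation in flight.
func iopsFromLatency(latencyMs float64, threads int) float64 {
	return float64(threads) * 1000.0 / latencyMs
}

func main() {
	// Example: 64 worker threads each seeing 4 ms mean latency ≈ 16000 IOPS.
	fmt.Printf("%.0f IOPS\n", iopsFromLatency(4.0, 64))
}
```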