Okay, let's settle down. Our routing is here. Last but certainly not least, Patrick's going to tell us about Ceph, Xen, and CloudStack.

All right, thanks. Maybe that should be last but potentially least. I have the unenviable task of being the last guy between you guys and beer, so we'll see if we can't get there a little faster and stay ahead of this wonderful early boon here. I'm Patrick McGarry. I'm one of the community monkeys working for Inktank. We're the company that's bringing Ceph to the world. How many of you guys are familiar with Ceph? All right, awesome. That means there's a whole chunk of my presentation that we can just fly right past, so that'll be even better.

Just a little bit about me. Like I said, I'm working for Inktank, mostly doing the community stuff for Ceph. I started off cutting my teeth on community stuff working for Slashdot, which is where I met Ross. For those of you who don't know him, he was SourceForge and I was Slashdot. Did a stint at ALU to get that whole "I worked for a big company" thing out of my blood, and now I'm done with that forever. Worked for Perforce, and now I'm here at Inktank. I finally feel like I've come home to the open-source world again, which is really nice. I'm also scuttlemonkey on SlideShare, so if you want to see these slides, they're up there.

So here's what we're gonna talk about. I'll breeze through the 30-second overview of Ceph real quick. I have a little deeper dive on Ceph, which I can touch on a little bit in case there are questions. Go ahead and stop me or whatever, but it sounds like most of you folks probably know the basics, the 101. We'll touch on Ceph in the wild, which is the important part, right? That's the CloudStack piece, the Xen piece, and then all the rest of the various things we can touch on a little bit too. Something I've been geeking out about a lot lately is the orchestration piece, so I like to touch on that a lot and compare and contrast some of them. The nice part about it is that Ceph plays with basically all of the ones that I've seen thus far. I learned about a new one this week, Deploy, which I know nothing about, so maybe not that one for Ceph. I can talk a little bit about the community status in case there are questions about that, some what's-next, and then if there are any questions we can wrap those up at the end.

So what's Ceph, besides wicked awesome? It's software. That's the biggest distinction that I think some people don't understand, especially when we say we're here to steal EMC and NetApp's lunch money. It's software-defined storage. It's just a software daemon that runs on Linux, so that's basically all it is, and it's really cool because it runs on commodity hardware and allows us to do storage really cheaply with no single point of failure. The other thing that's really interesting is it's all in one: it's object, block, and file all in a single cluster, with a lot of intelligence built right into it. A lot of people that are familiar with storage, the admins, they'll ask me, well, how many admins do I need to run it? And they'll tell me, historically I have N terabytes per storage admin, that's kind of my rule of thumb. Everybody has some rule of thumb that they go by. My favorite anecdote to answer with is, well, DreamHost has two Ceph clusters, one's three petabytes and one's five petabytes, and they're both run by a single guy part-time. That usually kind of blows their mind for a minute, which is fun. And then CRUSH.
CRUSH is the secret sauce that makes Ceph so powerful. It's the part that makes Ceph infrastructure-aware, and it's the placement algorithm that handles all of the data placement, so it's really cool stuff. It came out of the original research that Sage did at UC Santa Cruz, which kind of led to what Ceph is now. And then my favorite part about Ceph is the scale. It's meant for huge, huge amounts of data. It was originally designed for supercomputing applications, so it was designed for exabyte scale. We haven't hit a limit yet; I'm looking forward to the day that someone decides to try.

So yeah, that was fast. You can find out more — there's all kinds of good stuff at ceph.com. Our doc writer is kind of superhuman, really: John Wilkins, he's amazing. And if you want to use it, you can play with DreamObjects. It's public, it's basically an S3 competitor, and it's all based on Ceph. And of course, if you know anybody that wants to pay for it, Inktank is more than happy to take their money.

So this is kind of the architecture diagram that explains how Ceph all fits together. Underneath it all, as I'm sure most of you know at this point, it's an object store. We get some really cool things for free by doing an object store underneath. We don't have to worry about, you know, the hierarchy of basing it on files or anything like that, or a lot of the incorporated metadata and things like that that go along with it. Although, interestingly, Ceph doesn't actually have a whole lot of metadata unless you're dealing with CephFS. On top of the object store, we expose that via three different interfaces. We have the RESTful APIs, which are the S3 and OpenStack Swift gateway. We have our virtual disk, our block device — that's Ceph RBD. And then we have CephFS, which is the POSIX-compliant scale-out file system. Although, for the developer-centered folks, I usually like to show them this picture, because there are actually two object interfaces — one of them is the low-level library interface, librados, if you wanna roll your own.

So, of course, the basics of Ceph: you start with some amount of disks. You have a big pile of disks in a data center somewhere, and you're gonna throw some arbitrary file system on top of that. We decided we didn't wanna reinvent the wheel, so we took advantage of what's out there. Our favorite is btrfs, but obviously it's not quite there yet — it's been not quite there for about a decade. But we think it's the future, we hope it's the future; there's some really cool stuff, some underlying cloning and things like that, that really make it cool. Most of the folks we have as customers running in production are running on XFS; the larger extended attributes are what give that one most of its power. Of course, you can run it on ext4, and now ZFS — my presentation is out of date, ZFS is also in that list. And then, of course, that's one rack-mounted server, and you have many, many, many of these OSD machines. That's what the OSD is on top: the software daemon, the object storage daemon. So you have many, many, many of these servers, and then some small number of monitors in there as well that are kind of the air traffic controllers. They herd the cats, they do some authentication stuff, but they're actually not in the data path. That's the cool part about CRUSH. CRUSH is that placement algorithm, and it's what allows the clients to calculate where the data should go, or where the data should be living, and go directly to the OSD.
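To make that idea a little more concrete, here's a toy Python sketch of what "the client computes placement itself" means. This is not the real CRUSH algorithm — the hash, the PG count, and the OSD list are all made up for illustration — but it shows the property that matters: every client derives the same object-to-PG-to-OSD mapping on its own, so nothing has to be looked up from a central service in the data path.

```python
import hashlib

# Toy illustration of deterministic, client-side placement (NOT real CRUSH).
PG_COUNT = 64           # placement groups in the pool (hypothetical)
OSDS = list(range(12))  # twelve OSDs in the cluster (hypothetical)
REPLICAS = 3            # replication level

def object_to_pg(obj_name: str) -> int:
    """Hash the object name into one of the pool's placement groups."""
    h = int(hashlib.md5(obj_name.encode()).hexdigest(), 16)
    return h % PG_COUNT

def pg_to_osds(pg: int) -> list:
    """Deterministically (but pseudo-randomly) pick REPLICAS distinct OSDs for a PG."""
    chosen = []
    i = 0
    while len(chosen) < REPLICAS:
        h = int(hashlib.md5(f"{pg}:{i}".encode()).hexdigest(), 16)
        osd = OSDS[h % len(OSDS)]
        if osd not in chosen:
            chosen.append(osd)
        i += 1
    return chosen

obj = "vm-image-0004"
pg = object_to_pg(obj)
# Any client running this computes the same answer, so it can write
# straight to the primary OSD without asking a name node first.
print(f"{obj} -> PG {pg} -> OSDs {pg_to_osds(pg)}")
```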
So there's none of that single name-node lookup slowdown that you have to worry about. It's pseudo-random placement, and it's also the thing that lets you define, via a CRUSH map, what your infrastructure looks like in the data center. So you have N number of disks in Y servers, in X racks, in some number of rows, and based on that, you can then create rules about where you want your data to live. You want a fast data pool? You can say this pool, data-fast, is going to use my SSDs. Or you can combine them, doing some number of spinning-rust disks with one SSD that handles the journal for all of them. Or you can create your own failure domains based on power circuits or whatever. So CRUSH is actually relatively simple, but also quite powerful in terms of what you can do with it.

This is the part we can probably breeze through. It just covers what happens when you wanna stuff something in the cluster. You take it and you hash it into some number of what we're calling placement groups. By default these are just four-meg logical buckets that we cram onto the various servers. So what happens is, your client is gonna hash that, and based on CRUSH, you'll know where it needs to live, so it'll send that to the OSD as a write. The OSD, based on your replication level, will then peer with the other OSDs — based on where the replicas should live according to CRUSH — and send those out. When those writes have finished and been acknowledged back to the primary OSD, it sends the acknowledgement that the write is done. Ceph is a strongly consistent system, so that's how it has to work. Of course, you do this for many things all at once, and you get this random distribution of data that's nice and pretty and even. And when the client comes in and uses CRUSH to look these things up, it will read the primary copy, or — if half your data center burns down or something — it'll know where the other copies live and it can go there.

Speaking of which, if you have a node failure: these OSDs are actually peering all the time and saying who's up, who's down. Some number of OSDs are always reporting to the monitors when they think one of them is down, and the monitors are actually using a Paxos algorithm to make this decision of who's up, who's down, et cetera. So when the decision is finally made that one's down, the OSDs that have the replicas of that data will know, hey, we're no longer kosher with our replication level, so we gotta fix that, and they'll automatically peer with the OSDs where the data needs to live based on the new CRUSH map, move the data, and then the client will know where to go.

So cool, we were able to breeze through that pretty quick. Now we'll talk about stuff in the wild. Anybody remember that show, Wild America with Marty Stouffer? God, I loved that show. Ha ha. So, Linux distros — no incendiary devices, please. We work with a pretty fair number of different distros. Obviously our roots are pretty heavily in Ubuntu; that's where we originally did most of our writing and testing, but now we're in EPEL, and I hear rumors that we'll be RHEL-happy very soon. But yeah, there are packages for all these guys, so it's at this point pretty easy to deploy, depending on how you wanna get it out there. Of course, OpenStack — there's a lot of stuff here, and I will kind of breeze past that very quickly. But of course, the nice and fancy one: CloudStack.
This one is a lot of fun, because I love this story as a community manager: this integration came entirely from the community. This wasn't something Inktank decided — hey, we're gonna do this, it's strategic or biz dev or whatever the hell — it's a guy in the community who said, I'm using CloudStack and I wanna use Ceph, and he wrote it. That's Wido from 42on, if you guys know him. So right now you can use it as alternate primary and secondary storage. A lot of the snapshot and backup support stuff that he's been working on is coming in 4.2, and I just talked to him last week — it's all done, it's packaged and ready to go, so we're just waiting for the arrival of 4.2. He's also working on some RBD Java bindings for some of the other stuff. Right now QEMU and libvirt are creating images in format 1 by default, so he had to do a little bit of hacky stuff to make format 2 work. That's kind of where that's at now, so I guess it could use some polish, but the functionality will be there in 4.2. Obviously RBD will be primary storage, and you'll actually no longer have to have that little NFS mount that has been required thus far, and then we'll have the gateway's S3 interface for the secondary stuff — the templates and backups and ISOs and things like that.

Next one, all right. This one I blatantly ripped off from Wido; it's a good diagram of kind of how it works. Whether it's KVM or Xen, this is kind of the logical flow of how things fit together. The management server talks to the agent, which runs KVM, the hypervisor, et cetera. It's all right there — I can leave that up for a minute. But the important part here is that the management server never talks to the Ceph cluster, so it keeps that logical separation, which makes things easier; there's not an extra layer of code that we have to manage or anything like that. Which means, given how that all lays out, one management server can manage thousands of these hypervisors, and the management servers can also be clustered. I mean, you guys are probably all familiar with CloudStack, but the cool part is that he's actually started playing with different implementations of having multiple Ceph clusters for different workloads — multiple pools, region stuff. He's actually been a good test bed for some of the region stuff, if you guys saw Sage's talk this morning about a lot of the geo-replication we've been working on. There's a lot of thought going into that. The gateway and the block device have answers to geo-replication, mostly from a disaster recovery standpoint, but the next thing on the way that we're all really excited about is that the underlying RADOS infrastructure is actually gonna have the ability to define regions and zones and do multiple geographically aware pieces. So that's nice.

Of course, I couldn't talk cloud without talking about some of our other friends. We're in SUSE Cloud. We work with Google's Ganeti, Proxmox, OpenNebula. There's actually a talk next week in Berlin, if any of you guys are gonna make it that far that fast — Joel Merrick from the BBC is talking about his adventures in research. He's talking about some of his experiments with Xen and KVM, OpenStack, CloudStack, some of his Ceph stuff in there. So there's some really cool stuff; it's always nice to see it through the eyes of a user. It's definitely worth it if you're gonna be in the area, or I think they might actually be live-streaming that event too, if you can find that.
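Circling back to the format 1 versus format 2 point for a second, here's roughly what that distinction looks like from the Python rbd bindings that ship with Ceph. Treat it as a sketch: the pool and image names are placeholders, and the exact keyword arguments have shifted a bit between releases, but the idea is that format 2 is what gives you protected snapshots and copy-on-write clones — the pieces that snapshot and template work in CloudStack can lean on.

```python
import rados
import rbd

# Connect to the cluster and open a pool ('cloudstack' is a placeholder name).
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('cloudstack')

rbd_inst = rbd.RBD()
size = 20 * 1024 ** 3  # 20 GiB

# old_format=True (the default in this era) creates a format 1 image.
# old_format=False plus the layering feature creates a format 2 image,
# which is what supports protected snapshots and clones.
rbd_inst.create(ioctx, 'vm-root-disk', size,
                old_format=False,
                features=rbd.RBD_FEATURE_LAYERING)

image = rbd.Image(ioctx, 'vm-root-disk')
image.create_snap('template')    # take a snapshot of the image
image.protect_snap('template')   # a snapshot must be protected before cloning
rbd_inst.clone(ioctx, 'vm-root-disk', 'template',
               ioctx, 'vm-root-disk-clone',
               features=rbd.RBD_FEATURE_LAYERING)  # copy-on-write child image
image.close()

ioctx.close()
cluster.shutdown()
```

Format 1 images can't do the protect-and-clone part at all, which is why a bit of hacky work was needed as long as QEMU and libvirt defaulted to format 1.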
Beyond the cloud stuff, project intersection: we've obviously had close ties with the kernel for a long time. We have native clients for RBD and CephFS and a lot of active development in the Linux kernel. Alex Elder, one of our guys, actually made the top contributors list in the kernel report that came out this week, which we're all very excited about — he got major cool points for that in the office. We have things like a Wireshark plug-in. We've done some work for iSCSI via the tgt library, and we're working on LIO next. There have also been some very creative solutions, like people who were using VMware because they wanted to use Ceph but had Windows infrastructure they had to deal with — so one guy got really creative and did Fibre Channel into Ceph so that he could back his VMware infrastructure. So there's definitely some cool, hacky project intersection. We're also a drop-in replacement for HDFS in Hadoop, we're upstream in Samba, and the Ganesha project is being used to re-export both CephFS and RBD — the file system and our block device — as NFS and CIFS. And of course, the XenServer stuff.

So, doing it with Xen — all things Xen. This has actually been really exciting for us, because it's another thing that's coming in large part from the community. We didn't really push hard for it, but we're definitely happy to help it along. Obviously there's the support through libvirt. It looks like most of the faces around here are Xen experts, so I'm not gonna try and talk to you about that, because I know far less than most of the people here, but obviously it started with the blktap driver, two and three, and now it's kind of exploded and it's moving to QEMU — the new boss, same as the old boss. The one thing I did wanna touch on is something that's kind of a point of confusion from a community standpoint, which is the naming stuff, right? Ceph versus XenServer versus libvirt, and how we each talk about things. We talk about a block device, XenServer talks about a VDI, and libvirt talks about a storage volume, right? Or pools versus storage repositories versus storage pools. It's just different vocabulary. The guy who's been doing a lot of the work on the Ceph and Xen integration actually had a really good talk — in London, I think it was. It's made the rounds on YouTube; maybe you've seen it. I ripped this off from him because it was just perfect. It gives you a really clear indication of how this stuff falls together. The client stuff — CloudStack, OpenStack, XenDesktop, whatever it is — goes through the XenAPI stuff, and you've got your domain manager, which goes down to kind of the Xen control library and the standard Xen libraries, and this QEMU obviously being the — what is it called in Xen vernacular, upstream QEMU? Not the older one. And on the other side, we've got the storage plugins, the SM adapters, which talk through libvirt — this is the experimental part that's kind of out there as a tech preview right now — and that lets you talk to things like Ceph, or OCFS2, or things like that. Pretty exciting.

I like to refer to Ceph as a gateway drug, and Xen, CloudStack, OpenStack are a great example of that. We see people come in that way: they'll come in for block and stay for the object and file. We already talked about the reduced overhead, but it's really exciting to see some of the prototypes that come out of that. Somebody will say, okay, I need block storage, and they'll get the block storage and they'll be happy.
And they'll say, well, hey, there are a couple of other interfaces here I can tinker with. So we're seeing a lot of that — like the Intel guys. One of them, I think, might actually have to go to addiction counseling or something; he's done 700 patches in the last three months for CephFS. So it's pretty neat to see how this works: you get one piece that brings you in, and people kind of go nuts from there.

So I can talk a little bit about block, object, and file, but I'll probably breeze through these, and if there are questions, we can touch on those. The cool parts about each of these for Ceph — this being Ceph, we're talking about some of those wins we get from having an object layer underneath. One of the really cool parts about that, in terms of a block device, is that it allows you to do things like squash hotspots. Because you're taking that block device and striping it over a number of physical hosts in the object layer, you actually get to parallelize a huge amount of your workload, so your block device can be arbitrarily large or arbitrarily busy and it doesn't matter. We can also do things like instant clones and live migration and all that stuff, because it's all the same storage back end.

The object side, like I said — we do Swift and S3. They're well-established APIs, so it's nice: if you have an app that you've written to use S3, you just change the endpoint and you're done. This is the secondary storage part for CloudStack, and there's also some very easy horizontal scaling. It plugs into existing things — you can just put the gateways behind an HAProxy box and you're ready to go.

And this is the file system, which I haven't spent a whole lot of time talking about, just as we haven't spent a whole lot of time QA'ing it, which is kind of why we aren't telling people to go ahead and use it in production. We've obviously focused our efforts on the object and block parts of the house, because that's where all of the demand thus far has really been. The thing with the file system is that it's the only time you have to introduce the metadata server piece of the Ceph family of nodes, and that's only for the directory, timestamp kind of metadata stuff. Again, it's not in the data path — for the data you still go directly to the OSDs, you just also have to pull that extra little bit from the metadata server. And this also has the ability to be horizontally scalable. This is one of those experimental parts that hasn't been QA'ed a lot, but you can spin up many, many, many Ceph metadata servers, and we have what we're calling dynamic subtree partitioning. So as your directory tree gets busy — you have hotspots, all the way down to a single file or a part of the tree or whatever — these metadata servers will shuffle the load between them, and you can even have a single metadata server serving a single file if it's that busy. So that's one of the really cool, exciting parts about Ceph that we just haven't been able to get to yet, and there are a lot of guys inside Inktank that really want to work on it; we just haven't had the time.

So this is the deployment stuff that I've been geeking out about. I always like to touch on this stuff. One of the most often asked questions that I get is, okay, Ceph sounds cool — how can I use it? How do I get there from here? So I like to touch on the orchestration stuff.
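One quick aside before the orchestration tour, since "just change the endpoint" is easy to say: this is roughly what it looks like with boto pointed at a RADOS Gateway instead of Amazon. The gateway hostname, keys, and bucket name below are placeholders for whatever your gateway user was set up with.

```python
import boto
import boto.s3.connection

# Point an ordinary S3 client at the RADOS Gateway instead of s3.amazonaws.com.
conn = boto.connect_s3(
    aws_access_key_id='ACCESS_KEY',          # placeholder credentials
    aws_secret_access_key='SECRET_KEY',
    host='radosgw.example.com',               # your gateway's hostname
    is_secure=False,
    calling_format=boto.s3.connection.OrdinaryCallingFormat(),
)

# From here on it's plain S3 code.
bucket = conn.create_bucket('my-bucket')
key = bucket.new_key('hello.txt')
key.set_contents_from_string('Hello from the Ceph object gateway')

for b in conn.get_all_buckets():
    print(b.name, b.creation_date)
```

Everything after the connection call is ordinary S3 code, which is the whole point.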
Obviously, Chef and Puppet — these guys are maybe the 800-pound gorillas in the room. They're the mature options that most people have heard of, Chef being aimed more at the dev side of DevOps and Puppet more at the procedural sysadmin kind of crowd. Ansible and Salt, though, are kind of the other end of that spectrum. They're the ones that have gone from zero to hero in a very short amount of time, and each of them has its own stuff. We heard about Salt — Salt is really cool because it's fast, fast, fast. I've seen some people do some crazy, silly things with Salt at scale, deploying thousands of things at the same time, and it's just ridiculously fast. Ansible is kind of neat; I like it because it's agentless, so it's very lightweight, a very light touch. It has kind of a different mindset. And then there are some more options. Juju is kind of my favorite when I'm just tinkering and playing around with things. It seems to work the same way my brain does, which might be backwards, but I'm not sure. Canonical has done some really fun things, making it relatively agnostic: if you already like Chef and you wanna do stuff with Juju, you can just take your Chef recipe and wrap it in Juju. Same with Puppet, or Python or Bash or whatever you wanna use to deploy — you can wrap it in Juju, which then gives you the advantage of being able to talk to MAAS. If you know Metal as a Service, it's their bare-metal thing, and you can go all the way from the bottom all the way up with MAAS and Juju, and it all plugs together; it's really cool. Some other hitters there: there's Crowbar, so Dell has some skin in the game. Homegrown IT I threw in there basically to say that a lot of people are rolling their own thing — there are so many flavors out there. And then ceph-deploy, which is kind of our "do it without a tool" option. That's our quick eight commands or whatever to get yourself to a Ceph cluster if you really don't wanna take on somebody else's overhead.

Community — I'll touch on the community stuff just a little bit. I just wanted to throw some slides in there for the people that are voracious about downloading stuff off of SlideShare. Inktank, and a little bit of history on Ceph: there were kind of four main periods of Ceph development, and this shows the cumulative number of authors, to show the growth at each of these inflection points. There was the research project, which is the first block, at UC Santa Cruz. There was kind of an incubation period where Ceph was still being developed inside of DreamHost. Then the launch, where it got spun out into Inktank, and then the growth period that we've seen with things like the integration with OpenStack and CloudStack and some of that stuff, which has really been our next inflection point. We've seen some really cool code contributions. I just wanted to show this — the employee contributions versus the non-employee contributions — it's up and to the right. I don't want to spend a lot of time on this. Commits: this commit graph is actually pretty cool. I don't know if you can see it very well — I guess not. The blue at the bottom is Sage, one human, although human is relatively debatable. The yellow is obviously Inktank contributions, and the purple is community contributions. Those have really started to take off in recent history. A lot of it is our friends from Intel, but obviously there's a good number of people; Danny Al-Gaaf did a whole bunch of work too.
He's a big part of that pie. And then the lists: the blue stuff is us and the yellow stuff is all community. This is the mailing lists — I didn't want to duplicate it on here, but the IRC participation looks almost exactly the same as the lists.

So what's next on the Ceph train? The Inktank plans really are about geo-replication, which Sage talked about this morning. Definitely check out his slides; he has some really good deep dives into what the thinking is around that stuff. There's a lot to think about, it turns out, all the way from clocks on up. The erasure coding stuff is actually nearing completion — there's been some good work there. It's especially interesting as it relates to the other thing next to it, which is tiering. Some folks want dynamic tiering: data gets hot and it goes up to the SSDs, it gets cold and it goes down to some platters. And in combination with that, on those platters people want to do erasure coding so they don't have to have as many replicas and they can squeeze some more dollars out of the bottom end. And then governance. We are pretty open, but we want to make it more open, so we've been talking a lot about this and progressing it. Every quarter, as we approach our next stable release, we hold our Ceph Developer Summit, which is a virtual summit. Thus far it's been on Google Hangouts, but that's, I think, not going to continue, because we have too many people for it. We have a blueprint process, so anybody that wants to write something, or wants to see something written, has a submission window where you say, hey, here's a blueprint, this is what we want. And then we all get together at the Developer Summit and talk about how we're gonna get there.

Of course, I wouldn't be a community manager if I didn't plug getting involved: the Ceph Developer Summit, which I talked about, and we have a number of Ceph Days — we've done three at this point. We did one in Amsterdam, one in New York, and one in Santa Clara last week, and we have one coming up soon in London, the second week in October, I think. So if you want ideas, the Developer Summit and Ceph Days are great places to do face-to-face, meatspace communication where we figure out, hey, what's happening, how can I help? Of course, there's IRC and the lists. And if you're looking for project ideas, we have project ideas on our wiki — Redmine, obviously, is the easiest place, because that's where everything is kind of held in the brain trust. And again, IRC and the lists. Questions?

All right, we breezed through that in no time flat, and we'll be out of here early for beer. That's awesome. Let's give Patrick a round of applause.