This is John Dickinson with an introduction to OpenStack Swift. So, my name is John Dickinson. I am the project technical lead for OpenStack Swift, which is the object storage piece of OpenStack. I was on the original development team for Swift when it was first open sourced, so I've been around OpenStack from the beginning. It's been a really fun ride. What I really want to talk about is not so much how all the little pieces of Swift work. I want to give a little bit of an overview, just because I think it's always good to keep in front of people: here's what Swift does, and here are some of the big features, especially some of the later ones that have been talked about. And I want to talk about some of the more operational concerns. I'll focus on those more than, say, what a developer would need to know for API usage, or what a contributor would need to know for data placement and things like that. I don't have a lot of time, so we'll dive right in. The first question, just in general, is: why do we need this? I love hearing about some of the new things being talked about inside of OpenStack that give more dynamic deployment options and scalability, which is really exciting. And one of the reasons it's so exciting is, of course, that we can't get away from the fact that the reason we need all of this is that people are using applications differently. They want to do different things. We've got mobile devices and websites and little embedded devices and everything in between, which basically means you have a lot of data being generated, consumed, and accessed all at the same time. And you need it now. You don't need it later. You need it as soon as you can get it.
And you need a way to store that data so that it's scalable and so that you have access to it right now. That's where we are and why we have this thing. Some of the first use cases inside of OpenStack in general, and a lot of the things people first approach Swift with, are the basic day-to-day things we've been doing, or should have been doing, for decades, like backups. You've got data and you say: great, I need to back it up. Let's put it in Swift. If you're inside a full OpenStack cloud deployment, you're going to say: I have images and snapshots for VMs and for block devices, so where do I put those so I can persist them and get them back later? That's the kind of use case we have: data that can grow without bound, that needs to scale, but that also needs to be available all the time and needs to support a whole lot of concurrency across the entire data set. That's where Swift fits in, and why, as part of a complete cloud deployment, you need something like it. You need something more than what we used to do. So what is it that we used to do? It used to be that you had data, you wanted to put it someplace, you plugged in a hard drive, and you were good to go. Then your hard drive filled up, so you bought another hard drive, and then that one filled up, and you just kept going with that. And you really haven't solved the problem, because now you've got all the complexity of figuring out where things are going to be placed and how you're going to do that. So that was the beginning of the journey towards where we are today.
You start trying to figure out how to place all your data so it can be persisted, and then you have your really bad day when one of your hard drives fails, and you start to realize: I need some redundancy. So you go buy a RAID card, and that's going to solve all your problems. Now you have RAID volumes, and you fill up a RAID volume, and then you have to get a second RAID volume, and you really haven't solved your problems, you've just made them bigger and harder. So the point of Swift, and this is the elevator pitch, is to be a system that abstracts your content away from the media on which it's stored, because that means you can change either one at any time without affecting the other. If you upgrade to some new storage technology, if you move from spinning drives to flash to non-volatile RAM to computronium or whatever the next thing is going to be, you can do that without losing the data, because the system itself is managing how the data is durably stored across that media, and the system itself is handling failures and working around them. That matters over time as you're dealing with capacity and moving to new things, but it also matters when you just need to patch a kernel and reboot a machine. Those are the kinds of things you can do without worrying about impacting the data itself, losing it, or losing availability to it. On the other side, it also means the data itself is long lived and decoupled from the media, so you don't have to think about the hard problems of storage. So what are the hard problems that Swift solves?
I'm going through quickly here, but I wanted to look at those hard problems from two perspectives, and at why Swift is able to solve them and why that's important. First, from the developer perspective, the people who are actually writing the applications: you don't want them to have to think about those operational concerns. You don't want to have to think about which particular hard drive you're putting your VM image on, where specifically you're snapshotting this block device to, or where specifically you're storing this video file or this cat picture or whatever it's going to be. The developers don't want to worry about that, because that's not actually the value of the application they're writing. The value is that it's going to be the next awesome game, or some document management system, or the radio astronomy telescopes being talked about in the other rooms right now. Any application being built is trying to build some kind of value on top of a process, but the hard problem the developers are solving is not figuring out how to store all the images that get uploaded. That's not what they need to be concerned with. They need to treat storage as a utility: throw some data at it, and later on get it back. But the other side, and this is what I want to focus on a little more today, is the ops side. I want to solve this guy's problem. I want him to not have to worry about things like that. So, the ops folks: what do they have to do? What are the hard problems there? You have to deal with capacity management. You have to deal with durability of the data that's persisted there. That's the responsibility of those ops folks.
You need to think about scale. How much capacity do we have available to provision for storage? How much capacity do we have for serving data in and out? Sometimes those need to be scaled independently. You have to deal with upgrades. Maybe there's a flag day where some common library has a bug and you have to reboot the world. I'm sure that would never happen. But if it did again, you'd need to worry about that. Then there's the day-to-day stuff: I need to upgrade my kernels, I need to add new functionality, maybe I just need to reboot because I added some new monitoring tools. As an operator, you've also got to deal with other things that integrate alongside the system but aren't actually part of it, like putting a cache in front of it, whether that's something like Squid or Varnish or a whole CDN infrastructure. Those are the hard problems the operators have to think about. I would argue that just as the developers need something they can offload the hard problems to, so they can focus on providing value in their application and not on storage, the operators also need a system they can offload the hard problems of storage to. They don't want to wake up in the middle of the night to swap out a hard drive. That should be handled seamlessly. They don't want to come in on their on-call weekend because a server failed, because that happens a lot, especially if you have a lot of servers. Those common things should be taken care of automatically. That is why we need something like Swift alongside the things that run those applications scalably and dynamically, the things provided by Nova and Neutron and Cinder and the other pieces of OpenStack. That is how it all fits together, and the why of Swift in my mind.
That being said, extraordinarily briefly, this is how Swift works. The client talks to something called a proxy server, and the proxy server talks to a set of storage nodes. The proxy deals with API requests, and the storage nodes deal with persisting the data. The really cool thing is that if you need to scale out one of them, you can, and you can do it independently. If you need more capacity on your storage hardware, you add more storage nodes. If you need more capacity for throughput, bandwidth, or connectivity, you add more proxy servers. They don't share any state, so they're all horizontally scalable, and they can come and go as needed. You'll generally end up putting the proxies behind a load balancer and running it that way. That's how things work. There are a couple of other features I want to talk about, just a few things, not this whole big list. This big list is really cool because everything on it, except basic CRUD operations, was developed as a result of people running Swift in production coming to the community, saying we need to solve this particular problem, and contributing those things back. I do want to talk about two specific things that are especially nice from an operator's perspective and are newer features. One is global clusters, and the other is storage policies. Global clusters, very briefly, let you deploy one logical cluster globally. This is how it works. If you have a multi-region cluster on different sides of the world, say with regions in Portland and Hong Kong, then when you upload your data into the cluster, it's going to be routed to whichever region is closest.
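As a quick aside, the way a proxy decides which storage nodes hold a given object can be sketched in a few lines. This is a deliberately simplified stand-in for Swift's real ring code, with a made-up partition count and a naive round-robin device assignment in place of Swift's failure-domain-aware placement:

```python
import hashlib

# Simplified sketch of Swift-style ring placement (not the real
# swift.common.ring code): hash the object path, take the top bits
# as a partition number, and map that partition to devices.

PART_POWER = 4                             # 2**4 = 16 partitions (tiny, for illustration)
DEVICES = [f"disk{i}" for i in range(8)]   # pretend storage devices
REPLICAS = 3

def get_partition(account, container, obj):
    """Map an object path to a partition number via md5."""
    path = f"/{account}/{container}/{obj}".encode()
    digest = hashlib.md5(path).digest()
    # Use the top 4 bytes of the digest, shifted down to PART_POWER bits.
    top = int.from_bytes(digest[:4], "big")
    return top >> (32 - PART_POWER)

def get_nodes(partition):
    """Assign REPLICAS distinct devices to a partition (a naive
    stand-in for Swift's real, failure-domain-aware assignment)."""
    return [DEVICES[(partition + r * len(DEVICES) // REPLICAS) % len(DEVICES)]
            for r in range(REPLICAS)]

part = get_partition("AUTH_test", "photos", "cat.jpg")
nodes = get_nodes(part)
```

The point of the hash ring is that any proxy can compute the same answer independently, which is why the proxies can stay stateless and scale horizontally.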
One way you could do that is with something like geo-DNS. The data is sent into that local region, which gives you nice locality of access. Then the cluster can, either at the time of the request or, as it's shown here, asynchronously, replicate it across the network to the other regions. So now you have full durability across all of those failure domains, and that's nice. Now that your data is stored in there, you might send it to a friend on the other side of the world. You send your cat picture to the friend, they get the link and say: I want to look at that funny cat picture. They are again routed to their closest region, the data is sent back, and that's how it works. So that's the fancy video I love about how global clusters work. But the point, especially combined with the next thing I want to talk about, which is storage policies, is that as an operator you can configure your cluster to reflect exactly what you want to expose to your users, whether that's geographic access or particular tiers of storage. Storage policies were completed in July of this year. What storage policies allow you to do is say: here is a part of my cluster that is identified by a particular set of hardware. That could be hardware that's only in a particular location, or hardware that has a different class of service. So it could be: here's data that's going to stay in Europe, here's data that's going to stay in Asia, and here's data that's not going to be in North America, or something like that. And you can also configure it by media: this is flash media, this is spinning media, and you can expose those in various ways.
The other piece is that with each storage policy you can configure how things are durably stored across it. What's your replication policy on it? Is it 2x, 3x, 4x, whatever you need? And right now we're actively working on supporting erasure codes on top of that too, so you can have non-replicated storage. So, how do you get started? This is the company I work for, but we do have a free sign up, and it's a really easy way to start. You can go to swiftstack.com to get a little free trial there. It is 100% the open source upstream code, so it's not a fork. If you want to roll your own, you can come ask us questions on IRC and go look at the code and the developer docs. And there is also an O'Reilly book, of which I have some right here. If you'd like a copy, come find me; I think I've got 13 of them. Unfortunately, I'm not going to have time to talk about fun stuff like failure handling and sad servers. But as far as important things to worry about inside your clusters as operators: when you're deploying these clusters, you're going to have to build out some sort of monitoring system, and you're going to want your graphs. You'll want to integrate with Nagios plugins. You're going to want to know more than just your disk capacity; you'll want your actual Swift metrics and what's going on with those. There's lots of stuff I'm not covering here, like data placement, production designs, and API features. But I do want to highlight a little of the operational side, the things I wish people would think about when they build out Swift. Don't try to build your global cluster immediately.
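To make the storage-policy idea concrete before moving on: defining policies happens in swift.conf. The policy names below are made up, but the `[storage-policy:N]` sections and the `name`/`default` options are the real syntax, and each non-zero policy gets its own object ring, where that policy's replica count is actually set:

```ini
# swift.conf (illustrative): two storage policies.
# Policy 0 is the default; policy 1 might map to all-flash hardware.
# Which devices each policy uses, and its replica count, are defined
# when you build that policy's ring with swift-ring-builder, e.g.:
#   swift-ring-builder object-1.builder create 10 2 1
#   (partition power 10, 2 replicas, 1 hour min between rebalances)

[storage-policy:0]
name = standard-3x
default = yes

[storage-policy:1]
name = flash-2x
```

Clients then choose a policy per container by passing the `X-Storage-Policy` header when the container is created; objects in that container land on that policy's hardware.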
People come in and say: I'm going to have 500 petabytes in my cluster, and it's going to span 13 continents, and it's going to be this massive huge thing, and I'm going to store 10 trillion files in it. How should I scale my cluster to do that? And you're like, wow, this is going to be really big. Well, you're going to have to account for 10,000 hard drives and all that kind of stuff. Where are you starting? Well, I've got two servers and 30 hard drives. Very often people try to shoehorn the massive thing they think they're going to have five years from now into the two servers they actually have today, and that doesn't work very well. It adds a lot of overhead to what you're doing. So some of the things we've seen over the years: grow piecemeal, and keep things balanced as you grow. Don't try to work against Swift's placement algorithms by saying: well, I'm going to have 90% of my capacity over here, but I really want all these different failure domains, so will Swift balance it out? You'd be really underweighted in one area or another. So, all that being said, those are some of the gotchas. Expose to Swift the failure domains you actually have. Don't unbalance things. Don't try to shove too big a cluster into a small deployment. And don't try to run everything virtualized. I love hearing about all the containerization of stuff, but it hides some of the things from Swift that Swift actually relies upon. I feel like I'm going scattershot on a few things here, and I apologize for that, but I'm getting signals from Michael: hey, you need to hurry up. That being said, I probably do have time for a couple of questions. I have some books, and there are some other people around here.
So, on the operational side, I know several of you are running Swift today and have been doing that, so how can I help? What kind of questions do you have? Are you curious about where Swift would fit or how it would work for you? [Audience question:] You talked in your elevator pitch about the distinction between the objects stored and the media on which they're stored as being the distinctive feature of Swift. How is that different from a distributed file system? What's the distinction in your mind, and what are the pros and cons? Well, I would say that is not a distinction unique to Swift. I think it is a distinction shared by what I would consider modern distributed storage systems, and there are several of them out there. I think the big distinction you would find from distributed file systems has a lot to do with the access patterns. Specifically, one of the defining characteristics of object storage is that the entirety of the data and metadata for an object is updated atomically. You're not going to have, say, a partial overwrite of a particular subset of a piece of data. In other words, you're not going to say: I have a 10 megabyte image and I'm going to overwrite the 100 bytes right in the middle of it. You're going to update the entire thing all at once, whereas that's a commonly supported thing inside a file system. The biggest difference between file systems and Swift specifically, not all object storage, is that file systems are strongly consistent, so you're going to have a consistent view of the data itself, where Swift is an eventually consistent system, which allows for different access patterns as far as the concurrency and scalability of how things are accessed. Basically, it answers the question of what happens when there's a failure. Is your system going to stop working?
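To step back to the atomic-update point for a second, that whole-object semantic can be sketched with a toy in-memory model. This is purely illustrative, not Swift's actual API: a put replaces the entire body and metadata in one step, and there is deliberately no operation for patching bytes in the middle:

```python
# Toy in-memory model of object-store semantics (illustrative only).
# A put replaces the entire object body and metadata atomically;
# there is no partial-overwrite operation like a filesystem seek+write.

class ToyObjectStore:
    def __init__(self):
        self._objects = {}

    def put(self, name, body, metadata=None):
        # Whole object + metadata replaced in a single assignment.
        self._objects[name] = (bytes(body), dict(metadata or {}))

    def get(self, name):
        return self._objects[name]

store = ToyObjectStore()
store.put("cat.jpg", b"v1-image-bytes", {"content-type": "image/jpeg"})
# "Updating" means uploading a complete new version, not patching bytes:
store.put("cat.jpg", b"v2-image-bytes", {"content-type": "image/jpeg"})
body, meta = store.get("cat.jpg")
```

Because every write is a complete replacement, readers never observe a half-updated object, which is part of what makes the eventually consistent replication model workable.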
I mean: what happens if 50 or 60% of your servers are unavailable because of a network partition? Are you going to be able to still respond to requests or not? Swift can, and strongly consistent systems generally wouldn't. Was there another one over here? Okay, Bruno has one. Sure. [Audience question:] John, last year you were giving people advice on what Swift needs, what the hardware requirements are for Swift in a single region. Now the question is: what about global clusters? What would be your advice to people who are deploying global clusters, in terms of what to do, what not to do, and what the caveats are? I think there's some general high-level advice for deploying a global cluster. The first thing I'd say is keep the regions even. If you've got, say, 100 terabytes or a petabyte in one region, you should probably have the same in another region, just to make sure the placement works out, so you know you're not going to lose one region and lose access to sets of data because all of it was in that one region. The other thing is to look at the connections between the regions and see how much network capacity you have, so you can shuffle data between them as necessary. That's something we're continually working to make more efficient in Swift, but the truth of the matter is that data has to move. You want that to happen as little as possible, but it still has to happen, so you have to deal with it. It really comes down to your available hardware and use cases. There are lots of people doing this in production today, so it is a well-tested thing. You can do it, but keep things balanced is really the thing I would say up front. That's the big thing. So, as a reminder, I've got some books here. Come find me; I think we've got one more talk, then lunch.
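That "keep the regions even" advice can be turned into a quick sanity check. The numbers and the 10% tolerance below are hypothetical, just to show the shape of the check:

```python
# Hypothetical sanity check for a global cluster: flag the deployment
# if any region's share of total capacity drifts too far from an even
# split. Capacities and the 10% tolerance are made-up example values.

capacities_tb = {"portland": 1000, "hong_kong": 950}

def regions_balanced(capacities, tolerance=0.10):
    total = sum(capacities.values())
    even_share = 1 / len(capacities)
    return all(abs(c / total - even_share) <= tolerance
               for c in capacities.values())

balanced = regions_balanced(capacities_tb)
```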
So come find me at lunch and I'll be happy to give you one of those if you would like and if you ever have any questions you can find me online or in person at various events or later this week. Thank you very much, Michael.