I'm ready. Are we at time? Yes. Welcome. I hope you're here to learn about the state of the project for Swift, fall 2012. That's what we're going to talk about. My name is John Dickinson. I am the project technical lead for OpenStack Swift. I've been involved in Swift since the beginning, and I'm here to tell you what's going on with Swift.

First, though, I'd like to talk about my vision of Swift, where I see things going. I want Swift to be used by everyone, every single day, whether they realize it or not. When your kids go home and look up something online, I want that data to come from Swift. When your mom picks up her cell phone and wants to go play games, I want that to be using Swift. When you log into your bank account and pull up an old check image or a statement, that should come from a Swift cluster. Swift is solving large-scale data problems, and those problems are ubiquitous. Swift is a really good solution for that. I want to see Swift used by everyone, every day. And the reason is that data is key to the future. Storage is the foundation of the entire technology infrastructure that we build. You can't compute on data that you don't store. You can't deliver data to people that you're not storing someplace. The great thing about OpenStack is that it allows users to retain ownership of their data, so that you can not only contribute to the code that is storing your data, but also to the community, and even get involved in the leadership of the project. OpenStack, and open systems like it, allow you to have complete ownership of your data.

First, let's talk about some new features that have gone into Swift over the last six months. Just as a brief recap, Swift released Essex in April, right before the last summit. Essex was Swift version 1.4.8. I'll pause just a little bit here. Swift's versioning is a little bit different than some of the other OpenStack projects, so a very brief overview: every release that we do on Swift is production-ready, and you can upgrade to it in production with your live cluster. It's certified to run at scale in production. So we make regular releases throughout the OpenStack development cycle, as opposed to doing, say, milestones like some of the other OpenStack projects. But then we coordinate every six months with the other projects to ensure that we have something that works well with everything else and that we have a common release in time for the OpenStack six-month cycle. The version numbering scheme actually comes from the fact that Swift existed prior to OpenStack and was open sourced as part of OpenStack.

So, new features that we have had. Swift 1.4.8 was Essex. Folsom, which we just released, was Swift 1.7.4. Don't get too hung up on the difference in the numbers there. We've had several releases in the past six months, including some very nice features. The first one is somewhat internal to the cluster, but it probably affects most people: we now have the ability to place data as uniquely as possible within the cluster. In the Essex release and earlier, data placement in Swift was based on availability zones. So in a standard Swift cluster, when you have three replicas of data, you were required to have at least three availability zones, and we put each replica into a unique availability zone.
And if you did not have enough availability zones, then you were not able to place your additional replicas. This is a bad thing. It worked pretty well, but unfortunately it was somewhat inflexible for different sizes of organizations, and in the case of certain kinds of failures you could not effectively make use of the other hardware in your cluster. So what this patch does is allow data to be placed as uniquely as possible. A storage location in Swift is designated by three things: one, the availability zone; two, the actual server that it's on, the IP and port; and three, the mount point, referencing that particular hard drive. With the unique-as-possible patch, what happens is that data first tries to be put into a unique zone. If that's not possible, then it's gonna go down to the next step and put it onto a unique server. And if that's not possible, it's gonna put it on a common server and make sure that it goes onto a different drive (there's a rough sketch of this preference order below). And so what this means is that if you are initially deploying Swift with a smaller cluster and you really only have one availability zone, you don't have to fake it and tell Swift you have three when really it's just two hard drives here, two hard drives there, and two hard drives somewhere else. You can just say, no, I have one availability zone, and then as you grow you can add more, and your data will be migrated to be as unique as possible throughout the cluster. What this allows you to do is more flexibly grow your cluster, and it also allows you to use more of your hard drives as handoff nodes when you have major failures, say if a large portion of your hard drives fills up, or maybe even some catastrophic failure of a significant piece of your infrastructure.

The second big feature that we've had in Swift over the last six months is deep StatsD integration. StatsD is a really nice tool and library that allows you to send out very lightweight packets when something happens. It's generally used for things like counters and timings. It's based on UDP and it has a very light footprint, which means that it can scale to massive levels. So what has happened with Swift is that just about the entire code base has been very deeply instrumented with StatsD, so that when something happens, either good things or bad things, StatsD events fire. And what this means is that you can attach something that listens to that, for example a Graphite cluster or some other monitoring framework that understands StatsD messages, and get a very clear picture of what is going on. In the good case, it tracks things like what sort of requests are being made to your cluster and how long those requests are taking. And in the bad case, it will fire an event every time you have errors, for example if it had to choose a handoff node or if it was not able to successfully write something to disk.

Another major feature is the ability to optimize for SSDs at the account and container level. It turns out that in very large clusters, and especially in clusters with a very high number of objects, the account and container servers can become somewhat limited by the number of IOPS that they have available. There are some longer-term ideas about how to solve this in the code, but it's somewhat easy to solve with hardware: throw more hardware at it and you can make things go faster.
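To make that preference order concrete, here's a small sketch of the idea. This is not the actual ring-builder code, which is far more involved, just an illustration of the zone, then server, then drive fallback, with made-up device data:

```python
# Illustration only: choose replica locations preferring unique zones,
# then unique servers, then unique drives. The real Swift ring builder
# also deals with partitions, weights, and rebalancing, so treat this
# purely as a sketch of the preference order described above.

def place_replicas(devices, replica_count=3):
    tiers = [
        lambda d: d['zone'],                          # unique zone
        lambda d: (d['zone'], d['ip']),               # unique server
        lambda d: (d['zone'], d['ip'], d['device']),  # unique drive
    ]
    chosen = []
    for tier_key in tiers:
        for dev in devices:
            if len(chosen) == replica_count:
                return chosen
            if dev in chosen:
                continue
            # take this device only if it is unique at the current tier
            if all(tier_key(dev) != tier_key(c) for c in chosen):
                chosen.append(dev)
    return chosen

# a tiny one-zone cluster: two servers with two drives each
devices = [
    {'zone': 1, 'ip': '10.0.0.1', 'device': 'sdb'},
    {'zone': 1, 'ip': '10.0.0.1', 'device': 'sdc'},
    {'zone': 1, 'ip': '10.0.0.2', 'device': 'sdb'},
    {'zone': 1, 'ip': '10.0.0.2', 'device': 'sdc'},
]
print(place_replicas(devices))  # spreads across both servers and three distinct drives
```

The point is just that a one-zone cluster no longer has to lie about its topology; placement degrades gracefully from zones to servers to individual drives.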
Obviously throwing hardware at it has a limit, but it actually gets you a really long way right now. The problem is that SSDs and spinning drives have different price and performance characteristics. Prior to this Folsom release, before this patch went in, back in Essex and earlier, when data was allocated on the account and container servers, we would pre-allocate additional space as we got close to using up the space already allocated. What that gives you is a more contiguous piece of data, which is really good for sequential reads and writes when you're talking about spinning media. However, solid-state drives don't have that performance penalty on random reads and writes. They are also generally much smaller, and each byte is much more expensive, so you do not want that pre-allocation turned on. So with this patch, we have allowed you to configure your pre-allocation, to turn it on or off based on the hardware deployment that you are using. With this you can now effectively use your solid-state drives at the account and container level, scale that, make effective use of the hardware, and take advantage of the random read/write characteristics that SSDs offer.

The last major feature I wanna talk about that was released in Folsom is the versioned writes feature. Object versioning is something that's come up as a question many times, and this feature is almost there; this is a versioned writes feature. If you have a container that is enabled to store different versions, what happens is that when you overwrite an object, the old version essentially gets pushed down onto a stack. And then when you delete that object, the previous version pops back up (there's a quick usage sketch below). In this way, you're able to keep track of how your object has changed and not worry about it. And the really fun thing about this is that if you combine it with an auth system that can give a user write-only access, or maybe write and read access but not delete access, then combined with the versioned writes feature you could safely give write access to, say, the public or your customers so that they can write data in there. They're gonna be able to stack things into the container, but you don't ever have to worry about them deleting old versions. They can overwrite things, but you've still got the old version there. The difference from what people generally think of as a full object versioning system is leaving a tombstone when you have deletes, and that's not quite what this is designed to do. This is just designed to do versioned writes.

So let's go over the numbers. Swift has been quite active over the last six months, so let's play a little bit of code golf. 37: we've had 37 contributors in Swift over the past six months. These are people that have had a patch merged into the code base. And the really exciting thing about this number is that we have 70 contributors total, which means that over half of our contributors are currently active within the code base. To me, this says that we've got a growing and highly active code base. Many people offering just a few commits, some people offering a lot of commits, but there's still a lot of people doing it. Of those 37 people, 20 have given a patch for the very first time. And for many of these people, this is their first patch into any OpenStack project. So I'm happy that they've chosen Swift to contribute to.
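As a quick concrete aside on that versioned writes feature: enabling it is just a piece of metadata on the container, and the rest happens on overwrite and delete. A rough sketch of the flow as raw HTTP calls, assuming the X-Versions-Location header; the storage URL, token, and names are placeholders:

```python
# Versioned writes sketch: old copies of anything written to 'docs'
# get stacked into 'docs-versions', and a DELETE pops the newest one back.
import requests

storage = 'http://swift.example.com/v1/AUTH_test'
hdrs = {'X-Auth-Token': 'AUTH_tk_example'}

requests.put(f'{storage}/docs-versions', headers=hdrs)
requests.put(f'{storage}/docs',
             headers={**hdrs, 'X-Versions-Location': 'docs-versions'})

requests.put(f'{storage}/docs/report.txt', headers=hdrs, data=b'first draft')
requests.put(f'{storage}/docs/report.txt', headers=hdrs, data=b'second draft')

requests.delete(f'{storage}/docs/report.txt', headers=hdrs)
print(requests.get(f'{storage}/docs/report.txt', headers=hdrs).content)
# b'first draft' -- the overwrite was pushed down, the delete popped it back up
```

Note that the object servers themselves don't change; the older copies simply live in a second container, which is the push-and-pop behavior described above.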
These are the names of the people that we have had patches from in the last six months who are first-time contributors to Swift. We also have three new core developers on Swift. These are the people that are responsible for maintaining the code quality and managing what gets merged into the code base. So Sam, Darrell, and Chmouel are now core developers of Swift and are able to help the existing core developers share that load of review and kind of shepherd the future of the code base in the direction that we as a community are headed.

Now we get into the fun numbers that don't mean a whole lot, but they're just fun to look at. We've had 170 total commits into the code base over the past six months. The top committer, as always so far, is Greg Holt; unfortunately he couldn't make it this week. He had, I believe, 40 of those 170 commits. 17: this is the most files modified in a single commit. Most commits just deal with one file, maybe one or two files. This one touched 17 individual files, and this was Darrell's StatsD integration; obviously, if you have deep instrumentation of StatsD throughout the system, it's going to touch a lot of the system. The most lines removed in a single commit: 3,466. This was Chmouel's patch extracting the Swift client, so it wasn't so much deletion as things just being moved out into a separate place. And as a sub-note here, this is an important development in the Swift code base over the past six months. We have extracted the command-line utility and the Python language binding into a separate OpenStack project, python-swiftclient, that is the second deliverable of the Swift project. What this allows is for users, other projects, and other clients to download, install, and use a language binding and a command-line utility without having to worry about downloading, installing, and managing the entirety of the Swift code base (there's a quick usage sketch below). Within the OpenStack community, the first place this is being used is in the Glance project. They're not interested in installing all of Swift just to put stuff into Swift. But it's also quite useful for others in the community who are building client tools, who are building all kinds of things that integrate and make the community better. On the opposite side, we've had the most lines added in a single commit: this is again the StatsD commit, which added 1,159 lines.

So with that being said, what's next? What's coming in Swift? We've added a lot of great things. I am firmly convinced that Swift is better in the Folsom release than it was in the Essex release, and it is much better than it was at the very beginning. But we're still moving forward. So where are we going? We had conversations all day yesterday on the Swift technical track about some ideas for how we can move Swift forward, and some feedback from the community on problems that people are facing. Here are some of the things we were talking about. The first one is probably the biggest question that I get asked all the time. I've been asked this for the past two years: does Swift support geographic replication? Does it support global clusters? How can I have a data center in London and one in Singapore and have it all be just one logical Swift cluster? And the answer I've always had to give is: that's a really interesting problem, I'd love to talk to you further about it, but no, we don't support that now.
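For anyone who hasn't picked it up yet, the extracted client is a small dependency to work with on its own. A minimal sketch using python-swiftclient, with placeholder auth details and names:

```python
# Minimal python-swiftclient usage: authenticate, create a container,
# upload an object, and read it back. Everything here is a placeholder.
from swiftclient.client import Connection

conn = Connection(authurl='http://swift.example.com/auth/v1.0',
                  user='account:user', key='secret')

conn.put_container('backups')
conn.put_object('backups', 'notes.txt', contents=b'hello from swiftclient')

headers, body = conn.get_object('backups', 'notes.txt')
print(headers.get('content-length'), body)
```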
So, we're moving forward on global clusters. The company I work for, SwiftStack, has invested some time in designing out this feature. I fully anticipate that this feature will be worked on over the next six months and implemented by Grizzly. This is something that I've talked with many people here this week about, as far as partnering with other companies to ensure that we have a very good test bed and use case, so that we're not simply designing something in isolation that we think is gonna be useful, but rather designing something that actually meets a real use case. Is that a commitment, did you wanna sign here? That is my hope as the community leader. I certainly hope that it will be done. I have to stop just short of promising that it will be done by, you know, whenever, but yes.

Another feature: this patch was submitted, and I think merged, right before the summit, right after I made my slides of course, by Rackspace, and it's a new feature that some people in the community have asked for: full CORS support. What this allows you to do is better integrate with the browser security model and allow your web clients to directly interact with your Swift cluster, rather than dealing with an intermediate proxy that you have to independently scale up. This is really advantageous especially for things like dashboards, and also when you're getting to other web content: mobile games, websites, things like that.

One of the things that I'm really proud of in Swift is how well it handles concurrency. However, it's not incredibly good if you are trying to access one single object with very high concurrency. It works if you're trying to access an entire data set that's spread across many objects, but one single object lives as three replicas on three hard drives someplace. So generally you end up limited either by the internal network of your cluster or just by the speed of those three spindles and what they can return to you. So there's been some talk on optimizing concurrent reads and figuring out how to do this. One of the use cases this is designed to solve is this situation: I've got a Swift cluster, and it's storing some images for my virtual machines, server images, whatever you may have. And when I turn on a cluster, say I need to boot 100 machines, well, right then they all need to get that virtual machine image at once. So how am I able to do that effectively, so that I can spin up all of these VMs in a reasonable amount of time? There are some techniques you can use to kind of work around this right now. One way is to make copies of the object under individual names and configure your clients to each talk to a different copy (sketched below). Of course, that puts the burden completely on the client, and that gets a little tricky. So there are some ideas we've talked about for incorporating this into Swift itself and abstracting those many files into one manifest object.

So overall, I'm very excited about where Swift has come from and where Swift is going. I intentionally kept this a little bit short, because I know from the last summit we had a lot of questions at the end, and I would like to have plenty of time for that. So I would say thank you for your time. Please, if you have questions, use the mic here and I'd be happy to try to answer anything you have. Awesome. So, Swift is good. Oh, great question.
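Circling back for a second to that concurrent-read workaround: fanning one hot object out to several names is just server-side copies plus a random pick on the client. A rough sketch, where the names, token, and copy count are all made up:

```python
# Fan-out workaround for one very hot object: make N server-side copies
# under different names, then have each client grab a random copy so the
# read load spreads across more drives. Everything here is a placeholder.
import random
import requests

storage = 'http://swift.example.com/v1/AUTH_test'
hdrs = {'X-Auth-Token': 'AUTH_tk_example'}
copies = 8

# server-side copies via PUT + X-Copy-From (no data flows through the client)
for i in range(copies):
    requests.put(f'{storage}/images/ubuntu-12.04.img.{i}',
                 headers={**hdrs, 'X-Copy-From': '/images/ubuntu-12.04.img'},
                 data=b'')

# each booting VM then reads one randomly chosen copy
name = f'ubuntu-12.04.img.{random.randrange(copies)}'
image = requests.get(f'{storage}/images/{name}', headers=hdrs).content
```

The manifest idea mentioned above would essentially fold that bookkeeping into Swift itself so clients don't have to know about the copies.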
So you talked about treating deletes as basically moving versions up and down in the stack. Is there value, from your perspective, in having a delete be a version in and of itself? If someone creates 1.1 and then 1.2 and then deletes that, don't we maybe want to know that it was deleted? Yes, I think that's absolutely a valid use case. I don't think that's a bad thing. I was just curious about the decision to remove a version versus having a delete be a version in and of itself. When that versioned writes feature was originally written, it was kind of looked at from a per-object basis. It's actually implemented on a per-container basis, versioning those containers. If you do it on a per-object basis, the overhead of all of your reads throughout the system actually goes up quite a bit in order to check on those versions and see what's going on. By abstracting that to the container level, you have a versioned container, so that all of the objects in that container are versioned, and it allows you to bypass that per-read check. So based on that, all I can say is that when it was first written at the per-object level, deletes were somewhat special within Swift, as in keeping the tombstone files and being able to manage those effectively. So this versioned writes approach is less impactful on the underlying object system. The interesting thing about the way this is implemented, the thing I like about it, is that you don't actually have to edit the object server itself that's responsible for storing the data, and the way things are stored on disk doesn't really have to change, because what happens now is that your versions are kept in a separate container someplace, just as other objects. And so when you access one object, it can return that, but it can also manage pushing and popping onto the stack. Unfortunately, what that means is that for deletes, something new would have to be invented within Swift to keep track of them, and just to avoid that and keep it a simpler system, that's the only reason. That's not to say that somebody couldn't come along and say, great, we're gonna implement full versioning support with the ability to keep track of deletes. I think that's a great thing. In the back.

Based on the session yesterday, that's a fork of the code, is that true? So the question is: yesterday there was a talk on something called Colony, and that appears to be a fork of the code. Is that true? No? Yes? The gentleman in the back here is actually doing that. So I would answer that in a couple of different ways. Yes, they have absolutely taken Swift, they have made their own patches on top of it, and done that. I think that the Colony talk and the one coming just after it had very similar goals in figuring out how to solve globally distributed clusters. I am very interested in looking into their implementation and what they have done. That being said, they're maintaining a patch set, and really, with a distributed version control system, what's not a fork? So I'm getting down into a little bit of semantics, but it seems to me, from what I have seen, that it is Swift and they've got their own patches on top of that. I don't think that is something that is widely encouraged in any part of the OpenStack community, but it's certainly not something that we can prevent or would work to prevent. The code that they have for Colony is published, and I don't have the link for that.
You may need to talk to them or find the talks from yesterday. But I hope to incorporate those ideas into something that can very well come into the Swift mainline and be a very general solution for everyone. Just as a quick summary, Colony was a very interesting idea built to solve the problem of cross-data-center replication within Japan: having data centers on each end of Japan and, rather than running two autonomous clusters with full replication, being able to split your three replicas across the two regions so that it could be done effectively. This is very similar to what we've been talking about as far as globally replicated clusters. Our basic working idea there is on the SwiftStack blog, so you can pull that up if you'd like.

Does that answer your question? Yeah, yeah, but basically I was just curious, as someone who needs patches. Absolutely, and I think that's a valid concern. The concern, for the recording and those watching, is that if you have multiple forks, which one is supported, and what's the compatibility between them? We absolutely do not want to fracture the community at all. So let's design the community to be open and accepting of other people's patches, and also to solve the general use-case problems that are applicable to everyone, so that we can effectively accept patches in and deploy Swift at the scale of very large service providers all the way down to private clouds that are quite a bit smaller.

Was there a question in the back? About unique-as-possible: so if you go from, say, one availability zone to another? Yes. So the question is, specifically with unique-as-possible replica placement, if you have one availability zone and you grow to a second one, it sounds like there's going to be rebalancing done there, and what is the impact to the clients as far as that goes? And the answer is yes, except that situation is no different from simply adding new hardware, as always in Swift. So you're going to be rebalancing your data out there. The difference is that if you have a new availability zone, you're gonna have at least one replica move into that other availability zone until at least that fills up, and hopefully it's gonna be the same size, so then you're gonna end up with a nicely balanced cluster. Being able to handle that rebalancing is something that has been solved in Swift since the very beginning. And the idea there is that you should add things gradually, depending upon your internal network infrastructure and what it can handle, based on your clients' throughput and how much bandwidth your clients are using versus how much you have available for replication and things like that. So yes, there is a rebalancing effort. There is a concern, especially at scale, when you're adding in, say, another 25% to your cluster: don't saturate your top-of-rack switches and things like that. Swift provides mechanisms so you can do that gradually over time.

I think there was another question up here in the center. The question is, is it possible to have tiering of storage capabilities within a single Swift cluster? And the answer is no, not today. That's again something that people have talked about. I would say the answer today is the same answer we had for global replication six months ago: kind of like, no, but I'd be quite interested if you'd like to work on it, and I'd be happy to review your patches. Please submit them.
I'd be very happy to talk overall about figuring out how to do that and bringing it to the rest of the community. That's something that I'm asking for. Right. Now, my gut feeling is that some of the advantages that come with geographic distribution could actually be taken advantage of there. It seems to me that if you have something behind a high-latency connection because it's on the other side of the world, that's not a lot different from something that's right next to you but just happens to be slow.

The question is about how to choose the weights for your devices within the ring. I would recommend that you use the number of gigabytes on that particular device. And the advantage of that, rather than sticking to a zero-to-one-hundred scale or something like that, is that it's easy to grow into the future as you get three-, four-, and more-terabyte drives. So I would recommend against doing zero to one hundred and say that you should probably just use the number of gigabytes in your drive. It makes a lot more sense.

Any other questions? I'll probably get a response asking for patches for asking this one. This is a long-standing one that's been sitting there, which is quotas. Which is? Quotas. Quotas. Surprisingly, that is the first time this week that it's come up. Congratulations. The question is about quotas: what's the status? It came up in the metering sessions. Oh, well, first time I've heard of it. Quota support: we have seen a patch that was submitted by Mike Barton to add in some level of this. I don't think it was merged, and I don't remember exactly why; I think there may have been some secondary issues with it. But my general answer is that I think quotas generally belong in your auth system, because the auth system is what controls access to Swift. Swift doesn't really care about the actual authorization to use a resource within Swift beyond giving the information to the auth system: here is what I know about this request and the entity that's being requested; tell me if it's okay or not. That being said, the auth system can easily tie in to some sort of metering, billing, or usage-tracking system. And tokens can be invalidated, or requests simply denied, so that the auth system can potentially allow read-only access after a certain amount of usage, or simply cut off all access. I believe that works best, especially when you're looking at bandwidth considerations, as an integration with the auth piece. On the other hand, it may be possible to add in some sort of storage quotas within Swift. Bandwidth quotas within Swift itself would be a little more tricky, as far as how you actually keep track of that and things like that.

Anything else? Encrypting the data, data encryption. Swift is designed right now to take the bytes that you give it and reliably store them on disk, and when you ask for them, it will return those very same bytes that you saved on disk. So there are two options for doing encryption of data. One, you can do it client-side; there's a tiny sketch of that below. The advantage of doing it client-side is that you don't have to worry about your storage system managing keys and things like that, and CPU, all of that. Number two, if, for example, you do have lots of CPU to throw around, you could also add in some Swift middleware plugins to encrypt the data going in and out. It would be a little bit tricky to do that transparently, because you've got to deal with checksums and content lengths and things like that. But it would be possible.
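That client-side option can be as simple as encrypting the bytes before they ever reach Swift and keeping the key with the client. A tiny sketch; the Fernet recipe from the cryptography package is just one example choice here, and the URL, token, and names are placeholders:

```python
# Encrypt locally, store only ciphertext in Swift, decrypt after download.
# Key management stays entirely with the client, not the storage system.
from cryptography.fernet import Fernet
import requests

storage = 'http://swift.example.com/v1/AUTH_test'
hdrs = {'X-Auth-Token': 'AUTH_tk_example'}

key = Fernet.generate_key()   # keep this somewhere safe, not in Swift
f = Fernet(key)

plaintext = b'quarterly numbers'
requests.put(f'{storage}/private/report.enc', headers=hdrs,
             data=f.encrypt(plaintext))

ciphertext = requests.get(f'{storage}/private/report.enc', headers=hdrs).content
assert f.decrypt(ciphertext) == plaintext
```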
So for now, the recommendation would be that you do that client-side. That being said, I'll actually add one other note: it is absolutely possible to run Swift on top of encrypted volumes, so you could do it that way as well, and that is something that we've done at SwiftStack.

The question is, is there a way to disable replication during maintenance, drive replacements, things like that? And the answer is, I suppose you could just turn it off. I mean, they're just daemons running on the back end, so just kill the process. But beyond that, I don't think it would necessarily be advised. There are certainly circumstances where you may want to gradually shut down a machine and say, okay, I need to turn replication off here and make sure it's draining the data out, and I don't want to have to deal with that. But as far as general maintenance and things like that go, you can limit the incoming connections, and Swift will treat those sorts of boxes as just downed boxes and work around them anyway. So you don't have to; it just kind of works. There's not really a strong need to most of the time, right? The follow-up question is about cases where there's a known issue and you want to make sure replication doesn't push the data straight off of the disk. And my only response to that is that Swift is gonna prioritize durability, which means that it's going to do the best it can to ensure that you have a full three replicas, or whatever full replication count you've asked for, throughout your cluster. And so if you were to do that, you would be putting that at risk. And kind of the main thing about Swift is that your data is sacred; you cannot lose it. So yes, you could turn off the replication process in that case, but I would recommend leaving it on to make sure you have your full replication.

Rollback features: as far as that goes, yes, that is built into Swift. The question was, what happens if there's corrupted data, how can you roll that back? That is something that's built into Swift in the form of auditing processes that go through and continually check on your data, and data is also checked every time it's read off of disk, so that if corruption is ever detected, that file will be quarantined and replication will ensure that you have a good copy put back in place.

I think, let's do this as the last question, and I'll be around as well later. Upgrades, that is a great question: how does Swift handle upgrades? There's absolutely a plan for that, and Swift is designed from the very beginning to allow you to have a running Swift cluster and upgrade it in place with no impact to your customers. Obviously, some of that is going to depend upon your particular deployment. For example, if you just have one proxy server deployed and you need to restart it for a kernel upgrade or something like that, you're obviously gonna have some sort of downtime. But you can absolutely build it with very high HA. And so the basic process of this is that you will shut down your background processes, those consistency processes. You will upgrade your packages, update your config files, and reload the servers. And reloading the servers actually tells those servers to stop accepting new connections, but not to just die: finish out the connections you're dealing with, and then when those connections are done, go ahead and spawn up new worker processes. And then at that point, you can start up the background processes again.
And so you do this in a rolling manner throughout your cluster. Generally you may wanna start with a single node as a canary node, something like that, and make sure everything's going okay. If that goes okay, work on a single zone or a cabinet or something like that. Once that goes through, you can go through and do, for example, maybe all of your storage nodes; make sure that's upgraded. Then you can do your proxy servers one by one, and since those are gonna have some sort of load-balancing layer in front of them, you can either let your load balancer automatically handle the proxies being down while that's done, or take them out of your load balancer config.

So absolutely it's possible, because what happens if you have version A on one node in Swift and version A plus one, or A minus one or whatever it is, on another node? In other words, you have some sort of inconsistency. All patches that go into Swift that change any sort of configs, any sort of default behavior, any sort of schemas, issues like that, must have, and will not get merged unless they actually have, sane migration paths. And so for example, we had a couple of patches like that in this last release, one of which changed the way we store data within memcache, making that a little bit more secure. But you can't just upgrade your cluster and say, oh, we've got a new way to do this, a new way to serialize these keys in memcache, because on large clusters that essentially equates to a complete memcache flush, which is not only impactful for the cluster itself, but for anything that system may be dependent upon, for example your auth systems or other things like that. You can't suddenly just have this enormous flood of requests. So that patch specifically, instead of just saying, great, we're working the new way now, actually has a new config variable that allows you to stage those changes. Stage one is, hey, we're just gonna use the old way. Stage two is, we're gonna read the old way, but everything we write is gonna be the new way, and we're gonna be able to understand the new way as well. And you can go through these stages over time, say one a day, just to make sure you have everything processed. And the third stage is, okay, great, just read and write the new way, and then we're good. And then in the future, we may remove support for those older two ways so that you are guaranteed to be running the more secure way.

And so I use that as an example to say that when you have something coming into Swift, it's absolutely going to be designed to have a migration path and be able to be upgraded in place. And yeah, the code's there. So that's something where you can either just say, yeah, that guy standing up on the stage said that it was gonna work, so it must work. Or you can look at it and say, there are lots and lots of companies out there actually running this thing at huge scale in production, with massive top-five websites, major impactful things that, if they start to go down, are gonna start losing money the very next minute. So let's not let that happen; those sorts of things cannot happen. And I would say, look at the community and the people that are here, the people out at all the booths out there; that's your guarantee. Is Rackspace, is HP, are our customers at SwiftStack gonna allow us to have downtime based on some upgrade issue?
Otherwise, we can't upgrade and we don't want that. So thank you very much for your time. I'll be around all week.