Good morning. I apologize for the delay; the train was a little slow. My name is John Dickinson. I am the project technical lead for OpenStack Swift, and this is the most fun talk I get to give twice a year: the state of the project. It looks at what has happened over the last six months and what's coming, and it's always a really fun thing to put together, so I'm really excited about it. But first, the important thing I want to talk about is the why behind Swift and the vision behind it, because to me this is key to what we're doing. The first big part of that is a phrase I got from Tristan Goode at Aptira: data sovereignty. Everybody has data, it's always growing, and you need to have ownership of everything that touches your data, from the hardware all the way to your toolchain. One of the big, important parts of that is a storage system where you know what code is running, you can influence the direction of that code, and you can even get involved in the community and the governance of that code and see what's going on. That openness is incredibly important, because it gives you the ability to take ownership of your data, which you really should have. The second thing that is very important, for me at least, is that my goal for Swift is to see everybody using Swift every day, whether they realize it or not. I've said this before: I want your mom to use Swift when she pulls up a web page, your kids to use it when they go home, do their homework, and look things up on Wikipedia, and you to use it when you pull up bank records or just go about your day-to-day life. A storage system that underlies all of that and stores user data is perfectly suited to Swift's use
case, and therefore you should be using Swift. Basically, my understanding right now is that if you're storing user data somewhere, if you're building applications that store user data or archiving and warehousing all of that data, and you're not using something like Swift, you're doing it wrong. You're doing it wrong because Swift is built for scale: it's built for availability, durability, and concurrency. It's built for that extremely wide use case of data sets that can grow from relatively small to very, very massive, where you need access to all of it and need to sustain access to all of it, or any part of it, at any time, in a very performant manner. That is what Swift is built for. Looking at how we've done over the past six months and where we've come from, I think we've made some really great progress. We've had some exciting contributions to the code, and I'll get to the community aspect in just a minute, but first let's look at some of the major features added to Swift since we last met in San Diego. The first is something I specifically talked about in San Diego: global clusters. This is something I've been asked about since Swift began.
People ask: I need to have my data stored across a very wide geographic area, so how do I do this with Swift? We've done a few things over the lifetime of the project to make this better, and there's still some effort to go, but we have done a very good job of adding these building blocks into the code base, and we've made a lot of progress over the last six months. Here's specifically what we've done. First, we have added the ability to change the replica count within your cluster. This is good for people running global or non-global clusters, but the important part is this: imagine a global cluster with two regions. If you started with three replicas, you've got a two-one split, and you want some additional durability in each region. So you need to be able to change your replica count to go up to, say, a two-two split, for a total of four replicas. This has been added to Swift, and it allows you to gradually adjust the replica count for the entire cluster over time, so that you can go from three replicas to four on a live, running cluster without any downtime. Another thing that's very important in putting these building blocks together is the ability to have different replica counts for your accounts, your containers, and your objects. Earlier, some parts of the code assumed they were all the same, generally three. They could have been something else, but it was always assumed that if you had three copies of your account or container, you also had three copies of the object. That limitation is no longer in Swift, and it allows a more flexible deployment pattern. The next piece, which is one of the most prominent in building global clusters, is support for a region tier when
Swift is deploying and scheduling where your data actually lives. In Swift, data is placed as uniquely as possible across your entire deployment. That means not just spreading replicas across drives, so that you don't have multiple replicas on the same drive, but also across servers, and those servers are grouped into availability zones. Now those availability zones can in turn be grouped into regions. So if you have, say, four replicas and two regions with two zones each, you end up with one replica in each of those zones, spread evenly across the region tier. That's one of the very important building blocks for global clusters. The last building block for global clusters merged in the last six months is affinity on reads. If you're running a global cluster, you want to avoid reading over the WAN whenever you can help it. So we added timing-based affinity, which keeps track of the latency between the proxy server you're talking to and the different storage servers on the back end. Because this is tracked dynamically, those latency timings are a very good proxy for how far away each server is, and your cluster responds dynamically to current network conditions, including when one of those links is over a wide area network.
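The placement and read-affinity ideas above can be sketched in a few lines of Python, Swift's own language. This is a toy under stated assumptions, not Swift's ring builder or proxy code: `place_replicas` and `TimingSorter` are hypothetical names, devices are plain dicts, partitions and device weights are ignored, and the five-minute expiry is an arbitrary choice. It shows why three replicas over two regions land as a two-one split while four land as two-two, and how a proxy might order nodes by measured latency.

```python
import time
from collections import Counter


def place_replicas(devices, replica_count):
    """Greedy "as unique as possible" placement across tiers.

    Hypothetical helper, not Swift's ring builder. Each device is a dict
    with 'region', 'zone', 'server' and 'id' keys. For every replica, pick
    the unused device whose region, then zone, then server currently holds
    the fewest replicas (compared in that order).
    """
    chosen = []
    load = Counter()  # replicas already placed per tier prefix

    def tiers(dev):
        return [(dev["region"],),
                (dev["region"], dev["zone"]),
                (dev["region"], dev["zone"], dev["server"])]

    for _ in range(replica_count):
        candidates = [d for d in devices if d not in chosen]
        best = min(candidates, key=lambda d: [load[t] for t in tiers(d)])
        chosen.append(best)
        for tier in tiers(best):
            load[tier] += 1
    return chosen


class TimingSorter:
    """Sort storage nodes by recently observed latency (hypothetical).

    Timings expire after `expiry` seconds so the order keeps tracking
    current network conditions. Nodes without a fresh timing sort first,
    so they get probed and measured.
    """

    def __init__(self, expiry=300.0, clock=time.monotonic):
        self.expiry = expiry
        self.clock = clock
        self.timings = {}  # ip -> (seconds, expires_at)

    def record(self, ip, seconds):
        self.timings[ip] = (seconds, self.clock() + self.expiry)

    def sort(self, nodes):
        now = self.clock()

        def key(node):
            seconds, expires_at = self.timings.get(node["ip"], (-1.0, 0.0))
            return seconds if expires_at > now else -1.0

        return sorted(nodes, key=key)
```

With two regions of two zones each, `place_replicas(devices, 3)` yields the two-one split from the replica-count discussion, and `place_replicas(devices, 4)` puts one replica in every zone. Expiring timings keeps a once-slow link from being avoided forever after conditions improve.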
There are a couple of things we still have left to build. We've made some really great progress, but I'll come back to those in just a moment, because I want to move on to some of the other really cool features we've done over the last six months. Static large object manifests are something I initially thought was kind of a weird thing: why would we want that? As it was explained to me more, I realized it's actually pretty cool. We've had the ability in Swift for a while to store arbitrarily large objects, but this was done based on container listings: your logical large object was dynamically created from a subset of a container listing. What this means is that you would chunk up your data and name the chunks so they all start with the same prefix, say foo/ followed by 1, 2, 3, 4, enumerating all of your chunks, and when you requested that dynamic large object, Swift would return all of those chunks to you in order. It's a really powerful tool, and it offers some interesting use cases that go beyond just large objects, because you can do interesting things by pointing one object at another. But it does suffer from relying on the container listing, which historically is a little bit slower in Swift. So one of the Rackspace developers contributed the static large object feature, which lets you create a manifest file that explicitly enumerates all of the chunks. As soon as this static manifest file is uploaded, all of the chunks are verified to be available and correct according to their hash of the contents. At that point your static large object is immediately available, and even if your container listings change over time,
that static large object will not change. One advantage is slightly better performance in terms of when your object is completely ready to be accessed, and it avoids the eventual consistency window that dynamic large objects are affected by. Another really interesting feature, and a common request from the community over time, is the ability to do more than one thing at once with the Swift endpoint. So we've added a new feature to Swift that allows bulk requests, either to create new objects or to delete objects. To create objects, you can upload an archive file, like a tar file, which can optionally be compressed. When that tar file gets to the Swift cluster, it is expanded into all of its constituent parts. Say you take a backup of a server and upload that tar, or even stream it into Swift: you can then access everything that was in that archive as individual objects, directly accessible after the fact, which is a really interesting feature.
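To make the two upload styles concrete, here is a hedged Python sketch that only builds request bodies and URLs; it sends nothing. The manifest fields (`path`, `etag`, `size_bytes`) and the `?multipart-manifest=put` and `?extract-archive=` query parameters follow Swift's SLO and bulk middleware documentation, but verify them against your deployed version. The host, account, container names, and both helper functions are hypothetical.

```python
import hashlib
import io
import json
import tarfile


def slo_manifest(container, segments):
    """Build a static large object manifest body.

    `segments` is a list of (object_name, data) pairs already uploaded to
    `container`. Each entry lists the segment path, its MD5 (the ETag
    Swift checks against) and its size in bytes.
    """
    return json.dumps([
        {
            "path": f"/{container}/{name}",
            "etag": hashlib.md5(data).hexdigest(),
            "size_bytes": len(data),
        }
        for name, data in segments
    ])


def tar_archive(files, compress=False):
    """Pack {object_name: bytes} into an in-memory tar, optionally gzipped."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz" if compress else "w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


# Hypothetical endpoint and names; only the query strings come from Swift's docs.
BASE = "https://swift.example.com/v1/AUTH_test"
manifest_url = f"{BASE}/videos/movie.mp4?multipart-manifest=put"
archive_url = f"{BASE}/backups?extract-archive=tar.gz"
```

The manifest is the whole object definition, so once it is accepted nothing depends on a later container listing; the archive upload, by contrast, trades many PUTs for one request that the cluster unpacks into individual objects.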
Now, the converse of that is being able to delete things. Of course, I don't ever want you to delete objects in Swift, because you need to grow your data storage requirements. But in Swift you can only delete a container, for example, when all of its objects are gone. You don't want to inadvertently blow away a million objects with one mistaken delete because you forgot a slash or something like that. Unfortunately, that becomes troublesome when you really do need to delete a large chunk of data: you end up sending DELETE, DELETE, DELETE ten million times, and that gets rather slow. So there's now functionality that lets a client aggregate a bunch of deletes into one request, which is processed on the server before the result is returned to you. Obviously that could be open to abuse, so to protect deployers there are some limits.
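A bulk delete replaces millions of individual DELETEs with a few requests that each carry a list of paths. The sketch below assumes, following Swift's bulk middleware documentation, that each request body is a newline-separated list of URL-encoded `/container/object` paths sent with the `?bulk-delete` query parameter; it splits a long list into batches no larger than the deployer's per-request limit. The function name and the 1,000 default are illustrative.

```python
from urllib.parse import quote


def bulk_delete_bodies(objects, containers=(), max_per_request=1000):
    """Yield newline-separated, URL-quoted paths for bulk-delete requests.

    Objects are listed before containers as a precaution, because a
    container can only be deleted once it is empty. `max_per_request`
    stands in for the deployer-configured limit (1,000 is the default
    quoted in this talk).
    """
    paths = list(objects) + list(containers)
    for start in range(0, len(paths), max_per_request):
        batch = paths[start:start + max_per_request]
        yield "\n".join(quote(p) for p in batch)
```

Each yielded body would go out as one request with a plain-text content type; check the bulk middleware documentation for the HTTP method and response format your Swift version expects.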
For example, by default you can do a thousand deletes at once, and that limit is configurable so deployers can change it to meet their users' needs. Another nice feature added to Swift in the last six months is yet another thing people have been asking for quite a bit: quotas. Quotas are a somewhat complicated topic because they touch a lot of different things. The quotas we've built into Swift are specifically designed around the parts of the system that Swift knows about and can act on: the number of objects in a particular container, or the aggregate bytes stored within an account or container. There are two kinds of quotas we've added. The first is container quotas, which a user can set on their own container. This is useful for two reasons. One is that if, for example, a website is uploading data directly into Swift, you don't necessarily want to, on the public internet,
just let anybody upload anything they want with no limitations whatsoever. As your repository of cat pictures grows, with people uploading them into your Swift cluster, you want some sort of limit on that, so you can limit your own container: I don't want to store more than 100 gigabytes in this container, or something like that. You can also limit the number of objects in that container: I want to make sure there are no more than 10,000 objects in this container, or whatever you choose to set. The other really good use case is cost. There's a cost associated with all of the data you're storing in Swift, whether it's your own cost for hard drives or you're relying on a service provider to give you a Swift storage cluster. You may say: great, Swift can scale really, really well, but it can actually scale a little bit better than my pocketbook. So I need some limits so that I can control my bill and have more predictable costs. So that's container quotas. Account quotas are something an administrator can set on the cluster. This is a little different: it's not self-imposed, but you can now set up accounts and limit them to be no larger than a given number of bytes. This helps in a shared infrastructure environment, perhaps an IT organization where lots of different users or departments need access to shared infrastructure, which by the way is a fantastic use case for Swift. You want to make sure that one particular user is not abusing the system beyond your ability to add new capacity. So on a particular account within Swift you can set
the limits on the number of bytes they can store. So you can say these users only need to store 500 gigabytes across all of their containers in aggregate. I think these things will give deployers and users some very good tools to effectively manage their use of a Swift cluster. We've got a lot of great new features; here are some more. One nice use case, especially when you're thinking about user data, is using Swift to directly store data from web applications. The HTTP spec defines the OPTIONS verb, which we had not implemented. It's a nice hook: the spec loosely defines it as a way to get some information about the resource you're requesting, so you can see what's possible on it. In a very minimal implementation, you make an OPTIONS request to a particular object and find out that you can do a GET, PUT, COPY, or DELETE on that object. There is also a draft spec working its way through various committees called CORS, which stands for Cross-Origin Resource Sharing.
It specifically allows web developers to work around a particular part of the browser security model. If you're running client-side code in the browser, some JavaScript, the browser has a security model that says you can only access resources at the same domain you're running at. If, for example, your website is www.whatever but you actually need to upload your content to images.whatever, then the browser's same-origin security model is going to prevent you from doing that. The CORS specification is a way for a web application developer to say: no, actually trust these other domains, and the browser will then allow those cross-origin requests. We've spent quite a bit of work, over several patches back and forth, implementing the CORS spec within Swift so that web application developers can take advantage of it and use Swift more directly, without having to proxy all of their Swift-stored content through their own web servers. They can use Swift directly as an upload target, for example, and as a resource repository for their content.
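Both the quota checks and the CORS preflight come down to small decisions made before a request is accepted. In Swift, container quotas are read from container metadata (`X-Container-Meta-Quota-Bytes`, `X-Container-Meta-Quota-Count`), account quotas from `X-Account-Meta-Quota-Bytes`, and allowed CORS origins from `X-Container-Meta-Access-Control-Allow-Origin`. The Python below is a hypothetical sketch of the decision logic only, not Swift's middleware: the function names, the default method list, and the 401 status on rejection are assumptions.

```python
def within_quota(used_bytes, used_count, upload_bytes,
                 quota_bytes=None, quota_count=None):
    """Return True if one more object of `upload_bytes` fits the quotas.

    A quota of None means "no limit". Assumes the PUT creates a new
    object (an overwrite would not raise the object count).
    """
    if quota_bytes is not None and used_bytes + upload_bytes > quota_bytes:
        return False
    if quota_count is not None and used_count + 1 > quota_count:
        return False
    return True


def cors_preflight(origin, method, allowed_origins,
                   allowed_methods=("GET", "HEAD", "PUT", "POST",
                                    "DELETE", "COPY")):
    """Answer a CORS preflight (OPTIONS) request; returns (status, headers).

    Only origins in `allowed_origins` (or "*") receive the
    Access-Control-Allow-* headers the browser needs before it will send
    the real cross-origin request.
    """
    if origin not in allowed_origins and "*" not in allowed_origins:
        return 401, {}
    if method not in allowed_methods:
        return 401, {}
    return 200, {
        "Access-Control-Allow-Origin": origin,
        "Access-Control-Allow-Methods": ", ".join(allowed_methods),
        "Allow": ", ".join(allowed_methods),
    }
```

A page served from www.whatever would trigger a preflight like `cors_preflight("https://www.whatever", "PUT", allowed)` before uploading to images.whatever; the browser only proceeds if the origin comes back in the response headers.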
Deployers get a couple of really nice new features as well. One of them is a better algorithm for choosing handoff nodes. This is very important to deployers because, as we all know, hardware fails. Swift works around that very well, but you want your data store to remain as available as possible; going back to the original design goals, one of Swift's goals is to remain highly available. So when you exhaust your primary nodes, or discover that one of the primary storage nodes for a particular object is no longer available, we need to make sure that data gets moved to an appropriate place. The improvements to handoff node selection do two things. First, they make much better use of the entire cluster and spread your handoffs out more evenly, so that a particular node does not get overloaded when you start having failures. It wasn't really too bad previously, but now it's much better. Second, failures often end with a deployer changing the actual map of how the cluster is deployed, the ring file: you'll be doing ring rebalancing and things like that. With the previous algorithm, a rebalance probably meant that handoff node selection changed as well, which is somewhat of a problem. If a server is down, its data has been replicated to a handoff location. Then, when you take something out of the ring, what was maybe your second handoff node becomes the first, so the data has to move again. That caused extra replication traffic when you rebalanced rings after failures, which as we all know is a common thing for large
clusters. The new handoff node selection is much more stable: the choice of handoff nodes is much more consistent, so even across ring rebalances it's extremely likely that a handoff stays on the same node. As a deployer, you don't have to worry about hardware and ring changes causing extra replication storms. Another very nice thing for deployers is configurable constraints. Several things in Swift were just hard-coded constants in a file someplace, for example the largest object size, the maximum length of an object name, and the amount of metadata you can store on a particular object. Those have been pulled out into a config file, so they can be configured to a particular deployment's requirements and even changed over time if necessary. Then there's the giant "other" category, because a whole lot of other great stuff has come along, and I don't have time to talk about all of it. To highlight a few: we've got support for custom log handlers, so you can integrate with external third-party log processing tools. We've got the ability to make multi-range requests, so you can request two different sub-ranges of an object in one request. We've made a lot of improvements to StatsD metrics generation within Swift, such as first-byte timing, timings for errors, and better calculations of how often those metrics are generated. And then there's the other-other category: we've replaced WebOb as a dependency in our code, and we've added the ability to run replication against a particular drive or a particular server to allow you to recover from
errors much more quickly. We've added a lot of logging improvements so that you get much saner log output when you're dealing with middleware that makes extra requests, like StaticWeb and FormPost, which a lot of people are using. Overall, we've had a lot of really great improvements over the last six months. I think we've made great progress in improving Swift both for the use cases end users are actually taking advantage of and in efficiency for deployers themselves, allowing people to more effectively run clusters of all sizes. The community growth we've seen over the last six months is one of the most exciting things, I think. This is where you get into a bit of a numbers game, but you get to see how much we're actually growing. Over the last six months we had 65 active contributors to Swift: 65 individuals contributed a patch to Swift that was merged. The really great thing about this is that in the previous six months we had 37. So we've seen some really great growth there, and those 65 active contributors are out of a total of 109 so far. Out of that group, I'm going to call out Joe Gordon for being the 100th contributor to Swift. So thank you very much, Joe; he gets a free download of Swift. Of those 65, a total of 31 were brand-new contributors to Swift, and here are their names. That compares to 19 in the previous six months. So we've gone from 19 new contributors to 31, and from 37 total active contributors to 65, comparing equivalent six-month periods. What's really exciting to me is that we had almost as many new contributors in the past six months as we had total contributors in the entire previous six months. These are the new contributors.
So if you see these people around, or you are one of these people, thank you very much. It is through the effort of all of these developers, and of the companies that pay them and contribute their use cases and coding time, that Swift is an awesome storage system. So thank you very much; I really appreciate all your work. To call out one person specifically: I've generally done a little bit of code golf in the past, but the number one contributor, both in the number of unique patches merged and in the number of reviews done across all patches, is Sam Merritt. He's one of our core developers and works at SwiftStack, and I'm very happy for his work. Thank you very much. Looking ahead, this has been a really exciting summit, not just because so many new people are coming in, but also because there have been so many signs of how many people are actually deploying Swift and bringing in their actual use cases. At the beginning of this week we had a packed room at the design summit sessions, talking not about learning Swift but about taking Swift to the next level: we've deployed it, this is what we're seeing, and we're going to make it better. So where do we go from here? We've got a few things.
First, we've got to finish up a few of the global cluster issues and add the last few building blocks. One of these is extremely close: the ability to segregate replication traffic onto a separate network. This will allow deployers to manage their replication traffic separately from their client traffic. That matters especially for a global cluster, where the connection between regions goes over a WAN that is likely metered differently from your internal network, so this will be very important for ensuring the most cost-effective deployment for your particular needs. The last major building block still to come is affinity on writes. With affinity on reads, we're more likely to get a close copy than a remote one, and we need the same on writes, so that your write doesn't automatically go over the WAN, depend on that connection being up, and pick up its latency. That means being able to write into a particular region, replicate there to get your full durability, and then asynchronously replicate over the WAN to your other region. That is the last major building block, and we're going to be working on these. I believe they are coming very, very quickly, and I'm excited about it. Again, it is you, the community, who make these things possible and ensure that they are built to meet your actual use cases, so I look forward to all of your contributions there. The second major thing I want to talk about in terms of where we're going is something we talked a lot about in our design summit sessions here earlier this
week. This is specifically about our API. I'm really happy that Swift has a very stable API, one that actually predates Swift itself, because of where Swift came from and the product it replaced at the time. But over time we've added to the API, and we've added things that may or may not be optional; it's hard to say, because it's never been strictly defined. For some historical reasons, and because different people worked on different things, there are also a few warts that it would be really nice to polish out. To do this, we've got to do a few important things, and one of them is to figure out what is actually in that API. How does it work? What does the formal API look like? We've never actually come out and published a formal spec for the Swift API. It has always been somewhat emergent from what people are actually deploying, so we assume that is what clients are writing against, and you obviously can't break things that people are using. So I expect quite a bit of work over the next six months from a lot of people in the community on some base-level things, including things we can pull from other parts of the OpenStack community, like how we do API discoverability and versioning and how we make that a consistent story across all of OpenStack. We also need to figure out what we're going to polish, what we need to change, how we can ensure those changes won't break the many, many end users of Swift clusters deployed throughout the world, and how we go forward in a sane manner to say what is and isn't in the API. So I'm looking forward to a lot of work there.
I think there are a lot of questions, and they can really only be answered not by any small subset of people but by the community as a whole coming together. The next thing I want to talk about is LFS. Again, there are still a lot of questions about it, but it's a very interesting idea, and it's not entirely new within the Swift community; it's been talked about before. LFS stands for local file system, and it has generally become a catch-all term within the Swift community for how we can optimize for particular types of storage in a Swift cluster. Let me give you an example. Right now we assume you are using a POSIX file system to store your actual object data, and our docs recommend, though nothing enforces it, that you use XFS. For example, the auditing functionality in Swift takes advantage of some particular XFS-isms to accelerate auditing passes and things like that. I think this is really great; I'm a big fan of XFS. It's been really, really good.
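One way to picture the LFS idea is an interface the object server codes against, with interchangeable storage implementations underneath. The Python sketch below is purely illustrative: `DiskFileBackend` and `PosixBackend` are hypothetical names, not Swift classes, and the write-to-temp-then-rename pattern is a common POSIX durability idiom rather than a description of Swift's on-disk format.

```python
import os
import tempfile
from abc import ABC, abstractmethod


class DiskFileBackend(ABC):
    """Hypothetical pluggable storage interface for an object server."""

    @abstractmethod
    def put(self, name: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, name: str) -> bytes: ...

    @abstractmethod
    def audit(self, name: str) -> bool:
        """Return True if the stored object looks healthy."""


class PosixBackend(DiskFileBackend):
    """Generic POSIX backend: plain files, no file-system-specific tricks.

    An XFS-tuned backend could subclass this and override audit() without
    changing what other file systems see.
    """

    def __init__(self, root: str):
        self.root = root

    def _path(self, name: str) -> str:
        return os.path.join(self.root, name)

    def put(self, name, data):
        path = self._path(name)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        # Write to a temp file, fsync, then rename so readers never see
        # a partially written object.
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path))
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)

    def get(self, name):
        with open(self._path(name), "rb") as f:
            return f.read()

    def audit(self, name):
        return os.path.isfile(self._path(name))
```

The design point is that file-system-specific optimizations live in a subclass, so they can be aggressive without breaking deployments on other file systems or other storage media.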
I don't expect people to stop using XFS at all. However, we can't go in and make very XFS-specific changes that end up breaking things for people running other file systems. So being able to abstract that out, so that we can have an XFS-specific backend, a generic POSIX one, or even something else, whether a different file system with particular semantics or a different storage medium that looks nothing like a traditional file system but is still a really good fit for Swift, is something that would be very, very interesting. A second place to look at this is the communication between the proxy and the storage nodes: figuring out how to abstract it in a very clean way so that it can take advantage of some of the other things people in the community are working on. As I said, there are still a lot of questions around this, and I think we can easily go too far one way or the other. We could try to embrace too much, biting off more than we can digest as a community and in the code base. At the same time, I definitely want to avoid pushing away too many people by saying no, this is the one true way you have to do things. We've got a large, growing community with lots of people doing very interesting things with very diverse needs, so let's figure out how we can best work with all of those people. In a lot of ways, if you're talking to an object storage system, you need to be speaking Swift, using the Swift API, and I think being able to do these sorts of things internally in the code will really accelerate that. The last major thing I want to talk about in terms of improvements and where Swift is going is things that are attractive to developers, but ultimately to end users as well, and these have to do with efficiency improvements, and
there are a few ways you can think about this. First, you can think about it simply as request latency for end users: users want faster reads and writes, so let's make those faster. Second, deployers want to spend less money on hardware, so how can we make sure you get a very good spindle-to-core ratio, so that you can run extremely dense storage nodes on extremely wimpy CPUs? Certain parts of Swift are actively being improved right now, for example replication, which is currently a rather chatty protocol with particular requirements. If we can make those better, which is in progress, then we open up better hardware options for people. Just as a weekend project, I put Swift on a Raspberry Pi. I don't recommend running that in production, but it was a fun experiment to see how far we can actually scale Swift down, and, shockingly, it performed rather poorly. But we need to make these things work really well for a lot of broad use cases. The other thing specifically being done is looking at improvements in how we actually talk to the storage medium itself: how do we talk to drives? Getting into the extremely technical details, this is a big problem when you're dealing with massive concurrency, because asynchronous I/O on drives is rather poorly supported all the way from Linux up through Python and eventlet. So we need to make this better.
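Swift's servers run on eventlet, where a blocking disk call stalls every other request being served by that process. A common mitigation, and the general idea behind a disk-I/O thread pool, is to hand blocking file calls to a small pool of OS threads so the event loop keeps serving network traffic. The sketch below shows the pattern with the standard library's `asyncio` and `ThreadPoolExecutor` instead of eventlet, so it is an analogy rather than Swift's code; the pool size and function names are assumptions.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

# A small, dedicated pool for blocking disk calls (size is an assumption;
# one pool per drive is another common layout).
DISK_POOL = ThreadPoolExecutor(max_workers=4, thread_name_prefix="disk-io")


def read_file(path):
    """Blocking read; calling this directly would stall the event loop."""
    with open(path, "rb") as f:
        return f.read()


async def read_many(paths):
    """Read several files concurrently without blocking the event loop."""
    loop = asyncio.get_running_loop()
    return await asyncio.gather(
        *(loop.run_in_executor(DISK_POOL, read_file, p) for p in paths)
    )
```

Results come back in the same order as `paths`, and a slow drive only ties up pool threads instead of freezing every other request on the process.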
It's actively being worked on right now, and there are proof-of-concept patches out there for both of these, for example one that implements a thread pool just for disk I/O. We had quite a few presentations at the beginning of this week, which was just fantastic, from people benchmarking Swift, sharing their results, and showing where some of the bottlenecks were and how we can make things better. Some came from the client perspective: I was throwing a bunch of requests at the cluster, and my clients got slower as one particular drive went bad. We also had some great feedback from Seagate, looking at it from the drive up: abstracting away the kernel and the file system, how many operations is the drive actually seeing, and what is it actually doing? There will be some great follow-ups on that as well. The long and the short of it is that while Swift has proved to be remarkably good, especially for these very broad, long-tail use cases, effectively storing large numbers of objects with high concurrency, there are some very good improvements we can make that will both make the hardware you deploy on cheaper and make the experience better for end users. So, to sum up, the three things I think are really important going forward are: improving the API, coming together as a community to do it; improving efficiency; and finishing global clusters so that people can effectively deploy them. I'm really looking forward to this, and to those contributor numbers getting a lot bigger over the next six months, and I think that's something we can all do together. I think we have a few minutes for questions, maybe five; I'm not sure on the time. One minute left.
Okay. Thank you very much. I'm flying out tonight, but I'll be around today, and we've got some workshops later, specifically around Swift, that I'll be a part of. We've also got a Swift book out now, which is really kind of cool, and I think Clay has a few copies here if you don't have one yet. I look forward to your continued involvement. Thank you very much, and have a good day.