My name is John Dickinson. I am the director of technology at SwiftStack, and I am also the OpenStack Swift project technical lead. Today I'm going to give you a little 101 on Swift. So if you are not sure what Swift is, how it's put together, what it can be used for, or how you could make use of it, then you're in the right place right now. So that's great.

Today I want to go over basically what Swift is trying to do and who's using it, tell you a little bit about some use cases that I think are pretty exciting, and then talk about the architecture a little bit: a high-level overview of how the code is put together and how it works, and some of the technical details of it. I'll touch a little bit on some deployment details, and then if we've got enough time after all of that, we'll have some time for questions.

So with that being said, one thing: I looked at the schedule for the summit and just searched for Swift, because that's what I do as the Swift PTL, and there were a lot of cool sessions. We're at the end, and there are still a few more sessions left today if you're looking for more information on Swift itself. I myself, actually, this is the first of three talks I'm giving today, so it's going to be fun. I'm talking about Swift 101 right now. Immediately after this, in here, there's a very interesting use case of Swift with very large, globally distributed clusters. "Building Applications with Swift" would be more along the lines of: I'm writing an application, I don't really care about the storage infrastructure, but I want to use Swift and write something on top of it. We've got Pac-12. I'm going to touch on them a little bit today; they are going to be telling you a little bit about their use case. "Extensibility in Swift" is about how you can actually add your own functionality and integrate Swift in different ways. And then tomorrow there is a full-day workshop.
It actually starts at nine with Time Warner Cable talking about their whole OpenStack deployment, and then starting at ten it's a full-day free workshop that covers, in a very high level of detail, how Swift works and how you can deploy it and run it yourself. You can get it installed, run it, and play with it; see how failure handling works and how things work together; and talk in detail about building production clusters and what those look like: what does the networking look like, what do you put in your rack, and things like that. That is in the Hyatt hotel, free, tomorrow. I've got some information over here; if you'd like something to take with you to remind you, then please come pick that up. The Hyatt is very close. We're at the Palais des Congrès, and the Hyatt hotel is basically in the same building.

Also, if you want a book: I think the Expo Hall is just about closed. If it's not closed when we're done here, then you can go by the SwiftStack booth and pick up a book, if there are still some there. If not, go to swiftstack.com/book and you can get an O'Reilly book on all of the same sort of stuff. But that way you can have a dead-tree version to walk around and read and learn all you need to know.

So with that being said, why are we here? Why do we even have this thing called Swift to start with? It is one of the founding projects of OpenStack, but why is that important, and what makes it up?
So the reality is that the reason Swift is here was really conceptualized and brought to the forefront, at least in the technology world, by Amazon. Their S3 product was one of the first pieces of AWS that they deployed, and it basically means that you've got this extremely good way to store a lot of data cheaply. It's a different model than we're generally used to. Specifically, it means that you don't have to think about your storage anymore, and you're doing it with on-demand pricing. Although the terms have been around for a while, AWS basically kicked off the whole cloud thing. So it's been around a while, it's something we've been adapting to, and it kicked off in the minds of a lot of people who are running and deploying IT infrastructure: okay, this is something we need to consider, and how do we do that?

So speaking of the IT infrastructure people: they saw this change happening. They heard this demand from people who were building applications and running stuff, who wanted this new kind of consumption model. And the IT people themselves, if you're running IT for a company, you see that you want to take advantage of the economies of scale. You want to consolidate, you want to get rid of vertical silos, you want to be able to manage your storage pool, but you'd really like to do that with no hardware lock-in. You'd like to take advantage of commodity hardware, and you'd like, of course, to do it cheaply, but it needs to be durable. And you need to be able to support a lot of different things. Rather than saying, "I want to stand up a silo for this application, and then later on I'm going to stand up another silo of storage for that application," let's put those together and support the different use cases that we have.

The reason we need that is because data is exploding. That's the canonical up-and-to-the-right graph: we're getting a lot of data everywhere. So there are a couple
of ways you can actually solve that. If you look at the traditional way that people were solving this, it was: go buy a hard drive, and when that fills up, you go buy another hard drive, and when that fills up, you go buy another hard drive. Then you realize this is not going to be a very scalable model, and then one of your hard drives fails. So you decide, "I'm going to go buy a RAID card," and you go put those together. And through the whole thing, your applications are still having to deal with this massive overflow of data. You're having to manage more and more drives, you have to remember where you put things, and you're having to deal with all of this concurrency, but you're doing it on top of POSIX semantics on file systems. It just gets to a pain point that you really didn't have when a lot of data was measured in terabytes. When a lot of data starts getting measured in petabytes and beyond, you're thinking, "This is really starting to hurt." Ten hard drives was not hard to manage; ten thousand hard drives is hard to manage. So we need a different way to think about things, and a different way to consume things.

There are a couple of different ways you can do that. If you're talking about systems that are spread out because you need to tolerate failure, you need availability, and you need to support these kinds of concurrent workloads, you have to have more than one system doing it. And when you're building a distributed system, there's a rule [the CAP theorem] that basically says that of the three principles of that system, you can choose two of them and you cannot have all three.
So either you have something that is strongly consistent, which means that in the case of failures the system will not be available, but it means that any time you ask a question of the system, you're going to be able to get the answer that the whole system knows about right now. On the other end of things is eventually consistent. Eventually consistent means that even if there is failure in the system, the system is still able to respond, and it will later, eventually, resolve any discrepancies in the data and the messages it has received.

So generally, to simplify it: strong consistency is something that is really good, and actually vitally important, for block storage and for underlying file systems. You need to put your database on something that's strongly consistent, because you don't want something to change out from under it over time and get corruption and things like that. Eventually consistent systems are fantastic when you need high availability and the pieces of unstructured data that you're storing are independent of each other. It doesn't matter whether my picture upload happens before or after your picture upload, or when server A does a backup compared to server B. The data itself is stored correctly, but now you can scale this in such a way that you have extremely high availability across your distributed system.

So that's where Swift lives. It lives in the eventually consistent distributed storage space, and it gives you some really nice things because of that. It gives you the ability to have a globally distributed cluster. It gives you the ability to do this with a simple API, not something that's very chatty, which wouldn't work at a globally distributed scale.
You don't want to do CIFS over a continent, that sort of thing. It's something that allows you to consolidate your storage and offload the hard problems of storage from the application, so you don't have to think about the file system semantics of locking, how you deal with concurrency, and what happens when you have failures and corruption. Swift will take care of that for you.

And most importantly, especially in the context of being here at this event this week and looking at OpenStack: in my mind, the "open" part is what's really important. You've got an open system that allows you to have full ownership of your data, because you know everything that is touching your data. The storage system that's managing it is something you have influence and control over. Not only can you see the code, but you can get involved in the community and influence it. And that being said, please submit patches. I would love to have all of your patches for Swift.

So who's using Swift? If this is so great, who's using it? There are a lot of companies, and a lot of these are actually everyday names, which is one thing
I really like about it. My vision for Swift, which I've said many times (if you've been to other OpenStack summits and heard me speak, I'll now say the same thing), is that everybody will use Swift every day, whether they realize it or not. That means that when you help your kid with their homework and you go look at Wikipedia, you're using Swift. When you get back to the office next week and you're scanning in the receipts for your expense report, I want those to be using Swift, because it's unstructured data that grows without bound and needs a lot of concurrency and support for new application models like mobile and shared storage and all of that sort of thing. When you deal with banking, when you deal with document management at work, when you're watching videos, I want you to be using Swift every day, even if you don't realize it.

So let's talk a little bit about some of those use cases. I've got a few specifically that I wanted to highlight. Pac-12, for those of you who are not familiar, is a sports broadcasting network on the west coast of the United States. So, a lot of college football and college sports, things like that. The reality is that their calendar is dominated by the university calendar. They have sports seasons: this is when people play American football, this is when people play baseball, and this is when people play soccer, or football. They record over 800 sporting events every year, they need to store this data, and then they are broadcasting it out. They were running out of space in their SAN. It was rather expensive, and they needed something that was cheaper, better, and more scalable for them. So they chose Swift to store that, which means they were able to save money by moving their video storage from their SAN onto Swift, and they were
able to scale out a lot better. They were able to scale out in a graceful manner. They didn't have to worry about downtime on upgrades; they didn't have to worry about any of that sort of thing.

And the really exciting thing I like about this is that when they were using their SAN, they would archive their stuff off onto tape. Well, tape, while it is cheap, is certainly not available, and certainly not in the sense of "Hey, I'd like to watch this old sporting event; let's watch it right now." If that's on a shelf someplace, even if it's in some sort of tape system, the latency on reading it is rather high. The really great thing about them putting all of their stuff into Swift is that now all of their data is highly available. That means they have not only saved money, but their marginal cost of adding that old archived content into Swift is actually lower than just sticking it on the shelf, and it actually enables new revenue models for them. So imagine if you were able to say, "Well, I really like this particular sports team, and I want to go watch some old recording." Well, consider how it used to be if that was archived.
There was nothing to incentivize Pac-12 to go try to satisfy your one request. But if it's already in something that's highly available, they can create new revenue models, because their storage system is now highly available. And that's something that's really exciting to me. It's not just this cost thing; it's that we can do new things with our data now that we have a new way to think about it and reason about it, because the storage system is built for that.

Time Warner Cable is another use case. They spoke in the first keynote on Monday morning, and they're using lots of OpenStack all over the United States. They're doing a lot of new and interesting and big things, specifically around video. They've got the set-top DVR boxes, and they've got to store a lot of that stuff. So they're using Swift for a lot of backups in conjunction with their OpenStack compute deployments, and they are using Swift as a repository for video. You would imagine that a cable network would probably have a lot of video. Overall, they are able to build new internal applications on that, which can then take advantage of, again, the highly available storage.

Now, this third one is a name that's not too familiar to a lot of people. It's an American company, and even in the United States it's not an incredibly popular name. And it's certainly not a technology company. Budd Van Lines is a moving company, like the big giant trucks. They go to your house, they pack up everything, put it in the truck, and then move it someplace and unpack it for you. So that's interesting.
So they're a logistics company, and they have some very interesting use cases for their storage. For example, they are taking the things in your house, putting them in a truck, and then parking the truck in a warehouse. So it's very important to them that they have 24/7 video surveillance of all of the parking lots and all of the warehouses, to make sure that your stuff is physically secure. Well, all that video has to be someplace, so they are storing all of that video inside of their Swift cluster.

Another thing they're doing, which is very interesting, is putting tablet devices inside all of their trucks. When things are packed up, they're able to take a picture if something's damaged, and all of that can be put immediately into the Swift cluster from the mobile device in the field. More than that, all of their documents can be stored there as well and then accessed on the mobile device. One of Budd Van Lines' employees is here this week, and we were at dinner the other night. He said, "Yeah, I'm in Paris, France, and I can pull up my phone right now and look at the status of any order on any truck anywhere in the United States," and that's pulling directly from their Swift cluster.

They're also using this with their compute instances: they're putting all of the backups for all their servers into their Swift cluster as well. So basically what they have is a flexible, scalable, and highly available storage system that is used as the consolidated storage for the unstructured data within their company. So that's one that I'm pretty excited about. Oh, and also, it's spread all over North America.
So they've got lots of locations, and they can actually handle a DR (disaster recovery) failure scenario: they keep the data available even in the midst of losing an entire data center.

So speaking of hardware and where it can be deployed and things like that: Swift is software, so where does it actually run? What does it look like? And the nice thing here is, it's whatever you've got. Here are two examples. One: this is a very small five-node Swift cluster that's running in our lab back in our office, and this is actually used to run tests on every single patch that is submitted to Swift. It's a community QA cluster. But what I like about this is that it's using the new Intel Avoton chip, which is a system on a chip that's very low-powered but has a lot of connectivity on it. So in 1U they've got 12 drives, and we're using the HGST helium drives, the six-terabyte drives, in there. And this 1U unit has four 1-gig ports and two 10-gig ports, and it's all on a 400-watt power supply, and that's including the drives.
So it's very dense, very low power, and very well connected on the network. This is one thing that we've got running right now: if you submit a patch to Swift, your code is going to go run on there and be validated with the tests.

Another thing: if you're thinking about a larger or more dense setup, you could go for some sort of 4U JBOD or something like that, for example things from Silicon Mechanics and those sorts of vendors. You don't have to buy giant proprietary pieces of hardware in order to run Swift. If your preferred vendor is vendor X, then you can use vendor X, and that's okay.

So when you're building out your Swift clusters, this is one way you might want to do that. This is a little high-level network diagram of how you might lay out an initial small deployment of Swift. In this case we've got three racks. In one rack I've got a couple of proxy servers (I'll go into detail about how these pieces work in a moment), and in the other rack we've got some of our storage nodes, which is where you'd have your hard drives. And then at the top you've got a load balancer, your aggregation switches, your top-of-rack switches, and things like that.

The advantage of Swift is that because you've got different components of the system, you're able to expand where you need it. That means that if you needed to add some more capacity, you can just add another rack that looks like that: roll in your rack, plug it in, and you're good to go. And then you can add a little bit more capacity on your proxy servers. Or another way you can do this is to backfill your existing capacity. So if you needed another level of expansion, you could put in some more storage across your existing racks. And the nice thing about this, and we'll talk about this in a little bit, is that because these racks are in
different actual physical failure domains (they have different top-of-rack switches), you are actually protected. You still have your data available to you and durably stored even if you lose an entire rack, which is pretty cool.

I was talking the other day to Tom Fifield, the OpenStack community manager, who used to run some Swift clusters in Australia. He would say, "Yeah, we'd regularly walk into the data center giving a tour and say, 'Oh yeah, here's our Swift cluster,' and pull the power out of one of the servers, just as a demo, on a production server." And you're just like, "Whoa, no, don't do that." He's like, "The hardware is under warranty, that's fine, and Swift is going to take care of it." And it just works. That's the way it's supposed to be. So that was just one of those fun little anecdotes; it's fun to handle all of that kind of stuff.

So how does it do that? Let's talk a little bit about how Swift is put together. To start with, Swift has a REST-based API. Basically, this means that it uses standard HTTP verbs and response codes, which means that it's speaking the language of the web. This is basically what a reference to something in Swift looks like. It starts, of course, with standard HTTP or HTTPS talking to your domain, and then we've got the v1 protocol. We've always had v1, for the past five years or so. And then there are three important pieces in Swift: you've got an account, a container, and an object.
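As an illustration of the reference layout just described, here is a minimal sketch that assembles such a URL from its pieces. The endpoint `https://swift.example.com`, the account name `AUTH_demo`, and the helper name `object_url` are all made up for the example; the slash handling (rejecting `/` in container names, leaving it unescaped in object names) is an assumption of this sketch rather than something stated in the talk.

```python
from urllib.parse import quote


def object_url(endpoint, account, container, obj):
    """Build <endpoint>/v1/<account>/<container>/<object> (hypothetical helper)."""
    # Containers cannot be nested, so this sketch rejects a slash in the
    # container name. Treating '/' as legal inside object names is an
    # assumption made for illustration.
    if "/" in container:
        raise ValueError("container names cannot contain '/'")
    parts = [
        quote(account, safe=""),
        quote(container, safe=""),
        quote(obj, safe="/"),
    ]
    return endpoint.rstrip("/") + "/v1/" + "/".join(parts)


# Example values only: no real cluster is contacted.
url = object_url("https://swift.example.com", "AUTH_demo", "images", "cat.jpg")
print(url)
```

Nothing here talks to a cluster; it only shows how the version, account, container, and object segments line up in the path.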
So the account is not necessarily a user identity; that's different. The account is really a place where you can store your data, and it can be treated in a couple of different ways. If, for example, you go to HP or Rackspace or SoftLayer and sign up for a public Swift account, or you buy the service from them, your identity will get access to a Swift account, perhaps one in different regions or things like that, but on a Swift cluster you will get an account. Another way that I've seen this happen is that you may have an account per application. So here's my document management application, and that's going to have a Swift account, and then I'm going to be able to do stuff inside of that. The account stores a little bit of aggregated metadata: how many total bytes, how many objects, and how many containers you have. You can also set some user metadata on it as well: "This account was created on Tuesday by Bob," or whatever you need to do there.

Inside of the account you can create containers. Containers are similar to folders, except you can't nest containers. Containers similarly hold a little bit of metadata, some user-settable and some aggregated by the system, and they also contain a list of all of the objects inside of them. Containers are scoped to your account, so I can have an "images" container and you can have an "images" container, and that's okay. They will not conflict.

Then inside of containers is where the data goes, and that's your object's name. So if you're using Swift to store pictures of cats, because the internet is for cat pictures, then your cat.jpg is your object that's stored there, and that's what that looks like.

So when you're talking to Swift, again, you're using standard HTTP verbs and response codes, just like almost everything else in OpenStack. If you want to write a new object, that's a PUT to the object name, and you give it the data. That's it. If
you want to get it back, if you want to read it, that's a GET to that same object name, and you get it back, and it's good.

One of the really great things about the fact that this is basically speaking the native language of the web is that it integrates very well. On the client side, it means that you can have a web browser talk directly to Swift. And on the operational side, it means that it's really easy to put in existing tools that you can just get off the shelf from anywhere else. So you have some really hot content and you need a caching solution? Great, you can put Varnish or Squid or whatever you want on top of this, because they already know how to speak HTTP, so you can easily have that content cached. In fact, that's actually what Wikipedia does. They have their Swift cluster, and they've got some caching layers to do some stuff there, and it just works very well. You don't have to worry about having a custom solution for your particular problem, because it's the same problem that everybody else has.

So... that is a good question; I will answer that one now. The question was about deletes. In general, what I'll say is that with the Swift API you can overwrite a particular object. I can write it one time, and then I can use the same name and put new data on that. That's fine. And if you delete it, then you will delete that object name, and that will be gone. We do have a feature inside of Swift for versioned writes, so you can kind of push it down on a stack, and then a delete will pop the stack. But without enabling that feature for a particular container, if you delete any data in there, it's gone. That's what you asked it to do.

So here are the basic parts of Swift.
It's a very simple diagram: the user talks to a proxy server, and proxy servers talk to storage nodes. The proxy server is responsible for handling the user requests, implementing most of the API, and then coordinating all the communication with the storage nodes. The proxy server doesn't have any storage in and of itself. It doesn't have any drives attached to it or anything like that. Really, it's shuffling packets and making sure that failure scenarios in the cluster are handled. The storage nodes are where the data is written, and basically there are three different kinds of storage nodes (in a deployment, there are different ways to put them together). The object, container, and account nodes match the account, container, and object in the API.

Again, I hinted at this earlier, but the nice thing about this design is that it's modular: if you need more, you can add more. If you need more storage nodes, you can add more storage nodes, and you can do that independently of adding new proxy nodes. If you need more client connectivity and bandwidth and things like that, you can add more proxy servers, and you don't have to go buy new hard drives to do that. So normally a cluster is going to look like several proxy servers behind some sort of load balancer, talking to a set of storage nodes. What this means is that you get a very highly scalable system that actually improves as it gets bigger, because there are more pieces that fit together and help offload any sort of problems that you may have. I mean, hard drives fail quite a bit, so you want to make sure you have the entire cluster helping handle those sorts of failures. And it means that as you add more, and then you add a lot more, your cluster is linearly scalable.
So, none of these pieces share any state, which means there's no central metadata layer that is a bottleneck for everything. It means that if you need more of something, you can add more exactly where you need it. Because of that, there's also no single point of failure in the system, which means that if you pull the power out of a rack or a particular server, it automatically works around that failure. It's the same thing if you need to upgrade something: you can upgrade one piece of the cluster while the rest of it continues to run, even if it's patching a kernel and having to restart. That restart is basically like pulling the power, because the server is now not available. So you can easily have operational methodologies that just assume this cluster is going to stay healthy and keep running.

So if you've got all of these different storage servers out there placing the data throughout the entire cluster, how does Swift know where to do that? Hold that thought for just a minute.

Swift is optimized for being highly concurrent. What this means is that it's not going to be optimized for single-stream throughput, but you can do 10,000 streams at once, that sort of thing. So Swift is basically built for scale and optimized for durability (you're not going to lose your data), availability (which means that even if half your cluster falls over, you can still read and write from it), and concurrency across the entire data set. Especially when you have things that deal with lots of web content or mobile data or user-generated content, aggregating that throughput together is how Swift scales. It's not about doing 10,000 requests to one particular video.
We're going to do 10,000 videos at once, or whatever the number is for your use case.

So, data placement. That's where we were going earlier, so sorry about that. How does Swift know where to put the data? There are basically two questions you can ask about any storage system, and when you get the answers to those, you'll know a whole lot about how that system works and where the failure modes are going to be. Those questions are: first, how do you do data placement? And number two, how do you handle failures?

So let's talk about data placement. Swift uses something called a consistent hashing ring to place your data. A hashing ring is a sort of complicated topic, but we're all kind of familiar with the concept of hashing things into different buckets. You can think about it from the perspective of an encyclopedia: if you need to look for an entry on flowers, you look in F; if you're going to look for octopus, you look in O. It's the same way with Swift. But the problem with this kind of naive hashing and bucketing of your data is that not many things start with a Q, and not too many things start with an X, so you have uneven sizes of things. You can get better than that.

So here is basically what consistent hashing looks like. You take a hash function that is going to distribute your data when you pass in something. This is something like MD5 or SHA-256 or MurmurHash, or any of these other things that have a good distribution of bits: you put something in and you get shuffled bits out. As a quick footnote here, I want to point out that we are using these hashing functions only for splaying data and not for security purposes, so don't get scared when I say we use MD5. It gives very good data distribution, and that's what we're using it for. So the point is, when you hash something you get a
big number, and if you take the biggest number possible and add one, you basically loop back around to zero. That gives you the concept of a ring, because you can keep going around and around and around it. So what happens is, if you have a basic consistent hashing ring, you hash some unique characteristic of each of the things that are represented in the ring. In this case we've got a couple of different nodes, so we hash them, and they land at basically random locations throughout the ring. Then when we want to find out where something lives, we take something, say a hilarious cat photo, and we hash that, and we see that this is where it lives on the ring. That's where it maps to. At that point you start walking around the ring until you get to something. So in this example here, this hilarious cat photo will be mapped to node one, because that's the first thing encountered. You can see that it's not a very even distribution around here, but it does work, and as you get more and more things it kind of averages out. But you still have to walk around the ring, and that adds some complexity and some slowness. So you can do a little bit better.

Swift does a little bit better and divides that hash ring into even-sized partitions, just pieces of the key space. In this case, when we hash something, we get out that big number, some big long stream of bits, and we can take a prefix of those bits and do a direct lookup and say, "Oh, that directly maps to node one." So that's a nice, fast, constant-time lookup, and we get nice even placement around the ring.

Now, inside of Swift we've done a few more things, which is kind of nice. When we place the data, we place it according to a hierarchy of failure domains. At the very top you've got a region. Regions can have zones inside of them, zones have servers, and servers have drives. So ultimately data lives on
the drives, so we need to make sure that it is spread well across the drives. When something is placed into Swift, we choose a subset of those drives. In a triple-replicated scenario you have three replicas, so three drives are chosen, and the replicas are placed as uniquely as possible. In this example we've got a lot of drives and four servers, and every replica is placed on a unique server because we have enough servers to do that. We can also see that there are two replicas in one zone and one replica in the other, because we only have two zones and three replicas to place. So that makes sense. In this case they're all in the same geographic region, and if you go wider on your deployment, you get a wider dispersion of your data.

The next thing to talk about, as I said earlier, is failure domains. As of right now (and I'll cover this in a little bit) Swift is a replicated storage system, which means that we store multiple copies of your data for durability and availability. That gives you nice, cheap recoveries, and it gives you easy reads and writes without needing a lot of CPU behind them. It's really good, especially for a lot of the web, mobile, and smaller content that people store. We are working on erasure codes right now; that's in development. We're going to be talking about that quite a bit tomorrow during the design sessions, and we've been working on it for a while already. It's built on top of the concept of storage policies that we have inside of Swift, which are part of the Juno release.

We also continually ensure the durability of the data by checksumming it and making sure it still matches the checksum from when it was originally stored. That means you're protected against bit rot, and you're protected against file system corruption. You know, maybe the power was lost in the middle of a write.
I don't know why that would happen. Maybe there was a tour in the DC that day. So Swift will automatically check that, and it does so continuously in the background and also every time the data is read: in response to a read request, it validates the checksum again.

Now, what happens if something fails? The auditor is always running in the background and checking, and if it finds a failure, that copy is moved out of the way and quarantined. That particular bad copy of the data is now out of the way and will not be served to the user at all. The way we fix that is with replication. We've got a replicator process that runs in the background. It's a push-based model: one particular server looks at its local drives and determines what data is on them. For each piece of data, it finds the other places that data is supposed to be, checks with them to make sure they have it, and if they don't, pushes it out there. That means, for example, that if the auditor quarantines a particular object, one of the other replicas will notice that this one doesn't have it anymore and push it back over there. That's happening continually in the cluster, and the entire cluster participates in repairing failures. If you lose an entire drive and put in a blank, unformatted drive, Swift will automatically fill it back up from the entire cluster, which means that it happens quickly.
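As a rough illustration of the audit-and-repair loop just described, here is a minimal Python sketch. It is not Swift's code: the `Drive` class, the `audit` and `replicate` functions, and the use of an MD5 digest as the stored checksum are all assumptions made for the example, and the peers are passed in by hand rather than looked up in a ring.

```python
import hashlib


def checksum(data: bytes) -> str:
    """MD5 hex digest, standing in for the checksum recorded at write time."""
    return hashlib.md5(data).hexdigest()


class Drive:
    """Hypothetical stand-in for one drive: object name -> (bytes, stored checksum)."""

    def __init__(self, name: str):
        self.name = name
        self.objects = {}
        self.quarantined = {}

    def put(self, obj: str, data: bytes) -> None:
        self.objects[obj] = (data, checksum(data))


def audit(drive: Drive) -> None:
    """Move any copy whose bytes no longer match its stored checksum out of the way."""
    for obj in list(drive.objects):
        data, stored = drive.objects[obj]
        if checksum(data) != stored:
            drive.quarantined[obj] = drive.objects.pop(obj)


def replicate(drive: Drive, peers: list) -> None:
    """Push model: for each local object, push a copy to any peer that lacks it."""
    for obj, (data, _stored) in drive.objects.items():
        for peer in peers:
            if obj not in peer.objects:
                peer.put(obj, data)


drives = [Drive(f"drive{i}") for i in range(3)]
for d in drives:
    d.put("cat.jpg", b"hilarious cat photo")

# Simulate bit rot on drive0: the bytes change, the stored checksum does not.
_, stored = drives[0].objects["cat.jpg"]
drives[0].objects["cat.jpg"] = (b"hilarious cat phot0", stored)

for d in drives:
    audit(d)  # drive0's bad copy is quarantined
replicate(drives[1], [drives[0], drives[2]])  # a healthy replica pushes it back

print(sorted(drives[0].quarantined), drives[0].objects["cat.jpg"][0])
```

The point of the sketch is the division of labor: the audit step only detects and sets aside bad copies, and the repair comes from a healthy peer pushing its copy to whoever is missing one.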
It doesn't overwhelm your system, and it works really well.

So, Swift and OpenStack. Swift, as you know, is one of the founding projects of OpenStack, and one of the things I love about being at these design summits is being able to work closely with the other people, so that Swift integrates well with Keystone (we have Keystone v3 support in Juno now) and integrates well with Ceilometer, Glance, and the other projects that need a place to store large-scale application data.

As I said, I work for a company called SwiftStack. We do a little bit more beyond Swift, but without modifying Swift itself. As a very quick overview: we sell monitoring and management software for Swift, so that you can know what's going on in your cluster and how to take care of it. We also have a file system gateway that you can put on top of Swift, so you can speak to Swift through CIFS or NFS and integrate with the systems you're already using, along with your existing monitoring and alerting. So it integrates seamlessly into your existing IT infrastructure.

If you want more info at this point, which of course I hope you do, there are a lot of places you can go. If you want a very detailed overview, check out swiftstack.com/openstack-swift. It's probably the best single page on the internet for "here's all you need to know about Swift," with diagrams and pictures and links to videos and things like that. Of course, you can find the API docs with the OpenStack API docs. If you are interested in being a contributor, we'd love to have you join in. We hang out in the #openstack-swift channel on Freenode. My nickname there is notmyname, and I'll be happy to help out and answer questions. Of course, you can get the book.
And again, I think the expo hall is closed, so go to swiftstack.com/book and you can request a copy there. You can get it both in book form and in electronic form. SwiftStack also lets you get a free trial for dev and test, so you can download that, run the software yourself, very quickly get up and running, and have a Swift cluster that you can play with and see what happens. And then we have a SwiftStack videos YouTube channel. We're going to put all the recordings from Swift-related sessions up there, along with things from elsewhere in the ecosystem; we try to track those and put a lot of info there. So it's a great place to spend a few hours on videos.

So at this point, what questions do you have? I'll be happy to answer anything for the next five minutes or so.

Okay, two questions. The availability zones: are those kind of arbitrary, like in Nova, or does Swift expect them to be mapped to some real physical thing?

Okay, great question. I expect that the availability zones, zones for short, are representative of your physical failure domains. So it's not an arbitrary grouping. It should reflect the fact that you've got different racks, or different rows, or different utility power supplies, or different DC rooms, or even different DCs in a metro area. If you don't have different physical failure domains, then you use one zone. If you're putting a Swift cluster in just one rack, and it's all behind the same top-of-rack switch and the same power supply, then that's just one zone, and that's okay.
Swift will completely take care of that. As you add more, it will continue to grow out.

But it's up to the Swift administrator to say which storage nodes are in which zones?

Exactly. So what happens is, when you're creating the cluster and when you add capacity to the cluster, you're adding hard drives, basically storage volumes. When you do that, what identifies a drive to the data placement is: this is the name of the drive, it's on this particular server, in this zone, in this region. You add that piece of information, and there you go.

And then my other one was: could you talk a little bit about how the hashing algorithm, the placement algorithm, handles it when you add more hosts, like how it reshuffles or whatever?

Yeah, good question. So how does consistent hashing deal with the rebalancing of data? Say I have 99 hard drives, and let's just assume, because it's easy, that they're all four-terabyte drives, all equally sized. Then I add another four-terabyte hard drive, so now I have a hundred total. What happens at that point? The really great thing about consistent hashing is that the amount of data that's moved is directly proportional to the amount of capacity that you added. So in this case Swift is going to move about one percent of your data, because you added about one percent capacity. That's the reason you choose consistent hashing. It means that if you plug in a second rack, you can add it gradually: you increase its weight in the system so that things gradually move over. It won't cause downtime by immediately overwhelming your network because everything says, I've got to put everything over here, now let's put it back, or something like that. It's going to go gradually.
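The 99-to-100 drive example can be worked through in a few lines of Python. This is a simplified sketch under stated assumptions, not Swift's ring builder: the partition power of 12, the `build_table` and `add_drive` helpers, the equal drive weights, and the object path are all made up for illustration, and replicas, zones, and regions are ignored.

```python
import hashlib
from collections import defaultdict

PART_POWER = 12  # assumption: 2**12 = 4096 partitions
NUM_PARTS = 2 ** PART_POWER


def partition_for(name: str) -> int:
    """Take the top PART_POWER bits of the name's MD5 hash as the partition number."""
    digest = hashlib.md5(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") >> (32 - PART_POWER)


def build_table(num_drives: int) -> list:
    """Partition -> drive table with partitions spread evenly over equal drives."""
    return [part % num_drives for part in range(NUM_PARTS)]


def add_drive(table: list, new_drive: int) -> list:
    """Give the new drive its fair share by taking partitions from the fullest drives.

    Assumes drives are numbered 0..new_drive-1 and all have equal weight.
    """
    fair_share = len(table) // (new_drive + 1)
    parts_on = defaultdict(list)
    for part, drive in enumerate(table):
        parts_on[drive].append(part)
    rebalanced = list(table)
    for _ in range(fair_share):
        donor = max(parts_on, key=lambda d: len(parts_on[d]))
        rebalanced[parts_on[donor].pop()] = new_drive
    return rebalanced


before = build_table(99)
after = add_drive(before, new_drive=99)
moved = sum(1 for old, new in zip(before, after) if old != new)

part = partition_for("AUTH_demo/photos/cat.jpg")  # hypothetical object path
print(f"partition {part}: drive {before[part]} -> drive {after[part]}")
print(f"{moved} of {NUM_PARTS} partitions moved ({moved / NUM_PARTS:.1%})")
```

An object's partition never changes, because it depends only on the hash of its name. Only the partition-to-drive table is edited, and here 40 of the 4096 partitions (about one percent) are reassigned to the new drive while everything else stays where it was.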
It's only going to be proportional to the amount of capacity you're adding. So again, that's one of the examples of how Swift kind of gets a little bit better the bigger it gets. If you've got 10 hard drives and you add one, well, you just added roughly 10 percent of your capacity. But if you've got a petabyte and you're adding another server, that's almost a non-event. You just put it in there and it takes care of it automatically.

Anything over here?

So, based on your experience and what you've seen in the industry, do customers create separate backups for the objects? I know there are three copies to recover from, but do they still create backups?

So I won't lie to you: I've definitely had people ask me, how do I back up my Swift cluster? To which the answer is generally, well, get another Swift cluster. That's more of a joke than a practical answer. No, you don't really need to back up Swift, because Swift is a durable storage system. It would be backing up your backups. Because Swift has replicated storage, and soon erasure codes, it stores data across those failure domains, automatically manages and handles failures in your hardware (or software, for that matter), and ensures that the data is durable and available. So you could, but you're not really gaining a lot from that. It would also be tricky. Swift is built for petabytes of data, so it's like, now I need a 20-petabyte backup, so where do I put that? Another 20-petabyte Swift cluster.

At-rest encryption?

That is a good question. Actually, that wasn't a question.
That was three words. So, okay, what about encryption? The way Swift works today is that you hand it bytes, it dutifully stores those bytes on disk, and when you ask for the bytes back, it gives them back to you just as you stored them. So if you need to store encrypted data, the end-to-end principle says that you should encrypt the data and then store it in Swift. That way, you know that it's encrypted. On the other hand, there are other use cases that say, well, we don't necessarily control the application, but we still need data-at-rest encryption. So you can mount Swift on top of encrypted volumes. I've seen little cards from vendors that encrypt the SAS channel or SATA bus, or you could put it on top of LUKS volumes or something like that. Another thing we are actually working on (this is kind of the third option) is adding the ability to do encryption in the Swift proxy, so that it encrypts data as it comes in and decrypts it as it goes out. That means you get the advantages of fast deletes, and you can have untrusted or unknown clients connecting and know the data is still going to be encrypted.

I think we have time for one more. Unfortunately, I have another talk immediately after this, so I have to...

It's only one more, a third question actually. There are no limits in the Swift code on how many objects can be put in a container, but are there any practical ones?

That is a good question. You are correct that there are no limits on the cardinality of objects inside a particular container, or even in the cluster overall. But are there any practical limits?
It is highly dependent upon your particular infrastructure and use case. If you have a high cardinality of objects inside a particular container, and I'm thinking on the order of many millions of objects, at that point you may not be able to sustain as many writes per second to that particular container. But it does not affect any other container at all, and it does not affect the read path at all. So it only matters if I have to sustain, say, a thousand PUTs per second to a particular container. At that point I need to appropriately size my container and size my hardware so that I have enough scale to deal with that sort of thing. And that's again the advantage of the modular design: you can improve those particular pieces, and your application can appropriately splay the data across containers. Remember, Swift is designed for concurrency across the entire data set, so use that. In fact, I'm going to be talking about that in my next talk, in about five minutes. Thank you very much. Have a good day.
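One way an application might splay its data across containers, as suggested in the last answer, is to pick the container from a hash of the object name. The following is only a sketch: the `container_for` helper, the `photos` base name, and the choice of 16 containers are assumptions for illustration, not a Swift API or a recommended number.

```python
import hashlib
from collections import Counter


def container_for(object_name: str, base: str = "photos", shards: int = 16) -> str:
    """Pick one of `shards` containers deterministically from the object's name."""
    digest = hashlib.md5(object_name.encode("utf-8")).hexdigest()
    return f"{base}-{int(digest, 16) % shards:02d}"


names = [f"cat-{i:05d}.jpg" for i in range(10000)]
spread = Counter(container_for(n) for n in names)
for container, count in sorted(spread.items()):
    print(container, count)
```

Because the mapping depends only on the object name, a reader can recompute the container without any lookup. The trade-off is that listing everything means listing every container, and changing the number of containers later changes where names map.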