Good. Hello, everyone. My name is Joe Arnold. I'm the CEO of SwiftStack.

I'm John Dickinson, director of technology at SwiftStack and also project technical lead for Swift. I've been working with Swift for a couple of years now.

We formed SwiftStack to be a center of excellence for Swift. We provide tools for deployment, management, and monitoring, along with support and training for OpenStack Swift. Today we're going to talk about Swift, and this is an introductory talk, so we're going to cover a broad range of topics and only skim the surface of each. What we're going to talk about is Swift and OpenStack, an overview of OpenStack Swift, Swift use cases, and Swift architecture.

So first, Swift and OpenStack. Swift began development in 2009 at Rackspace. Rackspace was building something to be competitive with Amazon S3, and what they came up with was Cloud Files. In 2010 OpenStack was announced, and Swift was one of the two projects, Nova and Swift, that became OpenStack two years ago. The context in which it was developed was that a large-scale system already needed to be built at Rackspace to handle the amount of data and the number of users. Currently there are over 70 developers on the project, and there's a lot of momentum. You can get Swift today from lots of different service providers: Rackspace, of course, where it all started.
HP Cloud has an implementation of OpenStack Swift, as do Internap, KT in Korea, SoftLayer, who have done some cool things around metadata search in Swift, Haylix in Australia, and eNovance in France. So you can get Swift today. I'm going to pass it to John next, who's going to give an overview of Swift and of object storage.

Great, thanks. The reality is that everybody has data, it's always growing, and you've got to figure out a way to store it reliably. All of your data, though, doesn't cleanly fit into just one kind of storage system. You've got customer data that's naturally relational, so it works well in relational databases. You've got backups. You've got web content that you use on your websites, potentially customer information, static files that you need to display. These things don't fit cleanly into one system. You're not going to put your web content into your database, you're probably not going to put your customer information into flat files someplace, and you're certainly not going to want to store all your backups on expensive SSDs.

It turns out there are basically three general types of storage that you can use, so let's go over each of those in turn and figure out where Swift fits in. The first is block storage, the second is file storage, and the third is object storage.

Block storage is probably familiar to several people here. It takes a raw, unformatted device and exposes that directly to the application.
A common use case here is databases. Block devices are very important when you need very fast access and highly customized access patterns onto your data, and databases make great use of this. And of course databases are very good for storing relational customer information.

File storage is probably the most familiar to all of us because it's what we use every day. You see it every time you open up your laptop or work on a desktop. The concept is that you have a formatted drive and you deal with files and directories nested within one another. The limitation of file systems is that they don't generally work exceptionally well when you're dealing with a very large amount of content, or with a specific piece of content that has to be accessed very frequently. Some of these restrictions have to do with the POSIX compliance layer that file systems conform to.

However, when you relax these POSIX constraints, you can start to scale extremely large, and that's where object storage comes in. It's probably the least familiar to most people. Object storage takes a blob of data, just a chunk, oftentimes abstracted as a file, and is designed to store that reliably, cheaply, and at massive scale. Object storage is very good for unstructured data that can grow without bound: backups, snapshots, archiving, those sorts of very large blobs of data. You're always going to take your backups, so that aggregate content is going to grow quite a bit. Static web content, documents, and similar things are also very good use cases for object storage.

So where does Swift fit in? Swift is an object storage system.
It takes unstructured data and stores it very cheaply and reliably and at very high concurrency.

Swift is highly scalable, in a couple of different ways. One, you can continually add back-end storage to Swift and it will continue to grow. Two, you can continually expand the front-end piece of Swift so that you can meet the concurrency requirements of your particular use case.

Swift is also extremely durable. We use three full replicas of all of your data. Beyond that, Swift is smart enough to know a little bit about the layout of your actual infrastructure, and it optimizes data placement so that each replica is in a distinct availability zone or, if it can't choose distinct availability zones, on different servers and different hard drives. What this guarantees is that you will not have data loss, and you probably won't even have any data unavailability, certainly in the case of common hardware failures. Even with major hardware failures, losing entire racks or even data center rooms, you can still have a functioning, available Swift cluster.

Swift is highly concurrent. It is designed with zero shared knowledge in the system. There's no single point of failure, so you can continually add to Swift and it is horizontally scalable. It's optimized not for single-stream throughput but rather for doing 10,000 streams at once, or however many you need. This reflects where Swift came from: it was built for a large public service provider.

Swift is built for operators. It came out of Rackspace, and I was part of the team that helped build Swift there. We worked extremely closely with the guys who were going to be running the system in production, the guys who actually had to answer the pager call at 3 a.m.
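As an aside on the durability point above, the placement preference John describes (distinct zones first, then distinct servers, then distinct drives) can be pictured with a small sketch. This is not Swift's actual placement code: the `Device` class and `pick_replicas` function are hypothetical names invented for illustration, and a simple greedy choice is assumed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    """One hard drive, identified by where it lives (hypothetical model)."""
    zone: str
    server: str
    drive: str


def pick_replicas(devices, replica_count=3):
    """Greedily pick devices, preferring unused zones, then unused servers.

    Assumes every entry in `devices` is a distinct drive, so replicas that
    end up on the same server still land on different drives.
    """
    chosen = []
    remaining = list(devices)
    while remaining and len(chosen) < replica_count:
        used_zones = {d.zone for d in chosen}
        used_servers = {d.server for d in chosen}

        def rank(d):
            # False sorts before True: a new zone beats a new server,
            # which beats another drive on an already-used server.
            return (d.zone in used_zones, d.server in used_servers)

        best = min(remaining, key=rank)
        chosen.append(best)
        remaining.remove(best)
    return chosen
```

With only two zones available, the third replica falls back to a different server in an already-used zone, which mirrors the fallback order mentioned in the talk.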
What this means is that we had to design this so that, (a), it's going to be up and running and suitable for very large production use cases, but, (b), guess what: you've got to wake up tomorrow morning and sit next to the same guy who just had to wake up at 3 a.m. because of some bug that I introduced, and I don't want that to happen. So it was designed to be very robust in the face of failures. It implements some self-healing properties so that your operators don't have to wake up at 3 a.m. or come in to work on the weekend in the case of common failures.

Beyond this, Swift is built on top of standard technologies that have been around and been well tested for a very long time. For example, it uses standard file systems to implement the data storage piece on each object server. It also uses common technologies that are very familiar to operations people and sysadmins everywhere, things like using rsync as a data transport mechanism for efficiently moving your data between different object servers.

Thank you, John. So, on to Swift use cases. What I'm going to do here is go through a few use cases that we're seeing with customers in the field and how they're using Swift. We're going to talk about web and mobile applications, private file-sharing applications, people using it for data analytics, and infrastructure as a service.

First, web and mobile applications, and within that, a very popular website and a mobile gaming application. The top-five website is Wikipedia.
They're using Swift, and what they're doing is storing images, audio, and video in the Swift cluster. The facility they provide for their editors is that they can set how big an image will appear on the page, and it gets resized on the fly, lazily, and stored back into the Swift cluster. With Swift you can write middleware to perform actions like that, in this case image resizing.

They also serve a lot of traffic. How they're doing that is, well, remember, it's an object storage system and the API being used is HTTP. We've been in web infrastructure for a while, and we know there are a lot of solutions for caching content via HTTP. I think they use a combination of Squid and Varnish, and that sits directly in front of the Swift cluster. That means hot content can be cached and held in memory, and you can build that out. In most web cases you see something like five or ten percent of the content being the most popular, and that gets cached. The rest of the content, the infrequently accessed stuff, is still readily available. It's served from disk, just not as frequently. So you might see the Curiosity rover get cached, and a diagram of how to implement B+ trees maybe not so much; that's in the long tail of content. So that's what enables them to do that.

Next, mobile gaming. This is a big mobile gaming site, and they're serving game assets directly out to their users. They're ingesting uploads directly from those devices into Swift. They're serving the content of the games themselves, all the while having to handle massive concurrency. Remember, content is served over HTTP directly to the device, but that also works in reverse, so you can upload content directly from the device. In the case of mobile devices, those devices are producing content.
I mean, a lot of content: people are interacting with them, they're playing games on them, they have cameras on them now to record video. A lot of data is coming from these devices, and that's getting stored in the Swift cluster.

If there's a common asset that they need to distribute and it needs to be very snappy, one of the facilities Swift has is content delivery integration built into the API. You can take an object, ask for its CDN URL, and embed that in the application, and then the Swift cluster will be used as the origin for the subsequent requests.

And then massive concurrency. As John mentioned, because you can scale out each of the tiers of the Swift cluster, you can handle a tremendous number of users all within the same storage system. What we're typically seeing is that people have built out their own infrastructure and are using some sort of hash on the user, so they're sending users to multiple different storage systems. That adds complexity, because now the application has to understand which storage system to put data in for a particular user. With Swift you can remove that and have a single namespace to put data into.

Next, private file-sharing applications.
This one is a big consulting company, and they need to support enterprise authentication, which means Active Directory. Fortunately, Swift has a very pluggable authentication mechanism, so you can swap out different auth components to service the requests.

Data analytics. There's another company, a consumer devices company, and they have lots of devices that are all sending data up into the Swift cluster. But they need to process that data, so what they're doing is integrating Swift with Hadoop. The marriage works kind of like this. Swift is like a tractor, because it prioritizes durability, availability, and high concurrency, and they take advantage of those features. Hadoop is like the sports car, because they're tuning it for performance, turning off all the write caching so they can crunch through lots of data very quickly. So the storage in HDFS and in Swift is very different and tuned for different purposes. What they did was write a client that takes data that's in Swift and tells Hadoop about it. That client then feeds the results of the job back into the Swift cluster.
So it's really similar to how Amazon's MapReduce functionality is integrated with S3. That's what they're doing.

And then infrastructure as a service. I guess it's not really a category as such; I mentioned all those other service providers that are using Swift. What we're finding is that internal IT is trying to act more like a service provider. Instead of building out infrastructure that's tightly coupled with an application, they're trying to provide infrastructure as an internal service to multiple users and multiple applications. So now they have concerns like chargeback to those different services, on-demand storage, and multitenancy, because they're servicing multiple applications. The confluence of all that is that they now have a larger storage pool that data is being stored in, and Swift can help out with all of this. On to Swift architecture.

Thanks. So let's go over how all of this works, at a very high level. We've talked about the use cases, basically what problems Swift solves, so let's figure out how it solves them.

There are four pieces inside a fully functioning Swift cluster. The first piece is the proxy server, and if you're around any of these OpenStack summits and other OpenStack conversations, you'll hear people talking about these individual pieces. The proxy server is the front-end piece of Swift. It implements the Swift public API, and it is what customers' clients deal with directly. When the proxy server receives a request, it chooses the appropriate back-end servers from the storage nodes. For example, in a cluster with three replicas, it will choose the three appropriate servers in the back end. At that point it sends the request concurrently to those back-end servers. What this means is that the proxy server is not doing any sort of spooling to disk.
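A minimal sketch of the fan-out just described, combined with the two-out-of-three success rule that John explains shortly afterwards. It is an illustration only, not Swift's proxy code: `proxy_put`, `quorum_size`, and the plain callables standing in for storage nodes are hypothetical, and generalizing "two of three" to a simple majority is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def quorum_size(replica_count):
    """Simple majority of replicas (assumed rule): 2 of 3, 3 of 5, and so on."""
    return replica_count // 2 + 1


def proxy_put(nodes, name, data):
    """Send one write to every replica node at once and apply the quorum rule.

    `nodes` is a list of callables standing in for storage nodes; each
    takes (name, data) and returns True when the write reached disk.
    """
    def attempt(node):
        try:
            return bool(node(name, data))
        except Exception:
            # A node that is down or raises counts as a failed write.
            return False

    # One worker per node, so all writes are in flight concurrently.
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        results = list(pool.map(attempt, nodes))

    return sum(results) >= quorum_size(len(nodes))


# Stand-in storage nodes for the example (hypothetical).
store = {}


def healthy(name, data):
    store[name] = data
    return True


def broken(name, data):
    raise IOError("simulated disk failure")


print(proxy_put([healthy, healthy, broken], "photo.jpg", b"bytes"))  # True
print(proxy_put([healthy, broken, broken], "photo.jpg", b"bytes"))   # False
```

For simplicity this sketch waits for every node to answer before deciding; nothing is buffered on the proxy side, which matches the point about not spooling to disk.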
The proxy server is not doing any sort of caching for you either, and that means the proxy servers can be horizontally scaled out. If you need to handle more concurrent requests per second or more aggregate data throughput, you can add more and more proxy servers. Each proxy server sends its concurrent requests down to the object servers and then waits for the responses to come back.

The final thing the proxy server is responsible for is coordinating those responses from the storage nodes. Swift works on a quorum model. In a three-replica cluster, a PUT request, for example, requires that at least two of your replicas have been successfully written to disk before a success is sent back to the client. What this means is that Swift will never send a success back to the client until it knows, with a very high degree of certainty, that your data is securely and durably stored within the cluster.

The second piece of a Swift cluster is the object servers. The object server is basically the heart of what is actually storing your data. The object servers receive the request from the proxy server and then write that data out. As I mentioned earlier, it's quite easy to abstract these blobs of data as files. When we were designing Swift, we were looking at how to store these files on disk efficiently, reliably, and simply, and also store the associated metadata with them. It turns out there's a really good technology that's been around for decades that already does that: it's called a file system. So we implemented the object servers to write out to the file system, and all of the metadata on an object is stored in the extended attributes of the file system. What this means is that Swift can run on any file system you have, as long as it supports extended attributes. We have a few recommendations of
what we think is best, based on our testing and our use, but you can use whatever you find most appropriate for your infrastructure.

The third major piece of a functioning Swift cluster is the account and container servers. These servers are responsible for maintaining listings of the things underneath them. Swift's data is organized in a fairly flat namespace, but it is broken up into different accounts, and that's how it handles multi-tenancy. An account stores a listing of each of the containers in that account, and the account namespace that you give to a customer is further broken down by that customer into containers. A container, in turn, stores a listing of the objects in that particular container. The accounts and containers store a little bit of additional metadata as well, for example the total number of bytes stored and the total number of objects and containers in the container and account. Together these provide the listing support that you get with Swift. But for the most common request you generally see in a Swift cluster, the read request, they are completely out of the data path, and so they are not a limitation on your request scalability.

The final piece of a functioning Swift cluster is the consistency processes. These are the background processes that run on your storage nodes and ensure that the data Swift is storing is correct and fulfills the durability guarantees that Swift is giving you. There are a few processes. One is an auditing process. The auditors continually walk the file system and ensure that the data has not suffered any sort of file system corruption or any bit rot introduced by your disks. If an auditor detects anything like that, it will quarantine the bad data and allow replication to replace that particular replica with a good
copy. Replication continually runs on these back-end servers and continually works to push good copies out to the other servers. For example, an object server looks at what data it has, checks the other two places where that data should be, and pushes that data out there. The final piece of the consistency processes is the updater processes. They ensure that objects actually appear in the container listing, and they aggregate some of that metadata, like the total bytes used, up from the container level into the account.

So when we put it all together, Swift has a modular design. It's a fairly simple design, it's based on reliable technologies, and each of these tiers in Swift can be independently scaled out to meet your particular use case.

The most common question I get about Swift is probably, "Well, how should I deploy this? What is my hardware?" My answer is always the same: I don't know, it depends a lot on your data. The advantage of Swift is that it can be flexibly deployed to meet your use case. Joe mentioned quite a broad variety of things. If you're looking at archiving versus mobile gaming, those have vastly different usage profiles, but Swift is tunable for both.

We're going to be sharing more information about this on Thursday. We have some workshops and would encourage you to attend. Starting Thursday at 9 in the morning, we will be talking about how to install Swift on your laptop. If you show up with a laptop, we'll walk you through the process of getting that up and running. I will follow that with a workshop on running a Swift cluster, where we'll take the version of Swift that you installed in the first workshop and go through some failure scenarios and the typical operations procedures that you would see in production. And then finally we'll end the day after lunch, at 1:30, with building an application, actually a server-side application that can run in your Swift cluster and be interacted with from a standard client web browser. The really cool thing about this is that you can show up at 9 a.m. on Thursday morning, and by 3 p.m. you can have a functioning Swift cluster running on your laptop and have seen the complete end-to-end process of how everything works.

And with that, we are the last session before lunch, so how about this for questions: we'll just stop now, and if you have a question you can come up. That way all of us can get into the lunch line early. How does that sound? Thanks.