Hello, everyone. Thank you for coming to the last session on the last day. Good for you. My name is John Dickinson. I am the OpenStack Swift PTL, and I am also the director of technology at a company called SwiftStack. I've wanted to give this talk for a lot of different summits, and now is the time to do it. This is great. The point of this talk is to answer a few questions and hopefully provide some answers that we can point people to later, and potentially resolve some outstanding questions. I want to talk about what Swift is and what Swift is not. I want to talk about where Swift is extensible, how people are using that, and then what the future holds.

To start with, I want to talk about what Swift is not. Swift is not a product. Swift is a storage engine, which means that if you need a product, there are lots of other pieces that go into a product that are not developed as part of Swift itself. You've got to deal with things like billing, integration, and user identities. Those are some of the big things. You've got to deal with the operational side, putting it together and making it a whole system. I'm going to tell you what Swift is in just a second. It's not a product, though. It is also not, absolutely not, a provisioning layer for other storage systems. You do not go ask Swift, "Give me an object storage cluster." That's not what you do. It is also not a file system, nor is it directly mountable. You do not have a Swift cluster and then mount it directly on your Nova instance or your bare metal server someplace.

So if that's the case, then what is Swift? What would you say you do here? Swift is an object storage engine, and it is an implementation of an object storage engine.
It is responsible for durably storing your data, maintaining its durability and availability, and supporting massive concurrency across it. It's built for scale: thousands, tens of thousands of hard drives, petabytes of data. It is optimized for durability, availability, and concurrency across the entire data set.

Just a little bit of an overview of how Swift works and the pieces of Swift, which is going to be very important when we talk about the different areas where Swift is extensible. We always need to start with the users. Users talk to a proxy server. The proxy server is responsible for implementing most of the API and coordinating all the communication with the storage nodes. Storage nodes are responsible for persisting the data to disk, and they provide a little bit of the API. That's the basic idea. The really great thing about this is that these two components, the proxy servers and the storage nodes, are stateless. They're independent. There's no single point of failure, which means that if you need more, you can add more. You can add more proxy servers. You can add more storage servers. Oftentimes you'll put the proxy servers behind a load balancer so you have just a single endpoint that you talk to, and that works pretty well.

Looking at the clients talking to proxy servers and the proxy servers talking to storage servers, and we'll get a little more detailed in a second, that should give you a very good idea of where the extensibility points might be. There are two major pieces, like a pair that go together in a nice harmonious melody. We've got extensibility in middleware, and I need to give a little definition for that. In the storage world I've learned.
I'm very new to the storage world in the grand scheme of things, and there's a common industry phrase, "middleware," that generally means the things your data is flowing through. Swift middleware is a little more specific to Swift's implementation, and I'll cover how that works in a little bit. So we've got middleware, and the other major piece where Swift is extensible is storage volumes. Those are the things I want to highlight and go into some depth on today. Together they give you really powerful extensibility, because they complement one another. You can do some things with one and some things with the other, and together you can use what you need. It's the whole idea behind the modular Swift design: add more where you need it, and if you need more functionality here, maybe that's implemented in one way and you don't have to worry about the other way, and so on.

So let's talk about those. Middleware first. What is middleware? That's a big pipeline going through Alaska, and the reason I chose that picture is because the way middleware is implemented inside of Swift is always associated with a pipeline. A pipeline, in the world of Python servers, which is how Swift is implemented, is a set of pieces of code that the request flows through. The request passes through each piece of middleware and finally gets down to the application. The application generates a response, and the response comes back through in the reverse order. At any point a piece of middleware can be inserted, and it can intercept and, if necessary, change the request on the way in and the response on the way out. That means you could say, "I want every 200 response that says good to actually be a 404."
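A minimal sketch of that request-in, response-out flow, written as plain WSGI in Python. This is an illustration and not Swift code: the `StatusRewriter` class, the `X-Seen-By` header, and the `call` helper are all made up for the example.

```python
def app(environ, start_response):
    """Stand-in for the application at the end of the pipeline."""
    seen_by = environ.get("HTTP_X_SEEN_BY", "nobody")
    body = ("good; seen by " + seen_by).encode()
    start_response("200 OK", [("Content-Length", str(len(body)))])
    return [body]


class StatusRewriter:
    """Hypothetical middleware: turns every 200 response into a 404."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        # Request side: the middleware sees (and may change) the request
        # before the wrapped application does.
        environ["HTTP_X_SEEN_BY"] = "status-rewriter"

        def rewriting_start_response(status, headers, exc_info=None):
            # Response side: the middleware sees the status on the way out.
            if status.startswith("200"):
                status = "404 Not Found"
            return start_response(status, headers, exc_info)

        return self.app(environ, rewriting_start_response)


def call(wsgi_app, path="/"):
    """Tiny harness: invoke a WSGI app and capture its status and body."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status

    environ = {"REQUEST_METHOD": "GET", "PATH_INFO": path}
    body = b"".join(wsgi_app(environ, start_response))
    return captured["status"], body
```

Calling `call(app)` exercises the bare application, while `call(StatusRewriter(app))` sends the same request through the middleware first, so both the altered request and the altered status are visible in the result.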
Well, you could do that by changing the data on the way out. If you want to ROT13 all the text, you could do something like that.

So let's look at what the implementation actually looks like. This is a snippet from the proxy server configuration file that highlights these two major points. There's a section in the config file that designates the pipeline. Here I've cut out a couple of things, but I've listed the catch_errors middleware, the gatekeeper, the proxy logging, and at the very end the pipeline actually ends with the proxy server. Each of these names, like catch_errors, refers to a particular section in this config file. In this case the catch_errors middleware refers to the catch_errors filter section, which in turn refers to a specific piece of Python code that is part of Swift and is imported as catch_errors.

We have the same thing on the storage nodes. Here's an example from the object server's default sample configuration. This one is much shorter. We've only got two pieces of middleware, healthcheck and recon, and here are their config sections with their default values. Let's take the healthcheck example. Just like the previous example, we refer to the piece of Python code, but healthcheck also has another piece of configuration. It allows you to set a disable path, which is just a file on disk. The healthcheck functionality is kind of nice. If you hit the /healthcheck endpoint on a cluster and everything's okay, it will return OK, and that's it. It's good for load balancing and for seeing that the server is alive.
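As a rough sketch of how a pipeline section and a filter section fit together, the Python below parses a cut-down, paste-style config and wraps a stand-in object server with a healthcheck filter. The section layout and the `disable_path` option follow the description in the talk; the loader, the `FILTERS` and `APPS` registries, and the response bodies are assumptions for illustration, not Swift's actual loading code (which resolves the `use =` line to real Python entry points). The recon filter is left out to keep the example short.

```python
import configparser
import os

# Cut-down config in the style described above; {disable_path} is filled in
# by the caller so the example does not depend on a fixed location on disk.
CONF_TEMPLATE = """
[pipeline:main]
pipeline = healthcheck object-server

[filter:healthcheck]
use = egg:swift#healthcheck
disable_path = {disable_path}
"""


def object_server(environ, start_response):
    """Stand-in for the application at the end of the pipeline."""
    body = b"object data"
    start_response("200 OK", [("Content-Length", str(len(body)))])
    return [body]


def healthcheck_filter(app, conf):
    """Answer /healthcheck; report an error while the disable file exists."""
    disable_path = conf.get("disable_path", "")

    def middleware(environ, start_response):
        if environ.get("PATH_INFO") != "/healthcheck":
            return app(environ, start_response)  # not ours, pass it along
        if disable_path and os.path.exists(disable_path):
            status, body = "503 Service Unavailable", b"DISABLED BY FILE"
        else:
            status, body = "200 OK", b"OK"
        start_response(status, [("Content-Length", str(len(body)))])
        return [body]

    return middleware


# Hypothetical registries standing in for the lookup of the "use =" target.
FILTERS = {"healthcheck": healthcheck_filter}
APPS = {"object-server": object_server}


def load_pipeline(conf_text):
    """Build the WSGI callable named by [pipeline:main]."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    *filter_names, app_name = parser["pipeline:main"]["pipeline"].split()
    app = APPS[app_name]
    # Wrap right to left so the leftmost filter ends up outermost.
    for name in reversed(filter_names):
        app = FILTERS[name](app, dict(parser["filter:" + name]))
    return app


def get(app, path):
    """Tiny harness: issue a GET and capture the status and body."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status

    body = b"".join(app({"REQUEST_METHOD": "GET", "PATH_INFO": path}, start_response))
    return captured["status"], body
```

The filter's own section is handed to it as a plain dictionary at load time, which is the "parsed and then passed into the middleware" step in miniature.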
But if you have this file on disk, then no matter what, it will always return an error, a 503. That's nice because you can drop this file on disk, say in the midst of an upgrade, so that if you reboot, the servers don't automatically start up again and say, "Hey, I'm ready," when they're actually not ready and you still have to do some other configuration after the reboot. So this is what the healthcheck is doing, and this is how it's configured. Then we have another example here, recon, with its own independent pieces of configuration. When the middlewares are loaded, this configuration is parsed and then passed into each middleware, and it can do what's necessary with that configuration.

So where would you use this? What kind of middlewares can you have? I have some examples of things that have been done by third parties in the ecosystem, and then a lot of things that have been done inside of Swift itself.

If you've been to any of my talks over the past however many years we've been doing this OpenStack thing, you'll notice that I really like these guys. I love the use case, mostly because I can tell my mom that any time you go see a picture on Wikipedia, it comes from their own Swift cluster. And that's just really cool. Wikipedia, in using Swift, has added some pieces of middleware. On Wikipedia you have different sizes of images, and they want to be able to generate those. With hundreds of millions of images inside of Wikimedia Commons, you don't want to store every single possible arbitrary-size thumbnail. That's actually impossible. So instead they have a piece of middleware that will intercept, I think, query parameters on the URL. They can check on the fly and say, "Do I have one of those thumbnails already generated at that arbitrary 200 by 200 pixels, or whatever size?"
And if not, they can generate it on the fly, return it, and cache it as necessary. So they have added middleware to their Swift clusters for generating arbitrary thumbnails of their images. That's one example: generating content from content that's in Swift and returning that.

Another example is SoftLayer. It's owned by IBM, but they've been running public Swift clusters for a while. One of the things they do as a public cloud provider is provide Swift, but they also have metadata searching and indexing on top of that. The way they've glued those two pieces together is with middleware that can send the right messages to the indexer, intercept any of the appropriate API requests to search on that data, and send those to the indexer. That means that if you store data in Swift, they can index it on arbitrary metadata, and then you can use their API to search on that. And that is done inside of middleware. Again, that's a value add they can provide that is not in conflict with just using the upstream code. And it's not in conflict with, say, Wikipedia doing their thing, depending on what the use cases are.

Rackspace is obviously where Swift started, so they've been using Swift for quite a while, and they have some fairly large Swift clusters. A piece of middleware that Rackspace has been using is part of their specific value add, which is different from SoftLayer's. Rackspace Cloud Files is tightly integrated with CDN providers. The only way you can make content public inside of Rackspace Cloud Files is through the CDN partnership that they have. So they needed a piece to bridge the gap between those publicly available things and the authenticated requests inside of Cloud Files.
And so they wrote this middleware called SOS, Swift Origin Server, that is available, and I believe it's used at HP as well. That added that functionality, along with a little bit of their CDN provisioning API for Cloud Files.

Another large user, and I believe they were giving a talk earlier today, is NTT. Not NTT Data, just NTT. Cover up that "DATA," excuse me, sorry. NTT is a very, very large company, and I get confused about who works where. The point is that NTT has also contributed some other things inside of the ecosystem.

These next ones are things that some people are really interested in. There have been some contributions around how you put a different API in front of Swift and translate it in. There's a piece of middleware called swift3 that provides a subset of the S3 API so that you can use S3 clients to talk to Swift. That's currently hosted on StackForge. There's another one that I've seen in the past that I think came from IBM originally. I'm not sure if it's currently being maintained, but it supports the CDMI API, implemented via middleware. So the point is, forget just mutating the data: you can actually implement entirely new APIs on top of this.

There were two very interesting talks yesterday from IBM about using middleware to run Docker containers on the Swift cluster. I figured that hit all the buzzwords, and it's actually an interesting merging together of compute and storage. We've seen some other things. I'll talk about the other guys in a second. But the point is adding completely outside-the-box functionality, solving those analytics problems by keeping the data and the compute tied closely together. And again, being able to write middleware to do this.
And then the last example I have of people in the ecosystem doing something is the company I work for, SwiftStack. The most commonly used external middleware that we have is auth integration: being able to integrate with LDAP and Active Directory without going through an external service. So all of these different pieces, from auth to CDN integration to mutating the data to different APIs, are things that have already been developed in the ecosystem over the last few years. And most of those are open source, which is kind of cool.

So let's flip it just a little bit. I want to talk about the parts of Swift itself, developed as part of the OpenStack code, that are implemented as middleware. Referring to Swift as pluggable and saying that middleware is the way it's pluggable is true, but it's not the complete story. Middleware is also a fantastic way to structure code so that you can say, "This is an isolated piece of functionality that needs to mess with the request, and we know how to do that." So this is a list of pieces of middleware inside of Swift that implement things that are seen from the client and have to do with the API. I want to talk about a couple of different categories here.

One is cross-domain. This is a very simple piece of middleware. It implements the endpoint crossdomain.xml, which those of you who have done media online know is key for Flash content. It allows you to do your browser security model stuff. With Swift out of the box, if you have turned on this cross-domain middleware, you can access crossdomain.xml, and it will return the appropriate thing as you've configured it. So that's just a new endpoint added to the Swift API that is implemented inside of Swift as middleware.
And the other two that I want to talk about together are for large objects. There are two pieces of middleware, one called DLO and one called SLO, for dynamic large objects and static large objects. Here's the way large objects work in Swift. You take a piece of data, like a local file, and you throw it into Swift, and it's an object in Swift. The object itself has a size limitation in Swift, but you can tie together different objects into one logical large object, which gives you essentially unbounded size. So if you need to upload your two terabyte object, you can do that by splitting it into smaller pieces and tying them together with a large object manifest.

The way large objects are implemented is really interesting because it's all in middleware. Let's say a request comes in to read the large object. The request passes through, and when the response starts coming back the other way, the middleware sees that the object was actually a manifest file. "Okay, I know how to deal with that." It parses the manifest file, and then it generates new subrequests back into the system and starts fetching the data for the referred-to files. The response for the first file comes back, and that's what's sent to the client. As soon as that's done, it makes the second request to the cluster and starts sending that, and thus it concatenates the contents of the files the manifest refers to. So the large object support inside of Swift is done entirely in middleware, which has a lot of advantages.
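The manifest flow just described can be sketched in a few lines of plain WSGI. This is a toy model under stated assumptions, not how DLO or SLO are actually written: the in-memory `STORE`, the `X-Example-Manifest` header, and the JSON list-of-paths manifest format are all invented for the example.

```python
import json

# Toy object store: path -> (headers, body). The manifest is an ordinary
# object whose (made-up) header marks its body as a list of segment paths.
STORE = {
    "/v1/acct/cont/seg1": ({}, b"hello "),
    "/v1/acct/cont/seg2": ({}, b"world"),
    "/v1/acct/cont/big": (
        {"X-Example-Manifest": "true"},
        json.dumps(["/v1/acct/cont/seg1", "/v1/acct/cont/seg2"]).encode(),
    ),
}


def storage_app(environ, start_response):
    """Stand-in for the rest of the system below the middleware."""
    entry = STORE.get(environ["PATH_INFO"])
    if entry is None:
        start_response("404 Not Found", [])
        return [b""]
    headers, body = entry
    start_response("200 OK", list(headers.items()))
    return [body]


def fetch(app, path):
    """Make a GET (sub)request against a WSGI app and collect the result."""
    captured = {}

    def capture(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(app({"REQUEST_METHOD": "GET", "PATH_INFO": path}, capture))
    return captured["status"], captured["headers"], body


class ManifestMiddleware:
    """Hypothetical middleware that expands manifests on the way out."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        status, headers, body = fetch(self.app, environ["PATH_INFO"])
        if headers.get("X-Example-Manifest") != "true":
            # Ordinary object: pass the response through untouched.
            start_response(status, list(headers.items()))
            return [body]
        # Manifest: parse it, then stream the segments one after another.
        start_response("200 OK", [])
        return self._segments(json.loads(body))

    def _segments(self, paths):
        # Each segment is fetched lazily with its own subrequest, so the
        # second request is only made once the first segment has been sent.
        # Error handling for missing segments is omitted in this sketch.
        for path in paths:
            _status, _headers, data = fetch(self.app, path)
            yield data
```

Reading `/v1/acct/cont/big` through `ManifestMiddleware(storage_app)` yields the two segments back to back, while a request for a single segment passes through unchanged.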
Not only does it move code out of the proxy server itself, so the proxy server can be simpler and just worry about coordinating the responses from storage nodes, but it also means that as you add new functionality, you can start composing it. You're generating these other subrequests, so if you need to make sure things are authenticated properly, or, here's an idea, if you wanted to nest different kinds of manifest objects, that is automatically taken care of. Or say you wanted to compose functionality like large objects and versioned writes. Think of the code complexity inside the proxy server of trying to figure out what order those could come in: is the manifest referring to versioned files, what happens if the manifest itself is versioned, and how does that work? If you've got those composed as middlewares, it becomes much simpler to reason about and figure out how those things work.

Now there's another set of middleware inside of Swift itself that isn't directly seen as far as the API goes but is seen from the operator side. These are pieces of operator functionality that are incredibly useful, but you're not really going to see them when you're writing a backup utility or something like that. And they are, again, vitally important. catch_errors, I think, is one of the most important pieces of middleware because it wraps the entirety of the pipeline to ensure two things. One, the client will never actually see a traceback. If there is some uncaught exception, it will catch it right there. It catches the errors. It will log it appropriately so that the operators can deal with it, but the client gets a nice 500 message, and you don't leak details about your code all the way out to the public client.
And number two, catch_errors also ensures that every request, all the way through, and then the response, has a transaction ID on it. Being the leftmost middleware in the pipeline means it's the first thing a request touches on the way in and the last thing the response touches on the way out, so it can wrap everything up and put a nice neat bow on it. The other one is gatekeeper, which basically ensures that certain pieces of middleware are in a certain order. For example, you want auth to be in a certain position relative to the large object manifests to make sure that works properly. Gatekeeper also makes sure that, say, catch_errors is in the pipeline. Things like that.

If you want an example of a piece of middleware that exists inside of Swift, is very simple, and is a really good place to start, healthcheck is perfect. It's really simple. It's really easy to digest. It's much simpler than, say, the logging one. That one's tricky. It's a really great place to get started and think, "Oh, I can do this." In fact, how easy is it to do? Christian, one of our core devs, gave a talk at the last summit in Atlanta, and on stage he wrote a piece of middleware. You can watch the video of him doing that. The middleware he wrote there was for preview images, so it's kind of like the image resizing in the Wikipedia model, but obviously a little more limited because he was doing it from the stage. He was going through and saying, "This is what you do first. This is how we configure it. This is how we install it. This is how we deploy it." Then he put it all together, ran it, and it worked. To get the basic functionality, it's very, very simple to do, and you can do it very quickly. So it's a great place for extensibility.

So remember the basic design.
We've got proxy servers, we've got storage nodes. Now, here's the cool part. You've got middleware in both the proxy servers and the storage nodes. That means that if you have something that needs to modify requests and responses, dealing with the API or some sort of wrapping that has to do with the whole system, put it in the proxy server. If you need to do something that is good to distribute throughout the entire cluster and that deals with the persisting of data, then you can write middleware to run on the storage nodes themselves. One example would be that you need strong encryption on your data. One possible way to do this, not the only way and not always the best, would be to say, "Every time I get to a replica, so three times for every piece of data, I'll run some encryption middleware on the object storage node." Yes, it's doing it three times, but it's also going to be distributed throughout your entire cluster, and you probably have fewer proxy servers than storage nodes. There are obviously lots of caveats and things like that. So only do that if you're doing ROT13 encryption.

Okay, so that's middleware. Let's talk about the volume abstraction. We've always had this inside of Swift. Since the very beginning, there's been a Python class called diskfile. Diskfile is how the object server, the piece of Swift that's responsible for persisting object data to disk, talks to a particular storage volume. Normally a storage volume equates one-to-one with a hard drive. But when you start talking about diskfiles, that's when it gets a little bit fuzzy. I showed you that design flow, so let's zoom in on just the storage node. The storage node is talking to the hard drives, but it's a little bit different than this picture, because actually you've got that diskfile right there.
So the object server has a diskfile instance, which is what it uses to talk to the individual hard drives. The simplest way to think about the volume abstraction inside of Swift, and to say that it's an important point of extensibility, is that you can buy a hard drive from any company you want, and that shows that Swift is pluggable as far as the hard drives go. That is completely true, and it is in fact a way that a lot of people are able to customize things, because Swift expects to talk to a storage volume, and oftentimes that's a local file system. The diskfile, in reality, is just speaking POSIX to what it assumes is a local XFS volume.

Now, what happens if you change that? Instead of just saying, "I'm going to buy HGST instead of Seagate, or Seagate instead of HGST," let's add some more functionality there. You can write your own implementation of this abstraction that is responsible for talking to the storage volume, and that's where some very interesting things come into play. These are people I know of in the community who have already been using this level of abstraction within Swift. I know there are going to be more in the future, but even just starting today: ZeroVM is interesting because, like the IBM Docker middleware I mentioned, ZeroVM has some middleware that they use for their implementation of moving compute to storage. But they're also using diskfiles to provide the sandboxing that lets a compute process execute right next to the storage. So that's one way they're doing that.

Seagate is interesting because theirs is a completely non-POSIX sort of thing. They have a new storage platform called Kinetic, and it doesn't speak POSIX. It's a key-value API on the other side of a network rather than a locally attached device with a file system on it.
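To make the idea of swapping what sits behind the object server concrete, here is a small Python sketch with two interchangeable volume backends: one that speaks POSIX to a local directory and one that talks to a key-value store. The class names and the two-method `write`/`read` interface are invented for this illustration and are much simpler than Swift's real diskfile interface; the key-value backend is a plain dictionary standing in for a networked device.

```python
import os


class PosixVolume:
    """Backend that persists objects as files under a local directory."""

    def __init__(self, root):
        self.root = root

    def _path(self, name):
        # Simplification: flatten the object name into a single file name.
        return os.path.join(self.root, name.strip("/").replace("/", "_"))

    def write(self, name, data):
        # Write to a temporary file, sync it, then rename it into place so
        # readers never observe a partially written object.
        tmp = self._path(name) + ".tmp"
        with open(tmp, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self._path(name))

    def read(self, name):
        with open(self._path(name), "rb") as f:
            return f.read()


class KeyValueVolume:
    """Backend for a device that exposes keys and values, not files."""

    def __init__(self):
        self._device = {}  # stand-in for a key-value device on the network

    def write(self, name, data):
        self._device[name] = bytes(data)

    def read(self, name):
        return self._device[name]


class ObjectServer:
    """Toy object server: it only knows the volume's write/read methods."""

    def __init__(self, volume):
        self.volume = volume

    def put(self, name, data):
        self.volume.write(name, data)

    def get(self, name):
        return self.volume.read(name)
```

The object server code is identical in both cases; only the volume object handed to it changes, which is the property that lets a different storage backend sit underneath the same API.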
So there's a diskfile implementation that SwiftStack has written that speaks the Kinetic protocol, so you can run Swift on top of Kinetic drives. HP has been using something that is integrated with their StoreOnce line, I believe. It's something they've been playing with, looking at a more traditional, integrated hardware and software product, and they can now use Swift to talk to those more traditional storage systems. And then there are other storage systems out there whose developers have looked at this and said, "We want to have the Swift API, because OpenStack is awesome, and we want to be able to talk to our own particular storage system." Now remember, Swift does not provision storage systems. You're not going to make an API call to Swift to get a Scality or Ceph or Gluster cluster. What you can do with the implementations these guys have been working on, in various states of readiness, is say, "I'm running Swift, actual Swift code, the code that is part of the OpenStack governance model," and I'll come back to that, "and I can use that to talk to this other system."

Those are the kinds of things that are very interesting. It's not entirely surprising that this has happened in the ecosystem, but I have to admit it's not something I completely expected when we first started working on this, because in my mind Swift was always designed for commoditizing the hardware and abstracting it away from the data it stores, so that hardware can be swapped out and you can have this nice, durable, scalable system. What's really exciting is seeing how much is happening in the community that takes it above and beyond that, to places we didn't even know it would be going.
One of the things along those lines that I'm not aware of any work on today, but that I do want to look at with an eye to the future, is figuring out how to have, for example, some flash-optimized diskfiles. The object server doesn't need to be concerned about what the data looks like or how it is actually laid down on disk, but flash has different characteristics than spinning media, and most people use spinning media for Swift because it's cheaper than flash. So, especially now that we have storage policies in Swift, which allow you to say this kind of data is going to be here and that kind of data is going to be there, on separate hardware, what if we had a storage policy that's flash based, and that gives us a certain SLA and cost model? Let's have a diskfile that is optimized for that. I think that would be kind of cool. I think this is something that we'll be talking about next year in the community, after erasure codes are done.

So the final thing I want to talk about, and it looks like we're going to have plenty of time for questions, is another topic that's been big in OpenStack recently. It's this thing the board has been working on called DefCore. Without going into a lot of backstory or history, the point is: within OpenStack, how do we designate extensibility? There are a lot of trademark concerns and other issues tied up in it. But basically, if you say you're using OpenStack, what are you saying you're using? And that also carries a very strong implication, if it is not explicit, of "here are the places where you can extend things." So I'm going to put these next words up on stage, and they're completely not official at all.
This is an ongoing process, but the latest iteration of what we've been working on through that, specifically with Swift, is this: if it's going to be called Swift, if it's going to meet the Swift definition, and remember, this is not official and not finalized at all, you're going to use the proxies and storage nodes, you're going to be able to add your own middleware, and you're going to be able to implement your own diskfile and use that if you want to. That's the general rule. Again, I know I'm putting this on stage and it's being recorded for posterity and all that, but this is an ongoing process. OpenStack is working on it together in the community.

So these are the areas of extensibility that we have. They mean that we can continue to be a really awesome implementation of object storage that provides scale, durability, high availability, and massive concurrency. Those are awesome things, great for applications. But we also know that we can integrate with forward-looking things, and we can provide a really great bridge for people who need a way to migrate off of more traditional things. Unfortunately, we can't just change the world and start all over from scratch. We have to look to the past and build a bridge from there so we can get to the future. These areas of extensibility inside of Swift, the middleware and the diskfile volume abstraction, are where and how we can do that. So you've got this nice little composable set of things that you can put together, and you've got Swift as a modular, extensible object storage system. The question to you, then, is: what are you going to build?

What questions do you have? I think we've got a little bit of time, about five minutes. It is 5:52 on the last day of the conference.
Design Summit stuff is tomorrow. One thing I do want to specifically point out: if you'd like more information about running Swift, using it, playing around with it, understanding how it works, and how you actually deploy it, there's a workshop tomorrow in the Hyatt hotel, put on by SwiftStack, that is a really great overview of a lot of the technical details. You have a question right there. The lights are right in my eyes.

Hey John, I'm Lee Calco from Cisco. New to Swift, but I was intrigued by the on-stage brainstorming you were doing around new use cases for, I guess, a solid-state-specific diskfile. Did you have specific use cases in mind for how it would behave, how the diskfile would speak to solid state differently?

So the basic idea is that, even if you look at where flash and rotating media are today, the read and write characteristics are very different. Just as far as physical durability goes, you can only overwrite flash a certain number of times, and the way everything is grouped is different. It's not sectors and spindles and things like that. It's more direct access to blocks in the flash. And I'll be completely honest, I don't have deep expertise in that world. Then there are other things people are talking about, like shingled magnetic recording drives, which are completely different from the way flash is done. And looking at the trends, yes, spinning drives are getting cheaper and bigger, but flash is getting much cheaper quickly. Looking to the future, I want to be ready. I've already had employers come to me and say, "I want to not have any moving parts in my data center." That includes spindles, everything they're doing.
Think of the density and the power savings you can get on certain things today, and especially looking a few years down the road. Basically, I want to be ready for that. I don't know exactly what it is. I want to work with flash vendors and people who are working in the OpenStack community to say, "This is awesome." So the use cases in my mind, I'm sure, will be very similar to what we're doing now, except you're going to get different characteristics. It's going to look a little different. It's going to behave a little differently. I don't know exactly what those differences are going to be, but it is the future, so let's get out of the way of the train and ride it instead of getting smashed by it.

Yes?

Is there any community place to discover out-of-tree middleware?

Not really, unfortunately. Well, okay, yes. There is a place inside of the developer documentation where we keep track of associated projects, and we have references to lots of other projects there. You can get to that at swift.openstack.org. If that needs to be updated, the catch is that it's updated with a code patch. You're free to do that. It's a very simple format. Or just tell us and somebody will do it. It's very fast to do, but it does require a patch. There are also some things, like the swift3 middleware, that are managed inside of StackForge. That's not a great fit for everybody, but there might be some stuff there. Other than that, my experience thus far is that most people who are doing this sort of thing are rather proud of it and like to talk about it a lot. So keep a Twitter search running for OpenStack Swift, and you'll find it.

Any other questions? So, workshop tomorrow in the Hyatt. And the Design Summit is ongoing. Our Swift design sessions are tomorrow, starting around noon, for the contributors, and then we've got a morning on Friday for that. Other than that, thank you very much for coming.
And I'll be around briefly for any questions. Thank you.