Good afternoon, everyone. I'm Tim Burke. I'm with SwiftStack, and I'm a core developer for Swift and swift3. Today I'm talking to you about using Swift as an S3 endpoint. Now, you might ask: why do that? Swift has an API, and it's a great API. I love Swift's API. You can reach in and grab individual segments out of a large object without needing to mess with range requests. You can aggregate logs dynamically by day or month using dynamic large objects. Swift's API is great; I love it. But you may have already invested in S3 and suddenly realized that, hey, wait, I need to start moving some of my workflows and applications back into my data center. Maybe that's because of cost; maybe it's to reduce latency. Or maybe you're working on a hybrid-cloud strategy and you want a single API to use whether you're on your internal network or bursting into the public cloud. And of course, it's always useful to have a ready-to-spin-up test environment when you're developing that application or workflow, such as my laptop, which is currently running Swift with swift3. We'll have a nice demo later on.

Separately, I have another reason to run swift3 and hack on it, beyond just trying to make sure my customers are successful: it makes Swift itself better. There have been several occasions where I noticed opportunities to increase concurrency, whether with the bulk deleter or with SLO uploads, or noticed that, hey, wait, swiftclient needs to buffer its reads from disk to match the throughput we see against the same cluster with an S3 client. And it goes on and on; there's just so much to be gained from developing a second object API on top of Swift.

So we're going to look at the comparison between Swift and S3. We'll look at an example server config to enable it, then client config, in particular for AWS's own CLI and boto3. I have a nice little demo, and I'll cover a few caveats around using swift3, because you know there have to be a few catches. And finally, I'll try to close with some future directions for what I want to do next.

In S3, at the top level, you've got your buckets. These map pretty much directly to containers in Swift. There are three major distinctions, though, between the two. First, S3 has a global bucket namespace: if I create a bucket named music, that's it, nobody else can have a bucket named music. With Swift, that's not the highest level; above it is an account, and each tenant gets their own account. Within that, you can create your own containers willy-nilly. You can have music, I can have music, and those two containers are completely separate. Second, S3 only lets you have a certain number of buckets; I believe 100 is the default. Swift has no such practical limitation. I suppose if you start getting into millions or tens of millions of containers, you're probably going to hit the same sorts of problems that Matt's solving with container sharding, but to my knowledge no customer has ever asked about that. And third... what was that? Yeah, just two. So, oh, no, the third one. Thank you, Matt; that was exactly the person who needed to call that out. With Amazon, buckets are effectively infinitely deep: you can just keep throwing objects in there, knock yourself out, it's fine. But with Swift, there is going to be a limit.
After about 10 million or 100 million objects, you're going to see significant performance problems, and at a billion I hear reports of container DBs just locking up. It's problematic.

So within a bucket, you've got objects, and fortunately the names even match between the two: a Swift object and an S3 object are basically the same idea. You can attach custom metadata, you can set expirations; it's great. They have the same five-gig max upload size, so to store larger objects you use a multi-part upload in S3, and the analog of that in Swift is static large objects. They offer nice consistency guarantees, making sure that what you read back out is what you originally put. And both S3 and Swift have ways to generate a shareable URL that expires after a set amount of time. The exact signature algorithms differ between the two, but they're very similar concepts.

So let's see how we configure this. Like most client-facing features of Swift, you go change your proxy-server config: just drop the swift3 middleware into your pipeline. All the configuration options are just like you've come to expect from Swift. There are three I'd like to call out; most of the rest have to do with tuning limits and the like. First is location. This is reported by the bucket-location API, but the more important reason to note it is that clients need it if they're going to use v4 signatures: it gets baked into the signature, and if your client's region differs from the location we set globally in Swift, you're going to get 403s. That's usually the first place I look if I hear reports of Access Denied. Second is storage_domain. You can use this to have subdomain-style access, so instead of having the bucket in the path of your URL, you put it at the front of the hostname. Third, we've got force_swift_request_proxy_log turned on for this demo, just so we see not only the client request coming in but also how it gets translated into a Swift request on the back end.

On the client side, it's kind of hard to give specific recommendations for the wide variety of clients that S3 supports, or rather that support S3, but four things are commonly needed: an access key ID, which you might recognize here as a classic TempAuth user; a secret access key, which is likewise the TempAuth key in this case; a region, again matching our server config; and an endpoint URL, so that we don't just go talking to Amazon.
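To make that concrete, here's a minimal sketch of the proxy change. The pipeline placement and option names follow the swift3 sample config as I recall it; the surrounding middleware and the hostname are from my test setup, so treat them as illustrative:

```ini
# /etc/swift/proxy-server.conf (abridged)
[pipeline:main]
pipeline = catch_errors proxy-logging cache swift3 tempauth proxy-logging proxy-server

[filter:swift3]
use = egg:swift3#swift3
# Reported by the bucket-location API and baked into v4 signatures;
# a client configured with a different region will see 403s.
location = us-east-1
# Enables subdomain-style access: bucket in the hostname, not the path.
storage_domain = saio.example.com
# Log the translated back-end Swift request alongside the S3 request.
force_swift_request_proxy_log = true
```

And on the client side, those four pieces for the AWS CLI might look something like this. The credentials here are the standard Swift All-In-One TempAuth user, and the endpoint stanza assumes the awscli-plugin-endpoint plugin I'll mention in a moment:

```ini
# ~/.aws/credentials -- a TempAuth user and key standing in for AWS creds
[default]
aws_access_key_id = test:tester
aws_secret_access_key = testing

# ~/.aws/config -- region must match the server-side location
[plugins]
endpoint = awscli_plugin_endpoint

[profile default]
region = us-east-1
s3 =
    endpoint_url = http://saio.example.com
```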
To do this demonstration, I'm using a pretty standard Swift All-In-One. I've got one extra patch layered on top to get v4 signature support for TempAuth; I need to bug some people in this very room about landing that. And I've got Apache reverse-proxying to eventlet so I can point the client at port 80, since some clients have issues talking to non-standard ports. Additionally, Amazon makes it difficult to talk to a different endpoint. They do provide a command-line option, but because I don't want to type that in all the time, I found an AWS CLI plugin to support alternate endpoints, as you saw in the config above. And all of the scripts I'm going to be running are available up on GitHub; I'll be sure to add a tag for this event. So, fingers crossed.

First, we'll just start tailing the proxy logs so we see requests as they come in. I've got an echo and a sleep in there so we get some blank lines every so often. And let's just start with Amazon's own CLI. Works great; you see, we set a default profile so we don't use my ordinary Amazon creds. List all the buckets: there aren't any. Good, I cleaned up after myself. Create a bucket; list and see that, yep, we've got a bucket now. Upload a dummy file, and see that, yep, it appears in the container listing. Hello, world. And then clean up after ourselves. Now, this was with v2 signatures; the same thing works with v4. You can even have the command-line client go through pre-signed URLs, which is a little curious, but hey, why not? Now, over on the proxy, you see a whole bunch of requests have flown by.

Let's slow that down a little and see it go in an interactive Python session. We've got our standard imports, and we instantiate a session so we pull in all that profile information. Unfortunately, with boto3 I didn't find a good way to have the endpoint read from config, so instead you have to specify it manually. But that's a fairly minor change, much better than trying to rewrite your entire application to talk Swift instead of S3. We can list the buckets, and you see that the S3 request here got translated into a Swift request there: a GET on / just becomes an account GET. Go ahead and create a bucket; again we see the S3 versus back-end requests, and confirm that, yep, it shows up in our listings. Now, boto has a default of eight megs as the point at which it switches between an ordinary upload and a multi-part upload. Just to show that multi-part uploads work great, let's upload 16 megs. Sure enough, we see a flurry of requests. It's a little harder to pick apart the S3 versus Swift requests, but there's the finalization of the multi-part upload, some bookkeeping, the SLO PUT, the individual segments getting uploaded, all of that. We see that it shows up in the listings for the bucket, and if we read it back, we get the full 16 megs. Works just as you would expect. I see Clay checking the math on this one, yeah. And go ahead and clean up after ourselves. Confirm. Good.
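For anyone following along at home, the interactive session boils down to something like this. It's a minimal sketch assuming the profile from the config above; the hostname, file name, and bucket name are placeholders from my setup:

```python
import boto3
from boto3.s3.transfer import TransferConfig

# Pick up credentials and region from the 'default' profile;
# the endpoint still has to be passed explicitly.
session = boto3.session.Session(profile_name='default')
s3 = session.client('s3', endpoint_url='http://saio.example.com')

print(s3.list_buckets()['Buckets'])   # GET / -> Swift account GET
s3.create_bucket(Bucket='demo')       # PUT /demo -> container PUT

# Anything over the multipart threshold (8 MiB by default) becomes a
# multi-part upload, which swift3 stores as a static large object.
cfg = TransferConfig(multipart_threshold=8 * 1024 * 1024)
s3.upload_file('16-megs.bin', 'demo', 'big-object', Config=cfg)

# Pre-signed URLs work too; the signature scheme differs from Swift's
# tempurl, but the concept is the same.
url = s3.generate_presigned_url(
    'get_object',
    Params={'Bucket': 'demo', 'Key': 'big-object'},
    ExpiresIn=600)
```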
So there are a few caveats to using swift3 to emulate S3. It's not going to entirely match all of the features in S3, to be sure; there are a lot of them, and there are only two of us core reviewers. In particular, though, some of the things that we do support, such as object-level ACLs and honest S3-style bucket ACLs, are going to incur a performance cost: you need to HEAD each object before you do the GET, to verify the client was actually authorized to read it. Somewhat at odds with that, if you want to support bimodal access, where both Swift and S3 clients can look at the same containers, you're going to have to play a bit of a balancing game. The S3 client isn't going to get entirely the experience it expects with advanced features such as object ACLs, because as much as possible, swift3 will just get out of the way when a Swift request comes in. So even if you set that, oh, this particular object should be private, when a Swift request comes in for that object, it will use the ordinary Swift mechanisms, which just go by container ACLs. Next, if you're using Keystone as your auth service, only Keystone and the end client know the signing keys; the Swift proxies never see them. This is good for security, but it means that every client request has to go through Keystone to be validated. That's better than it used to be, when we made two Keystone requests for every client request, but there isn't a good way around it currently. And finally, v4 signatures. They're great, and that's definitely the direction clients are heading, but they can be a bit finicky. The headers you receive at Swift need to be exactly the headers the client sent, and I've encountered issues running Swift behind nginx, say, where Expect: 100-continue headers get dropped, or, as I'd said with the port numbers, a client that would double up the port number in its signature when running on non-standard ports. It just gets to be a little problematic sometimes. Be careful.

As far as future directions: there's a great Outreachy intern who has been working on adding support for bucket versioning. This has been kind of a long time coming; it's one of the features I wanted to get support for, and it needed some upstream changes to Swift so we'd have all the infrastructure to make it work. I want to improve multi-delete performance, because currently we delete everything serially; I've already done the work of adding some concurrency in Swift, I just need to pull it back into swift3 as well. The last two are a little more pie in the sky, but I'm hopeful. I've got a bit of a plan for a global bucket registry so that we can support anonymous access through the S3 API, which would be great, but it's complicated. And I actually have a proof of concept for an STS-like (Secure Token Service) endpoint, so you can issue temporary credentials to hand off to applications; they can use them until they expire, and the applications themselves never actually learn your password.
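For reference, the multi-delete in question is S3's batch DeleteObjects call. With the boto3 client from the earlier sketch, it goes something like this (bucket and key names are placeholders):

```python
# One S3 request can delete up to 1000 keys; on the back end, swift3
# currently issues the underlying Swift DELETEs one at a time.
resp = s3.delete_objects(
    Bucket='demo',
    Delete={'Objects': [{'Key': 'big-object'}, {'Key': 'hello.txt'}]})
print(resp.get('Deleted', []), resp.get('Errors', []))
```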
So I just wanted to thank Kota in particular for being a great core reviewer; thank you for keeping me honest. Thanks to Andre for driving the v4 signature support, and thanks to Karen for being a great Outreachy intern and working on versioning. Go, guys. Any questions?

You talked about some of the caveats, the compromises you have to make for bimodal access, when you're using both the Swift API and the S3 API. Do you have a feeling for how common that is? I've often heard that the reason you want S3 support is that this one application only knows how to talk to S3, so you use swift3 for that app, and everyone else uses their own accounts with the Swift API. Do a lot of people go back and forth, or do they migrate?

I have no idea. It's something that I would like to get more feedback on.

We have users that can only work in a browser; Swift provides a nice user interface, and they can look at the containers. And then we have traditional S3 support for the rest of the users, who are on the command line and want to go to both Amazon and to the Swift cluster. We actually run S3 as well, so we've got multiple tiers and we need to be able to shuffle things between all of them. So I guess the users basically partition into those that can only use the web browser and those that are comfortable on the command line. And those that are comfortable on the command line also have legacy S3 buckets: they're using Amazon S3, they're using Ceph S3, and now they need Swift as well because it's a new tier of storage. So everything's got to move between the three layers.

Timur and Joe's presentation from yesterday might be handy for that; they had a presentation on syncing between Amazon and Swift. But it sounds like the users themselves are kind of partitioned: okay, either we want Swift or we want S3. Still?

Yes. Now, for a lot of the reasons you mentioned, we like Swift; we would prefer things to be more visible. ACLs are another big hang-up with S3: you have to apply the ACL all the way down to the object, whereas in the Swift case your container is either open or private, and that just makes it a whole lot more user-friendly. We do have a use case, too, for per-object ACLs, so it would be nice to have full compatibility between the two APIs.

I've debated trying to plumb auth details all the way down to the object server, but it gets ugly. Yeah.

So besides the traditional requests, like putting and getting objects and all that comes with that, did you ever have requests to support other S3 features? For example, Swift itself has expiring objects; in S3 you can define a policy that also expires objects, and as far as I remember you can set that through the API as well. Were there ever requests to support that?

So I've not looked into supporting it yet. I'm willing to bet customers have asked about it; I'm sure Joe could give me some confirmation there. But yeah, the policy documents, as I recall, are kind of complicated. Trying to expire to another storage tier or something would be interesting, but hard.

So, Tim, what about customers that aren't running in a Keystone environment? How is authentication handled in those cases?

Sure. So like the rest of Swift, you can use any auth middleware you'd like. Many are styled after TempAuth, like swauth or our own SwiftStack auth, and for middlewares like that, where you have the secrets in memory in the proxy server, you can add v4 signature support very easily, because we've added some hooks in swift3 itself to help out there.

Well, thank you all for coming.