Check, check. Hi, everybody. We're ready for the next presentation here in the demo theater, so we'll start in the next half a minute or so. For those of you interested in object storage in the OpenStack environment, please come and join us. OK, maybe a quick show of hands: who knows about object storage? A few of you? OK. What we're going to talk about here is the need for big scale-out storage in the OpenStack environment. Most of you who have followed OpenStack know that there are a variety of storage services for OpenStack, including an object storage service called Swift. It's perfect for storing big-payload data, potentially hundreds of terabytes or petabytes of storage. However, there is a trend in the market now that we need to take a look at, and that's the emergence of Amazon's S3 protocol as a very powerful alternative for building this class of application and storing large-scale data. So that's what my talk is about here. There are a few things we want to chat about as a precursor to the actual capabilities of S3. We're building these new data centers centered around the cloud, and we want a lot of flexibility and agility, so what's the right way to build storage for these environments? Maybe just a couple of comments on that. There is also certainly a trend toward software. We all know that software is the paradigm that's eating the world, so to speak, and that's true in storage as well. So you'll hear the term SDS, or software-defined storage, as the way to build these systems within the cloud. Then we'll take a quick look at how object storage has evolved in the last few years, Swift and S3, and that will be the heart of what we talk about: some of the advanced features that are available in either protocol. OK. So certainly those of us who are building clouds are aware of the fact that this creates new challenges.
We now have OpenStack as a common framework. Of course, the whole intent of OpenStack is to host applications; without the applications and our end users, there's really no point to all of this. The real thing that happens when you start creating cloud environments, though, is that you get a lot of diversity in applications and workloads. You have new challenges in managing your network and in managing the environment overall. But one of the problems we're focused on is how you manage all the storage that results from a very dynamic and agile cloud environment. Certainly, we've seen in the work we've done with our customers that there's a tendency to start provisioning lots and lots of different VMs, lots of workloads. That creates a storage management problem. You start needing scalable storage for the VMs themselves; you have file storage; you have object storage. The net result is a huge and growing problem in managing all of this. Moreover, if you're starting to manage things like images, videos, big archives, and backups, you're pretty quickly going to have hundreds of terabytes or petabytes to manage. That's the reason for the advent of storage protocols like Swift within the OpenStack environment, but also of other protocols like Amazon's S3. The key thing we recognize here is that we need a very powerful and advanced set of functionality to deal with this type of data. So the model for storing all of this now is software. There are a number of different approaches to the problem in the market, but distributed storage certainly seems to be the model that everybody is embracing. So what does this mean? First, it means I need the ability to scale capacity as the requirements increase. I may be starting with hundreds of terabytes, but I can see petabytes on the horizon.
So I want to be able to scale out the number of services that applications can talk to. These may be object storage, file storage, or block storage, with a choice of different protocols: things like Swift for objects and Manila for files in the OpenStack world. I also need the ability to scale out the capacity, so I want multi-dimensional scaling across all of that. But the heart of what I want in software-defined storage is a set of intelligent services, the control-plane side of this. I certainly want data protection at scale. This is very different from what we did in the past with RAID-based systems, where we had maybe a few tens of disk drives to manage; now we're talking about managing hundreds and thousands of disk drives. So the system manages replication and erasure coding, handles the fact that components may be failing, and deals with that in the normal cloud model: it routes around the failure conditions and can self-heal. I want to be able to grow without disruption. If I add capacity or additional storage services, I want to do that without any service disruption. And of course I want to reap the benefits of frequent advancements. That's really one of the key advantages of decoupling this functionality in software: I don't want to be tied to the cycles of hardware-based appliances or arrays to get new functionality. The final thing is that we want to manage all of this via common frameworks. I do want to use OpenStack, but I also want to manage things via Nagios and many other open-source environments. What that really means is that the system needs to export its monitoring and management capabilities via APIs. That's more important from an orchestration and management perspective than having a closed UI. OK, so with that in mind, let's think a little bit about the different storage that you might see in OpenStack.
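To make the data-protection trade-off concrete, here is a small illustrative sketch (not any vendor's actual implementation) comparing the raw-capacity overhead of N-way replication against a k-data + m-parity erasure code; the 9+3 layout is just a hypothetical example:

```python
def replication_overhead(copies: int) -> float:
    """Raw capacity consumed per byte of user data with N-way replication."""
    return float(copies)

def erasure_overhead(data_frags: int, parity_frags: int) -> float:
    """Raw capacity per byte of user data with k data + m parity fragments."""
    return (data_frags + parity_frags) / data_frags

# 3-way replication tolerates 2 lost copies, but costs 3.0x raw capacity.
print(replication_overhead(3))            # 3.0
# A 9+3 erasure code tolerates 3 lost fragments at only ~1.33x raw capacity.
print(round(erasure_overhead(9, 3), 2))   # 1.33
```

This is why erasure coding becomes attractive at the hundreds-of-drives scale: comparable failure tolerance at roughly half the raw capacity of triple replication.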
You certainly see a range. We see the need to store things like operating system images, but I also have my application stacks, my common stacks for databases and for other application components that I want to assemble together. Beyond that, the capacity starts to grow, right? Document repositories and web servers, leading all the way up to the new media content that we see with video and image data. I'm not sure if people have tracked this, but video data is getting huge with 4K; it can be tens of megabytes per frame now to store 4K video. So imagine having to keep 24 frames per second online; you quickly get into the petabyte scale with that. OpenStack does have solutions for all of this. Certainly, we can put the high-performance, tier-one operating system data on flash, on local storage. I have Glance as a service within OpenStack to store my images and keep a repository. And now I have a choice: I can use Cinder for block volumes, or I can use Manila to create shared file services. Then ultimately, for the very big scale stuff, I do want to start taking a look at Swift and, as an alternative, the S3 API. There is interoperability now between S3 and Swift within OpenStack, so it becomes an interesting decision whether we need the advanced services that are available to us in one of the protocols or the other. OK, so with that in mind, how do we talk to object storage? What is an object storage API? It's essentially a simplified way to address storage through a key-value store. I have a key, which is an ID of an object, and I have the payload, which is the value, and I want to map these two together. Essentially, the application wants to be able to put payloads and to get them back through these handles. In a very simple way of speaking, that's the idea.
I also want to be able to associate metadata with the data, attributes like, if I have a video file, who created it, what's the author's name, et cetera. But all of this gets presented as a flat namespace; there isn't the concept of folders and subfolders as we would have in a traditional file system. If you look back, this all started maybe 15 years ago with what people called object storage 1.0. That really was the world of content-addressed storage, or CAS. This came about with products like EMC Centera, which were purpose-built for storing immutable data for a long period of time. We would store an object and associate a CAS signature with it, but we never wanted to modify it. That was a little bit of a vendor-specific or proprietary way of doing things. That evolved into object storage 2.0, with a bunch more vendor-specific object storage APIs from the big vendors and the little vendors. Notably, this was also the time that Amazon S3 launched, in the 2006-2007 time frame. It was a very basic API at the time. It had the ability to deal with resources like buckets and objects, with very simple put/get/delete-style verbs. But they were functional, and it started the process of everybody thinking about using object storage en masse. However, there was a problem: all of these were vendor-specific. So that's when the efforts to standardize some of these APIs started. There were, of course, two that came about that the industry started looking at. One of them was from SNIA, the Storage Networking Industry Association, which came up with its own called CDMI. It is a REST dialect, so it has the concept of doing key-value storage. But of course, this was also the time that people were starting to talk about OpenStack, and Rackspace was innovating with Cloud Files, which launched the Swift initiative. So this was around the 2010 time frame.
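The key-value model described above can be sketched in a few lines. This is a toy in-memory store for illustration only (the class and method names are hypothetical, not any real API); note how a slash in the key is just part of the name, not a folder:

```python
class TinyObjectStore:
    """Toy in-memory object store: a flat namespace mapping key -> (payload, metadata)."""

    def __init__(self):
        self._objects = {}  # key -> (bytes payload, dict metadata)

    def put(self, key, payload, metadata=None):
        # Flat namespace: "videos/clip.mp4" is just a key, not a folder path.
        self._objects[key] = (payload, dict(metadata or {}))

    def get(self, key):
        """Return the payload for a key, like an HTTP GET on an object."""
        return self._objects[key][0]

    def head(self, key):
        """Return only the metadata, like an HTTP HEAD on an object."""
        return self._objects[key][1]

store = TinyObjectStore()
store.put("videos/clip.mp4", b"\x00" * 16, {"author": "alice"})
print(store.head("videos/clip.mp4")["author"])  # alice
```

Real object APIs like Swift and S3 layer authentication, buckets/containers, and HTTP semantics on top, but the core put/get/head contract is essentially this.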
We do have to notice now, though, that there is another big force, and that is that Amazon has invested a lot in S3 in the last few years. And while it's not a standard API, because it's vendor-specific, it is becoming a de facto standard. What do we mean by de facto? We mean that there's a lot of usage, both from ISVs and from application developers, and a large part of that seems to be driven by the fact that it just has a very, very rich set of functionality, as you can see here. So it is a big and growing factor. We specifically see a lot of ISVs, independent software vendors, jumping on the S3 specification. As an example, big vendors like Veritas with NetBackup: that traditionally wasn't a product that would use object storage on the back end, and now it does. Commvault is another vendor, and there are many, many of these. By some accounts, there are now 4,000 ISV applications embracing the S3 API on the back end. We've also seen the developer community really swell. This is a big revenue center for Amazon itself now; we saw it ourselves with 20,000 attendees at the re:Invent conference. So I think we can now say that despite the fact that it's driven by a single vendor, it's certainly emerging as a de facto API specification. So what are the differences between Swift and S3? If you start looking at it, I think the differences are becoming smaller, right? Amazon is innovating, and they're doing it very rapidly. They can do it fast because it's really just themselves; they're the ones that make their own decisions. But there's a lot of advanced functionality in Swift that now maps to S3. There are a few things we want to look at in a little bit more detail, and those are on the security side. Amazon does have a very rich model now for identity and access management, the counterpart of what OpenStack does for Swift. But they do go a little bit further now with things like encryption.
So we'll talk about that. A lot of customers are also interested in managing their data over its life cycle, right? Data is hot for some period of time after it's created, but over time it becomes less and less hot. So how do we manage that life cycle? And then the third thing we do want to take a look at is versioning: how do they manage versioning in S3? So let's start with the security side. I mentioned that Amazon is making innovations; they now have this notion of identity and access management. There are three ways to manage access control. One is the traditional ACL model: within what S3 calls buckets, which are the analog of Swift's containers, there is the older style of access control lists, ACLs, on buckets and objects. The newer paradigm is to create policies. You can put a policy on a user or on a group and essentially guard what that principal is able to access: which resources he has access to, and what type of access, whether it's full access or read-only. Those are the kind of policies that span services, so they can go across storage and compute. There is also the notion of a policy directly on an S3 bucket, which gives me the ability to do very fine-grained access control. Furthermore, on the security side, there is an object encryption specification now in S3. This is something we see a pretty big demand for as people roll out cloud environments to deal with things like financial data or healthcare data, or perhaps in government where they have very specialized requirements. Encrypting the payloads, the objects within the containers, becomes very, very important. And usually the discussion becomes one of how you want to encrypt. Do you want to use specialized drives, like encrypted drives, or something in the hardware array? And the answer is very clearly no, right?
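As a rough illustration of the bucket-policy model, an S3 bucket policy is a JSON document attached to the bucket itself. The bucket name, account ID, and user here are all hypothetical; this sketch grants one IAM user read-only access:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ReadOnlyForOneUser",
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::123456789012:user/analytics" },
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::example-video-archive",
        "arn:aws:s3:::example-video-archive/*"
      ]
    }
  ]
}
```

The same statement grammar (Effect, Principal, Action, Resource) is what the user- and group-level IAM policies use; the bucket policy simply scopes it to a single bucket for fine-grained control.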
They want this to be software-driven, something in the software layer that's managing everything. Amazon has one way of doing this now, and a second way seems to be emerging. The first is an application API, OK? They call it SSE, or server-side encryption. The idea is that on the calls to store the data, the puts, and the calls to retrieve the data, there are specialized headers that instruct the back end to actually do the encrypt and decrypt at the right time. That's a good model because it gives me lots of control: I can encrypt certain objects and not encrypt others. On the other hand, it forces application-level changes, so there is an impact on the code. The one that a lot of the customers we're talking to are also interested in is one that's more transparent. It says that at the container level, or at the bucket level, I'd like to just turn encryption on as a policy. If I turn it on, all the objects that go in get encrypted; if I don't turn it on, those objects are not encrypted. This is not in either the S3 or the Swift spec today. It's something that could be implemented fairly straightforwardly, and in fact it could leverage the same model of key management as the SSE API does, where the customer provides the encryption keys. And then finally, this concept of lifecycle management is certainly a very big topic that people really seem to embrace. Application data clearly has different value over time. When I create it, I need to access it quickly; it needs to be low latency, it needs to be fast. So that type of data may justify putting it on something rather expensive: a high-end array or maybe a flash pool of storage. As it ages, I want to be able to think about putting it on something that's more capacity-optimized.
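The customer-provided-key flavor of SSE (SSE-C) works through exactly the kind of specialized request headers described above: the client sends the key with each PUT or GET, and the server encrypts or decrypts without storing the key. A minimal sketch of building those headers (the helper function name is ours; the header names are from the S3 SSE-C specification):

```python
import base64
import hashlib

def sse_c_headers(key):
    """Build the SSE-C request headers for an S3 PUT/GET with a customer-provided key.

    The server uses the key for encryption/decryption but does not persist it;
    the MD5 header lets it verify the key arrived intact.
    """
    assert len(key) == 32, "SSE-C requires a 256-bit (32-byte) key"
    return {
        "x-amz-server-side-encryption-customer-algorithm": "AES256",
        "x-amz-server-side-encryption-customer-key": base64.b64encode(key).decode(),
        "x-amz-server-side-encryption-customer-key-MD5": base64.b64encode(
            hashlib.md5(key).digest()
        ).decode(),
    }

headers = sse_c_headers(b"0" * 32)  # demo key only; use a real random key
print(headers["x-amz-server-side-encryption-customer-algorithm"])  # AES256
```

The application-level impact is visible here: every request touching an encrypted object has to carry these headers, which is exactly why a transparent bucket-level encryption policy is attractive.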
Something with bigger media, bigger disks, something that has the right economics to keep data on for a longer period of time. And then of course we have this growing notion of cold storage: something that lives further away. It's slower to access, but there's a huge economic advantage. It's certainly durable, but if I need to access it, the access times are a little bit longer. Within the S3 API, there's now something called bucket lifecycle which supports this kind of capability. It's essentially a way of saying: for this bucket and the objects within it, I can apply policies that deal with its lifecycle. For example, put a date on the data and say I'd like to expire it at that point in time, or I'd like to transition it. And the transition rule can move those objects from the original bucket to a target bucket, which may be on other on-premises storage, or it could be in a cloud environment somewhere that supports the API. Very powerful capabilities, and we see the richness of the rules for this growing over time. So that, in a nutshell, is what we wanted to talk to you about: S3 is certainly tracking as an advanced set of functional APIs for doing object storage. It does seem to be embraced by the ISV community, it has lots of advanced features, and we certainly see the innovation not stopping. However, Swift is interoperable with the S3 API, so to a large extent applications can make the right choice, and certainly OpenStack applications can leverage S3 as a vehicle for object storage over the long term and at petabyte scale. So if this was of interest to you, please stop by and come see us at Scality. We do support both Swift and S3 in our system, which is the Scality RING. We're right down the row here in booth C9, and I'll stick around and take a few questions if anybody has some. Okay, thank you very much.
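The expire/transition decision a lifecycle pass makes for each object can be sketched like this. The rule shape loosely mirrors S3's bucket lifecycle (transition after N days, expire after M days), but the function and rule names here are our own illustration, not the actual API:

```python
from datetime import date, timedelta

# Hypothetical rules: move to a colder tier after 30 days, delete after 365.
RULES = {"transition_days": 30, "expiration_days": 365}

def lifecycle_action(created, today, rules=RULES):
    """Decide what a lifecycle pass would do with one object of a given age."""
    age = (today - created).days
    if age >= rules["expiration_days"]:
        return "expire"      # delete the object outright
    if age >= rules["transition_days"]:
        return "transition"  # move it to a cheaper, colder bucket/tier
    return "keep"            # still hot: leave it on the fast tier

today = date(2016, 6, 1)
print(lifecycle_action(today - timedelta(days=10), today))   # keep
print(lifecycle_action(today - timedelta(days=90), today))   # transition
print(lifecycle_action(today - timedelta(days=400), today))  # expire
```

In the real API these rules are attached to the bucket as configuration and evaluated by the storage system itself, so the application never has to run this loop.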