I've got to take my volume down a little bit, I think. It gets pretty loud in there. All righty, welcome very much to the final day of OpenStack Summit here in beautiful Barcelona. My name is Ingo Fuchs. I run the cloud marketing group at NetApp. And I'm here today to talk to you about object storage. It's really crowded in this room, I can tell. So if you would like to sit up in front, I think there are some seats available. So again, thank you very much for joining me. I'm going to spend about 40 minutes talking about object storage and how to implement and design a massive multi-geography object repository, just to give you some ideas on things to consider as you go through the decision-making process of how you should build that object storage repository. And one of the reasons why this is so relevant is that object stores are quite large. When we look at the number one object store in the world today, it's AWS S3. They started out with 262 billion objects in 2010. They hit the 2 trillion object mark in 2013, and then they stopped disclosing numbers. Maybe they ran out of space in their Excel spreadsheets, I don't know. But it's obviously very, very large. So the idea that an object store can be used to store large amounts of data, I think, has been proven by AWS. And the decision to go with an object storage infrastructure has multiple drivers. One is the internet of things trend, the pervasiveness of data, the data growth. It's really driving the desire to have object repositories that are very, very scalable and can span multiple geographies. It also means that application developers are rethinking how they interface with storage. How do they store their data? In the past, you would either use block storage, or you would use NFS or CIFS. And now you're thinking, oh, maybe I should use RESTful APIs. Maybe I should use S3 or Swift. I'll talk about that a little bit. One of the questions then is, how do you manage your data?
So a lot of application developers think, well, I'm in charge of managing where I'm storing my data, how long I'm keeping it, how many copies I have, how I'm protecting it, how I'm archiving that data if it gets stale and old. But the thing is, as an application developer, you want to focus on your application. You want to focus on your business, on your users, on the user experience. You don't want to have to worry about all of that stuff that happens in the backend. And that's what an object store can do for you. It can take your data and manage it. All you've got to do is tag it correctly and set up some policies, and I'll go through that in a bit as well. And then oftentimes the question comes up: should I use file or should I use object to store my data? There are just a few things to consider. One is that with objects, you're using RESTful APIs to store your data. Very easy to use, over HTTP. Everybody has it. It's very, very common. It's completely independent of operating systems or what programming language you use. And you have the capability of using metadata to identify data. So what you do with that is you say, okay, here's a video file. Well, I can store metadata that describes what this is. I can document what project it is, where the shoot happened, what camera I was using, who the director was, who the cameraman was, who the actor might be, if it's a dog or a pony or clouds or whatever else was filmed. So you can use that metadata. And then when you store the metadata, your object repository can say, okay, I've got a policy. I know what to do with this. I know where to store it. I know how long to retain it. I know how protected it must be, how many copies I need to store and where those copies must be stored. Also, you have that idea of a global namespace. And that's not a global namespace just in one data center across multiple storage devices.
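As an aside, the metadata-drives-policy idea described above can be sketched in a few lines. Everything here, the metadata keys, the policy format, the site names, is invented for illustration; it is not an actual product's policy schema:

```python
# Toy sketch: user metadata on an object drives a placement/retention policy.
# All keys, rules, and site names here are invented for illustration.

def match_policy(metadata, policies):
    """Return the first policy whose match conditions all appear in the metadata."""
    for policy in policies:
        if all(metadata.get(k) == v for k, v in policy["match"].items()):
            return policy
    return None

policies = [
    {"match": {"type": "video", "project": "dog-and-pony-shoot"},
     "copies": 3,
     "sites": ["Amsterdam", "London", "New York"],
     "retain_years": 7},
]

# Metadata the application stores alongside the video object at PUT time.
video_metadata = {
    "type": "video",
    "project": "dog-and-pony-shoot",
    "camera": "camera-2",
    "director": "A. Director",
}

policy = match_policy(video_metadata, policies)
print(policy["copies"], policy["sites"], policy["retain_years"])
```

The point is the same as in the talk: the application only tags the object, and the repository's policy engine decides where it lives, how many copies exist, and how long it's retained.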
This is a truly global namespace that spans the entire globe. So you might have data that's stored in Europe, data that's stored in Asia, data that's stored in North America, and you get a namespace that spans all of these different areas. And then finally, you can do things like versioning. Versioning is the object way of doing data protection, meaning that if you're maliciously or accidentally deleting an object, you can go back to a previous version of that object. So even if somebody has the ability to somehow get to that object and delete it, you can still go back to a previous version. Also, inherently, you have the ability to do multi-tenancy. So if you have multiple tenants in your object store, every one of those tenants can have their own access keys and give themselves their own privileges that only apply to the data that they have access to. And if you're using S3, you're also able to delegate access to your objects. So if you have data sets that you wanna give another tenant in your environment access to, you can do that very, very easily, specifically when it's read-only. You can also do things like WORM-lite. This is not a traditional WORM environment like you would know from the archiving world, but rather an environment where somebody with administrative access might still be able to destroy data, but the user can't. So you have the ability to give somebody that read-only type of access without giving them the privileges necessary to change or delete any information. And keep in mind that in an object store, objects are immutable. Once they're created, you cannot change them. So you can actually meet a lot of these WORM and archiving and compliance requirements with an object store. So Swift is a little bit different, right? You probably know S3 is the dominant object protocol out there.
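To make the versioning behavior concrete, here is a toy in-memory sketch of those semantics, modeled loosely on S3-style versioning with delete markers; it is not a real client or server:

```python
# Toy versioned object store: deletes and overwrites create new versions
# instead of destroying old data, so previous versions stay retrievable.

class VersionedBucket:
    def __init__(self):
        self._versions = {}   # key -> list of (version_id, body); body None = delete marker

    def put(self, key, body):
        versions = self._versions.setdefault(key, [])
        version_id = len(versions) + 1
        versions.append((version_id, body))
        return version_id

    def delete(self, key):
        versions = self._versions[key]
        versions.append((len(versions) + 1, None))   # delete marker; old data survives

    def get(self, key, version_id=None):
        versions = self._versions[key]
        if version_id is None:
            return versions[-1][1]                   # current version (None if deleted)
        for vid, body in versions:
            if vid == version_id:
                return body
        raise KeyError(version_id)

bucket = VersionedBucket()
v1 = bucket.put("report.doc", b"first draft")
bucket.delete("report.doc")                          # only adds a delete marker
bucket.put("report.doc", b"new object, same name")   # a new version, not a replacement
print(bucket.get("report.doc", version_id=v1))
```

Even after the delete and the overwrite with the same name, version `v1` still comes back intact, which is exactly the protection property described above.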
Swift is, you know, quite popular here in the OpenStack community. There is actually an S3 emulation layer in OpenStack, so you can use S3 as well. But S3 and Swift work in different ways, and in a lot of ways, S3 is a more complete interface than Swift. So you've got to make a careful decision at some point whether you're using S3 or Swift. And because of the dominance of S3, what we're actually seeing is that most people deploy S3. But you can do certain things in Swift as well. There's the same multi-tenancy concept, but you're losing certain things like the ability to provide another tenant with access to your data. That's something that Swift today doesn't support. So you've got to think about that. Talking about Swift, the other thing I want to point out is that with Swift, you've got to separate the implementation stack from the API. The Swift API today is separate from the implementation stack, and that's why we at NetApp can provide Swift compatibility through the Swift API without having to use the Swift implementation stack. So when you're thinking about an object store, there are some things to consider. One is, of course, compatibility. Do you need S3? Do you need Swift? Do you want to orchestrate with Heat? Do you want to deploy using Docker? Things like that. You want to make sure that the product you're choosing can be managed completely through APIs. Because yes, the graphical user interface is important, but the fact is that most people want to control a large distributed object store through APIs; it's easier, especially if you're a service provider or a large enterprise. You also want to make sure that you have a very strong policy engine, and that's key. If you're thinking about storing billions and billions of objects, maybe across multiple data centers around the world, how do you manage where that data's being stored?
And you know, the initial storage might be easy, but what if after 10 years things change? I'll give you an example: Brexit, right? The UK leaving the European Union suddenly has implications for where people are storing data. Now, if you have a policy engine that's done right, you can just go to the policy engine and say, okay, I have a policy here that used to store data in the United Kingdom. I can just change that and say, you know what, instead of the United Kingdom, store it in Italy, or since we're in Spain, let's store it in Barcelona. So all you have to do with StorageGRID, our product, is that you just go in there, you change your policy, you point it in a different direction, and off it goes. It moves the data for you; you don't have to do anything. So even after your data has been stored and has been sitting there for years, you can still go back and just change it. The other thing is legacy applications. What do you do about those applications that are using traditional file interfaces like NFS and CIFS? Is that compatibility a requirement? Obviously, if you don't have legacy applications, that's fine, but if you do, you might want to think about that. The other one is integrity checks. Many people use object stores to keep data for decades. So if data is sitting in this repository for decades, how do you ensure that the data is always correct and intact and hasn't been changed, either accidentally or otherwise? You know, bits change, cosmic radiation, whatever you want to blame it on, things change. So the ability to have a system that goes back and routinely checks whether the data is intact and repairs it if something is broken is very important. Now, a consequence of that: what if you have very large objects? Think about a multi-terabyte video file. If you're retrieving this multi-terabyte video file and after a terabyte and a half, suddenly something is broken, how does the system deal with that?
So what we do is we stream the object. That means we're streaming the first terabyte and a half, and when we detect some kind of corruption, some kind of problem, we just stream it from somewhere else. We just continue going. The application never knows. A lot of object stores use store and forward. What that means is that they have to retrieve the entire object first, then check that it's intact, and then hand it over to the application. So there's a big difference here between streaming immediately and fixing problems while you're streaming, and having to retrieve the entire object first and only then start fixing and maybe retrieving it from a different source. Scale. This is a funny topic, because I always love the marketing claims from people that say, oh, we're unlimited, or we store trillions of objects. I like to ask, so how many trillions of objects have you actually tested? If you're considering storing billions and billions of objects, has anybody ever done that before? That's a question you might wanna ask as you're going through this process. Efficiency. This is a topic that comes up quite a bit: how efficiently can you store your data? How can I reduce costs? And one of the ways that people do this is through erasure coding. Erasure coding is essentially taking RAID and spreading it across multiple data centers. So that's geo-distributed erasure coding. The reason you do that is that you don't wanna store full copies of your data in multiple data centers, having three times, four times, five times the capacity needs of your actual data. With erasure coding, maybe you need 50% more, or 30% more, or 20% more, depending on what code you're using. So with some systems like ours, you have the ability to say, you know what, I'm gonna replicate the data. I'm gonna keep multiple full copies of my data as I'm first ingesting it.
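The stream-and-repair retrieval described above can be sketched like this. The replicas, the chunk size, and the zero-byte "corruption check" are simplified stand-ins for what a real system does with checksums:

```python
# Toy sketch of streaming retrieval with mid-stream failover: read from one
# replica, and on detecting corruption, continue from another replica at the
# same offset instead of restarting or failing the whole retrieval.
# The zero-byte check is a stand-in for real checksum verification.

def stream_object(replicas, chunk_size=4):
    """Yield the object chunk by chunk, failing over between replicas."""
    offset = 0
    source = 0
    size = len(replicas[0])
    while offset < size:
        chunk = replicas[source][offset:offset + chunk_size]
        if b"\x00" in chunk:     # corruption detected mid-stream
            source += 1          # switch to the next replica and keep going
            continue
        yield chunk
        offset += chunk_size

good = b"a terabyte and a half of video..."
bad = good[:8] + b"\x00" * 4 + good[12:]   # this replica is damaged mid-object

# Start from the damaged replica; the stream silently fails over partway through.
data = b"".join(stream_object([bad, good]))
assert data == good   # the application never notices
```

Contrast that with store-and-forward, where the whole object would have to land and be verified before the application sees the first byte.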
And then after three months, or whatever your timeframe is, I'm moving that into erasure coding. So I'm reducing capacity, I'm freeing that capacity for new data that's coming in. But when you do that, when you do geo-distributed erasure coding, you also gotta think about what happens if something breaks. If you do erasure coding across physical hard drives, that means that if a hard drive breaks, you have to recreate the data that was stored on this drive. And if you have to do that over a wide-area network connection, that's gonna take some time, and it's gonna take up some bandwidth. You gotta think about how you do that. What we do is offer hierarchical erasure coding, which means that we do erasure coding within the appliances, and then we do geo-distributed erasure coding across data centers. But the important piece is it's your choice. You decide what you wanna do. You wanna do replication, you wanna do erasure coding, you wanna do hierarchical erasure coding, you choose. But you have to have that option. And then finally, what are the deployment options in your environment? Do you want software-defined? Do you wanna use white-box servers? Do you wanna use third-party storage that you might already have? Or do you want dedicated engineered appliances, which offer you the highest possible density for your environment? So what's most important? Is it rack space, or is it leveraging standard hardware? In our case, you can mix that. You might have one data center where you go software-defined and use what you already have, and you might have another data center where you go and just rack and stack some appliances. Here's an example of a customer that has switched to object storage. It's quite an interesting scenario. This is a customer that runs photo processing.
And so essentially, if you go to a store to create a picture book, they run the software behind the scenes that actually enables that service. And they were using file-based storage for a very long time. They had NAS systems all over the place. They were adding NAS systems every year. They had multiple petabytes of it. And their challenge was that they were just outgrowing their ability to manage that. They were storing data about where their files are in huge databases. And ultimately they said, you know what, we just can't deal with this anymore. So they switched to object storage and were able to grow without having to manage that database anymore. And the interesting thing is that they're now able to provide different services, because the databases have gotten so much smaller, since they're not tracking file locations anymore, that they're now able to interactively search the database and provide results back to their customers. The other thing is, and this is from a different customer, they were asking themselves the question: okay, we know we're going to go object, but what we don't know is, are we gonna do it on-premises, with a product that we install in our own data centers? Or are we going to go into the public cloud? And they evaluated the cost of that infrastructure, and they determined that for them, yes, in year one public cloud is cheaper, but after year one, an on-premises object store is actually more cost-effective than running it in the public cloud. So that's another consideration: a lot of people start with cloud applications and roll them out in the public cloud, but after some time, hopefully your business is running so well that public cloud becomes too expensive, and then you go and take that on-premises. So the NetApp product is StorageGRID Webscale.
This is the product that we offer, and essentially what it is is a virtualization layer between your applications up top, so that's your S3 and Swift and NFS and CIFS-based applications, and your storage infrastructure at the bottom, where your physical data is stored. That could be on our storage, that could be on other people's storage, that can be on tape, that can be in the public cloud. We can manage that. So we can manage the data placement across whatever storage you might have, and even into the public cloud or, you know, for those of you that still have it, tape. So let's take a little bit of a deeper look at StorageGRID. One thing I mentioned before is that global namespace: the ability to manage your data regardless of where it's physically stored. In this case, Amsterdam, London and New York are three data centers, and they're all part of a single namespace. So from an application perspective, it doesn't matter. Now, these data centers are just connected using IP. Nothing secret here, no Fibre Channel or anything like that. Just an IP connection between these data centers. And then we use something that we call an admin node. That's how you administer the system. That's how you set it up. And actually, when you first install the system, that's the first thing that you install, the admin node. And from then on, all the other nodes, all the other software in the environment, just gets installed automatically from that first node. And then there's what we call the storage nodes. Those are the systems that are actually storing the data. And then of course, you've got to add your applications, right? So you can have S3, you can have Swift. There's another product that we have, AltaVault, that's backup to cloud, including on-premises. You might have your NFS and CIFS legacy applications. Now, when you're ingesting data, you're looking at that S3 application up there on the left. As you're ingesting data, that's a PUT request in RESTful HTTP.
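For reference, here is roughly what that PUT looks like on the wire, built up as a string. The endpoint, bucket, key, and metadata are placeholders, and the Authorization value is elided; real S3 requests carry a computed signature (e.g. AWS Signature Version 4):

```python
# Sketch of an S3-style PUT in raw HTTP. Host, bucket, key, and metadata are
# placeholders; the Authorization value is elided (real requests are signed).

body = b"...the object's bytes..."

request_head = (
    "PUT /videos/shoot-042/take-1.mov HTTP/1.1\r\n"
    "Host: objectstore.example.com\r\n"
    "Authorization: <signature elided>\r\n"
    "Content-Type: video/quicktime\r\n"
    f"Content-Length: {len(body)}\r\n"
    "x-amz-meta-project: dog-and-pony-shoot\r\n"   # user metadata rides along as headers
    "\r\n"
)
print(request_head)
```

Note that the URL names only the bucket and key, never a physical site; that's what lets the namespace stay global.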
So you're storing the data in the object store. Now, as soon as it is in the object store, it's accessible to all the applications. Say we just add an application over here in New York, and that application wants to retrieve that object. Well, remember, I was talking about how this is a global object store, a global namespace, so it doesn't matter where the data is physically located; it will always be retrieved back from wherever it is stored. The application has no awareness of where that data is physically stored, whether it's in the public cloud or in one of those three data centers here. So I mentioned policies. Policies are key, because they determine where that data is stored. In this case, we're gonna store three full copies, so each data center has a full copy of our data. And now, if the data is being retrieved in one physical location, it's coming back from the closest location that we have. In this case, the application sits in New York, so it's gonna get the data back from New York. So you can place your data close to your application. Remember, I was talking a little bit about the internet of things. That's an application pattern that we see a lot: how do I get the data close to the application? Because it's coming in somewhere, but it's gotta be used somewhere else. So next one: I mentioned erasure coding as another option. We have multiple different codes that we can use in erasure coding. That becomes very useful especially if you have very large files. So what I can do is apply an erasure coding policy where the data is essentially chunked up into data chunks and parity chunks, and those are spread across multiple sites. So that means I can lose an entire site, I can lose a complete data center, and I can still recreate all of the data and bring it back to the applications. I can lose individual storage systems, individual nodes. I still have that redundancy without storing full copies.
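The capacity arithmetic behind replication versus erasure coding is simple enough to sketch. The 6+3 scheme below is just an example code, not a specific product profile:

```python
# Raw capacity needed per byte of user data: full replication vs. a k+m
# erasure code. A k+m code splits an object into k data fragments plus m
# parity fragments and survives the loss of any m fragments.

def replication_factor(copies):
    return float(copies)

def ec_factor(k, m):
    return (k + m) / k

print(replication_factor(3))  # 3.0 -> 200% extra raw capacity
print(ec_factor(6, 3))        # 1.5 ->  50% extra, survives any 3 lost fragments

# Spread the 9 fragments of a 6+3 code across three sites, 3 fragments each,
# and losing a whole site costs exactly 3 fragments: every object still rebuilds.
```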
That's what erasure coding does for me. I can add, I can expand at any time. I can do that in the GUI, I can do that through the API. Remember, I was talking about how you can tier data to the public cloud. The way that customers are using this is that maybe at some point in time you're saying, you know what, I'm gonna keep one copy in my data center, and I'm gonna keep another copy in the public cloud. And because the cloud really does become expensive when I'm retrieving data, I'm not gonna plan on retrieving that cloud copy unless I have to. So I still have that copy on premises, but if something happens to that copy, I've still got another one in the cloud. And at that point, if I need that data, I don't care about the cost; I can bring that data back. And again, all of that is handled in a single namespace. And then of course, as I mentioned, you can use appliances or you can deploy software-defined. We support OpenStack, we support Heat for orchestration, we support Docker, we support KVM, we support VMware, and we build our own appliances, if that's what you wanna do. So it's really your option, and you can mix and match in your environment any way you want to. So I was talking a little bit earlier about policies. Now imagine you have a hundred billion objects, petabytes and petabytes of data, stored across 16 data centers on multiple continents. Now you're going in there and you're editing the policy. Remember, we retroactively apply the policy to older data that matches it. What if you're making the wrong change? I call this the heart attack prevention feature, because what we do is allow you to simulate what happens in your environment if you're changing a policy. If you change a policy and suddenly there are petabytes of data moving over a wide-area network connection, that's probably not what you wanted to do.
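A dry run of a policy change, in the spirit of that heart-attack-prevention idea, might look like this sketch. The inventory format and site names are invented for illustration; a real system would work from its object index, not a Python list:

```python
# Toy "what would this policy change move?" dry run: before repointing a
# placement rule from one site to another, total up the data that would
# have to cross the WAN. Inventory format and sites are invented.

def simulate_move(inventory, old_site, new_site):
    """Summarize the objects and bytes a placement change would relocate."""
    moving = [obj for obj in inventory if obj["site"] == old_site]
    return {
        "objects_moved": len(moving),
        "bytes_moved": sum(obj["size"] for obj in moving),
        "destination": new_site,
    }

inventory = [
    {"key": "scan-001", "site": "London",    "size": 2 * 10**12},
    {"key": "scan-002", "site": "London",    "size": 1 * 10**12},
    {"key": "scan-003", "site": "Barcelona", "size": 1 * 10**12},
]

report = simulate_move(inventory, old_site="London", new_site="Barcelona")
print(report)  # 2 objects, 3 TB over the WAN: now decide if you really want that
```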
So with simulation, you can see, when you change the policy, what is going to happen to your objects afterwards. Personally, I find this a very, very useful feature. Another thing I was talking about was versioning. I mentioned that that's really the data protection feature that you get in object storage. So let's say you're creating an object, a report, a document. Now if you're deleting that object, even if you have the privilege to delete it, you can still go back to that original object. And if you're creating a new object with that same name, it doesn't replace the old object; rather, it's a new object, it becomes a new version of that object. So it has its own identifier, and you can go back to previous versions. That makes sure that even if you go in there, create an object, delete the object, and create a new object with the same name, you can still go back to the old version. You can configure how many versions you're keeping. So that gives you the ability to manage your protection: how do you protect yourself against somebody hacking into your system and starting to delete data? If you're deleting objects, they're not actually gone unless you retire them. So, wrapping it up. I was talking about some of the things that you should consider as you're implementing object storage, right? I was talking about compatibility, that's important, and I hope I covered that. API-based management, so that you can use the orchestration tools that you have in your environment to orchestrate the deployment, to orchestrate tenancy, to orchestrate who has access to that infrastructure and what they do. That you place your data with a policy, because what you want is a hands-off system, right? You want a system where you can say, all right, I set a policy, now I'm just ingesting data, I don't have to do anything.
Our experience is that most of our customers use the GUI to install the system and then don't ever touch it again, you know, unless they're maybe expanding it. And by the way, with our system, if you're expanding the system, you're adding a new data center, or you're decommissioning a data center or storage infrastructure, you can do all of that in the GUI if you want to. How cool is that, right? You can just go into the GUI and say, I want to add a data center to my environment. Policy-based data placement, so not just the initial data placement, but also retroactive. What are you doing after the fact, if something has already been ingested a long time ago? Data integrity, this is really, really important, right? We have a lot of customers in the healthcare industry. So think about having healthcare records that are 30, 40, 50 years old, and when you've got to bring them back, they're corrupted. You know what, after 50 years, how are you going to get that data back? It's gone, right? So you've got to make sure that you have an object store that is able to go back, find out if data is corrupted, and fix it if that's the case. Then scale: what is your scale? Number of sites, number of objects, capacities, how big can objects be, how small can objects be? Sometimes systems are really good at large objects, but not very good at small objects. So those are questions you might want to ask. That also feeds into performance. What is the performance of the system? How much hardware do you need to get the performance that you want? By the way, we just improved our performance by four times for small objects, which is a free upgrade for our customers. That's the beauty of software, right? You fix something, you improve the software, and it's free. What's the efficiency? Sometimes systems support erasure coding, sometimes they don't. Swift, for example, doesn't support erasure coding across sites for production environments. You want to think about that.
Do you need erasure coding? If yes, does your system allow that today? Do you want to do full object replication? Do you want the flexibility to go from full objects to erasure coding and back, moving back and forth depending on your requirements? And then, what are the deployment options? Hardware and software. A lot of times the conversations with our customers start with software: it's software-defined, we can use third-party storage, all that good stuff. But then typically most customers buy appliances, because they're easy to procure, easy to deploy, easy to manage, and very high density. So it's good to have the options, but you've got to think about what's the right way for you to work through this. So hopefully I got you interested in having a deeper conversation about object storage. And I know everybody is probably tired of looking at PowerPoint slides. I'm gonna be here for another 20 minutes or so. If you have any questions, please let me know. And thank you very much for joining me today. All right, thank you.