All right, the microphone is on, beautiful. Good morning — it's still morning, for a few more minutes. Good morning everybody, thank you for joining. This is the last session before lunch, so I've got to keep everybody excited. I hope you're a little bit hungry — hungry for information, but certainly hungry for food. I've heard it's decent. My name is Ingo Fuchs, I work at NetApp, in the cloud group, and my team is responsible for a number of topics, including OpenStack and object storage. So I thought, well, let's talk about object storage in OpenStack environments. Throughout the session, I'm going to talk about things like Swift, about proprietary solutions, and about some of the benefits of each. But ultimately, what I want to achieve is that you come out of the session with some idea of the questions you should ask. I've seen quite a few customers implement object storage, and some of them were disappointed — often because they didn't ask the right questions in the beginning. And sometimes that's hard, because object storage is a relatively new technology, and a lot of people are not as familiar with it as they are with NAS or SAN, with block or file storage. Some are totally new to storage, because they've been worrying about applications and didn't care how the storage folks did the storage stuff. So maybe a quick question: who comes from more of an application background, running or developing applications? Who's coming more from a storage background? And the rest — a bit of both? All right, great, perfect, thank you. It's an interesting mix. Another quick question: who has already implemented object storage, whether it's Swift or something else? Not many? Okay, great, thank you. I really appreciate it — that helps me. So let me start the discussion by talking about some of the use cases. Why is object storage so interesting?
One is cloud applications. With the emergence of applications that are either born in the cloud or have been transformed to run in the cloud, we see more and more applications that were written to work with something like object storage. If you look at Amazon S3, or at Google Nearline, or Glacier from Amazon — those are all object stores, and they use object-based protocols. RESTful HTTP seems a natural choice when you're building new applications. It's much easier than doing file or block interfaces, much less complicated, because the object store does all the data management — where data sits, keeping it available, tracking its location, all that kind of stuff. And it works very well for a lot of workloads and workflows. The second place we see object storage a lot is data archives. That's for those of you who worry not just about the transactional data that is supporting your application right now, this moment, but also about storing data for a long period of time. If you're building applications that serve, let's say, the financial industry, you might have data that you need to store for 30 years — financial records, loan applications. In the healthcare space, it's life of the patient plus X, and that X varies by region and by law. Those are very long retention periods for data. So you need to think about: are there technologies I can leverage to make sure that data is actually intact and accessible 30 years from now, 60 years from now? Object storage has a great answer for that. And then some of you, like the previous customer example, come from the media and entertainment industry. A lot of data is being created in that industry, it needs to be retained for a very long period of time, and you're reusing it all the time — data that pops up after 20 years because an actress just had a child or something.
So a topic comes up, and suddenly you need to bring data back out of an archive of some kind and into your live online storage. Those are some examples of use cases. Now, if you look at OpenStack specifically, Glance is a great example of a use case that benefits from object storage. You can build these large, scalable environments to store your Glance images. When you're using Cinder — Cinder is block, right? So you've got these big blocks of data. A lot of times you have a project where you need the Cinder volume while you're working, and then you're done. The question is: when you're done, are you throwing it away or are you keeping it? More and more, when you run into enterprise IT requirements, you need to hold on to it for compliance reasons — or because nobody wants to sign off on the deletion, which happens often. So you can use a very scalable object store to just move your Cinder volumes over there, archive them, keep them for a year, just in case you need to bring them back. Born-in-the-cloud applications I've already talked about. Big data — more and more use cases around data analytics. You have this Internet of Things creating all this sensor data. I'm not going to bore you with that, but you're creating a lot of data that you need to collect, because you don't know what kind of analytics you might do five years from now. Data that you're not able to analyze today, or that doesn't give you relevant results today, maybe will five years from now. So you want this big bucket that you can throw data into and keep. And then long-term archives, which I already mentioned. Now, one question I get a lot is: so what is object storage? How is it different from file and block storage? One way to look at it is how you connect to the storage infrastructure. Typically, you use one of three interfaces.
It's either S3 — that's what the vast majority of the world is using. There is Swift, of course, very closely aligned with OpenStack. And then there's CDMI, the Cloud Data Management Interface. CDMI is an interesting one, because it was developed by SNIA, the Storage Networking Industry Association, and it is an ISO standard. It's a very, very powerful interface, but very few people use it. So pretty much everybody has decided it's S3 and/or Swift. It's interesting, when you evaluate solutions in this space, to think about: maybe today I'm going to do Swift, or today I'm going to do S3, and maybe tomorrow I'm going to do CDMI. Depending on your application and your use case, you might choose different interfaces. The second aspect is how you address the data inside the storage infrastructure. Most of you — because most of you raised your hands for storage — are familiar with file and block. With block, you need to know the sector on the drive. Great, sounds complicated. The second one is file. You need to know what server or NAS storage system your data is stored on. You need to know an IP address, a share name, a directory structure all the way down to the subdirectory, and then the file name and the file extension. Sounds complicated. With an object system, you only get an object ID, and that's it — that's all you have to know. Well, plus one IP address. So I liken this to valet parking. Who has a car? Who has ever driven a car in your life? I'm mostly just making sure that you're still, you know, physically capable of moving.
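To make the addressing difference concrete, here's a small sketch of how the same object might be named under the two common APIs. The host name, account, bucket, and object names here are purely hypothetical — they just show the shape of the URLs.

```python
# Hypothetical URL shapes for the two common object interfaces.

def swift_url(host, account, container, obj):
    # Swift uses a path-style URL: /v1/<account>/<container>/<object>
    return f"https://{host}/v1/{account}/{container}/{obj}"

def s3_url(bucket, key, region="us-east-1"):
    # S3 commonly uses a virtual-hosted-style URL:
    # <bucket>.s3.<region>.amazonaws.com/<key>
    return f"https://{bucket}.s3.{region}.amazonaws.com/{key}"

# The same Glance image, addressed two ways:
print(swift_url("storage.example.com", "AUTH_demo", "images", "cirros.img"))
print(s3_url("glance-images", "cirros.img"))
```

Either way, a plain HTTP GET on that URL (with the right auth headers) returns the object — no mount points, no share names, no directory traversal.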
Now, normally you drive your car into town, you find your restaurant, and then you try to find the closest parking garage to the restaurant you picked for dinner. You go up to the third level, fifth row, all the way at the back, you park your car, you go off to dinner, and then you have to find your way all the way back to the parking garage and all the way back to your car. You have to memorize what level you're on, what row you're on — and, you know, I'm assuming you know what your car looks like, right? But if there are three Chevys parked next to each other, you actually have to remember the license plate number. Now, with object identifiers, what you do is: you go to your restaurant, you drop off your car, you get your valet ticket. When you're done eating, you give the valet your ticket, and you get your car back. If you've defined the correct service levels, it might even be washed, maybe the tires are rotated, all that kind of stuff. It eliminates having to know where your data is. All you know is: I have a ticket, give me my data back. I don't care where you get it from. I don't care where you parked it, as long as it didn't get stolen or vandalized — against service levels. The nice thing is, when you travel — say you dropped off your car at the airport in New York, got your valet ticket, came to Vancouver, and here in Vancouver you give the valet your ticket and get your car back. How great is that, right? Wouldn't it be nice to have such a valet service? That's something object storage can do: it completely virtualizes and abstracts the physical location of your data from your application. And then you have metadata. Well, that's when you drop off your car with the valet guy and you lose your ticket.
How do you get your data back? How do you get your car back? Well, hopefully you know the make, the model, the color, maybe even the license plate. That's all metadata — it describes your data, or your car in this case. So if you know the metadata, you can find your object again, even if you lose the ID. Not a great idea to lose your ID, but you can get your data back. So that was a very, very quick overview of what object storage is. Now, when it comes to an actual implementation of an object store, there are a number of criteria. I'm just going to touch on these really quickly, because hopefully by the end of the session you can go through all of them without my help. Reliability, durability, and availability — when you run an enterprise IT shop, those are very important criteria, right? You want to make sure your data is available and your applications are running; otherwise, you might be losing your job. You want things like policy management, which automatically determines in the background where your data is stored — you don't want to move data around manually. Multi-tenancy, obviously, in large environments. Elasticity — you can grow, you can shrink. You want lower costs. Well, who doesn't like the cost model of the cloud, right? So if you're thinking: at one cent per gigabyte per month in the public cloud, can I beat that price with an on-premises infrastructure — whether that's your enterprise IT data center or a service provider environment? Yes, you can. With object storage, you can beat the cloud economics, depending on the use case and all the asterisks and footnotes, but you can. So don't walk away just because you think you can't beat the price of public cloud. Software defined — we get this question a lot. Can I do software defined? The reality is that most traditional enterprise IT shops don't go software defined.
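As a rough illustration of the valet analogy, here's a toy Python sketch — not a real object store, just the access pattern: store data, get back an opaque ID (the "ticket"), retrieve by ID, and fall back to a metadata search if the ID is lost. All names and fields are made up.

```python
import uuid

class ToyObjectStore:
    """Toy illustration of object-ID plus metadata access patterns."""

    def __init__(self):
        self._objects = {}  # object ID -> (data, metadata)

    def put(self, data, **metadata):
        oid = str(uuid.uuid4())        # the "valet ticket"
        self._objects[oid] = (data, metadata)
        return oid

    def get(self, oid):
        return self._objects[oid][0]   # hand over the ticket, get the data

    def find(self, **criteria):
        # Lost the ticket? Search by metadata (make, model, color...).
        return [oid for oid, (_, md) in self._objects.items()
                if all(md.get(k) == v for k, v in criteria.items())]

store = ToyObjectStore()
ticket = store.put(b"scan-data", patient="12345", modality="CT")
assert store.get(ticket) == b"scan-data"       # lookup by ID
assert store.find(modality="CT") == [ticket]   # lookup by metadata
```

Notice what's absent: no server name, no share, no path — the store decides placement, and the application only ever holds the ticket.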
They go with an appliance — easy, you plug it in and it runs. But you need to have the option to go software defined and use the infrastructure you already have. And then finally, the global namespace — what I just described with the valet parking. It doesn't matter where my data is stored, whether it's New York or LA or Vancouver; I don't care, I just need to get it back. So let's talk about the obvious choice: Swift. Now, Swift is really interesting, and the first thing I really love is that the OpenStack community decided to separate the Swift API from the Swift implementation stack. So you get a choice: you can use the Swift API without having to use Swift's implementation stack. That gives you the choice to determine what you're going to use for the implementation — what you'll use to store your data, manage your data, and ensure data durability — while still using the API you want. You get the global namespace. It's very scalable. It offers some geo-distribution options — there are limitations, but it's there. And they just rolled out erasure coding. Now, to be very clear about the erasure coding: it's great, but the devil is in the details. If you're comfortable picking your own library and doing your own quality assurance testing, the erasure coding beta is certainly very interesting. Now, there are some things you can do to improve on a Swift deployment. This is when you go from a Swift deployment using just individual hard drives to a Swift deployment using something a little bit more like enterprise storage. Usually, when we talk about enterprise storage in a Swift deployment, the first reaction is: that's more expensive. Well, that's not true, and I'm going to show you why.
The other thing to keep in mind is, if you're building an OpenStack environment as an experiment and then you move it into enterprise IT, one of the pushbacks we get is that it takes up too much space — I don't have room in my data center. When was the last time, if you're a service provider, that you talked to your data center guy and he said, oh, I've got tons of room in the data center, just ship me another 200 servers, no problem? You don't hear that very often. So the ability to reduce rack space by 70% can dramatically lower your costs and your complexity. The other thing is predictability. If you're running an environment that needs to run all the time, you want to minimize disruption — you want to minimize performance and availability impacts. I'm going to go through details for all of these. And then of course HA: you want to make sure a failure doesn't cost you data. Nobody wants to lose data. So, a typical Swift environment. If you've read even five minutes about Swift and how it works, this is one of the fundamental principles you see in most Swift environments: you store three copies of the data. Why do you do this? Swift is designed around running on commodity hardware. Commodity hardware fails, and you have no protection underneath. So you have to store multiple copies of your data, so that if something breaks, you can repair it, right? Similar to HDFS, the Hadoop Distributed File System, if you're familiar with that — you store three copies of your data, and if something happens, you rebuild over the network. But that means if you store a petabyte of data, you need three petabytes of capacity, because you're storing three copies. Now, you can do things a little bit differently, and I'm using our product as an example: NetApp E-Series storage — block storage that you put underneath Swift.
What we're doing there is using an erasure coding algorithm that we call Dynamic Disk Pools to spread your data and your redundancy information over a large pool of disks that can be dynamically adjusted — that's why it's Dynamic Disk Pools. It gives you an advantage over even traditional RAID, in that there's very, very minimal impact if a disk drive fails, because you're only hitting very few segments of a particular data set. So if you're using something like Dynamic Disk Pools, you only need to store 1.3 times the capacity of your data. You don't need three copies anymore; you just add that little bit of redundancy — 30% in this case — on top. And you can go even lower if you want to do other things with different RAID levels, et cetera. The benefit is that your system runs more smoothly and more predictably, without creating all that redundancy. And that helps with rack space and everything else. If you look at the business impact in this environment — if you use traditional RAID, RAID 6, to protect your data, and you look at how long rebuilds take: we stopped doing the math at three-terabyte drives, because once you get to four and six and very soon ten-terabyte drives, the numbers just start looking a little bit ridiculous. With Dynamic Disk Pools, you see that the impact — how long it takes to repair a drive — is much, much shorter. That translates to a more optimal operating environment for your enterprise IT shop. Instead of a business impact where everything slows down dramatically for a fairly long time, you make it shorter and less impactful. And ultimately, if you look at how much data is actually affected, because we're slicing across a lot of disk drives, it's much less. Now, I talked a little bit about density. So availability is great, performance is great, predictability is great.
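The capacity math behind those two protection schemes can be worked out in a couple of lines. The 3.0 and 1.3 factors are the triple-replication and erasure-coding overheads just discussed; the function name is mine, not anything from Swift or E-Series.

```python
def raw_capacity_needed(usable_pb, overhead_factor):
    """Raw storage required to hold `usable_pb` of user data at a given
    protection overhead: 3.0 means three full copies, 1.3 means roughly
    30% redundancy as with an erasure-coded layout."""
    return usable_pb * overhead_factor

data_pb = 1.0
replicated = raw_capacity_needed(data_pb, 3.0)   # triple-copy Swift default
erasure    = raw_capacity_needed(data_pb, 1.3)   # Dynamic Disk Pools style

print(f"replication: {replicated} PB raw")   # 3.0 PB raw
print(f"erasure:     {erasure} PB raw")      # 1.3 PB raw
print(f"saved:       {replicated - erasure:.1f} PB")  # 1.7 PB freed up
```

Per usable petabyte, that's 1.7 PB of raw capacity you don't have to buy, power, and rack — which is where the density and cost arguments come from.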
So tell me again about density. This is a typical Swift environment built on regular disk drives. And if you do this with, in this case, our box — a 4U storage system, 4U, 60 drives. If you're interested in hardware, it's actually a really neat layout. It's kind of like those multi-level ovens, if you do Christmas cookie baking. Nobody does that? All right. You have these trays that you pull out, and all the hard drives are lying in there like cookies. If you need to replace something, you pull out a tray and put a new drive in while the system is running — and you only expose a few drives at a time, because of that tray design. Compare that with some of the competing options: a 4U 60-drive system where you have to pull the entire thing out. I need to make sure I'm not falling off the stage here. And then there's this butterfly mechanism, and you have to go in from the top. If that system is at the top of the rack, you need a tall ladder, and you'd better hope there's no ceiling above you. So it's actually a very, very cool hardware design. You need the storage system and a couple of servers, and you're done. That can dramatically reduce your rack space. And then of course you can scale vertically because of our infrastructure, and you can scale horizontally because of the intelligence in Swift. So if you want to make Swift run better, more efficiently, and in less space, that's a great option. Now, what if Swift doesn't give you enough? Swift software is great — there's a lot of movement, it moves very fast, a very dynamic engineering team, good stuff happening there. But there are some things that are missing: a dynamic policy engine, where you can adjust where data is placed over time. A data durability framework, where you do regular health checks to make sure your data is still intact — so you're not going back in 30 years and discovering the data got corrupted 29 and a half years ago. Flexible tiering over time. And what about support for tape?
Now, tape isn't sexy at OpenStack, I get that. But a lot of customers ask, because they've made an investment over the years and spent a lot of money on tape. So if you can tell them, well, if you really want, we can actually include tape — you probably don't really want to do that, but you could — it's investment protection. I'm going to talk a little bit about erasure coding and S3 support and all that stuff in just a moment. Appliances — very, very easy deployment. What about auditing and reporting capabilities? So I want to introduce a product here that we built. I think most of you know that NetApp has been doing a lot of work in the OpenStack community, and one of the products we provide for the OpenStack community is called StorageGRID Webscale. It's an object storage product that goes back more than 10 years in production environments. It originated in the medical field, and that's relevant and interesting, because that's where a lot of our data durability features come from. Think about somebody doing surgery preparation — he's going to take out a kidney. You want to make sure he's taking out the correct kidney. That's where a lot of our algorithms come from: making sure the data is not corrupted, not changed, not lost — always exactly identical to what was originally stored. And this is the solution we're proposing in this environment. Let me walk you through how we answer some of the requirements we're hearing from our customers in the OpenStack space. One is data placement. The value of data changes over time, so storing all the data in one single spot is maybe not the most efficient or cost-effective way to do it. If you think about your on-premises or hosted deployment: you can have flash, you can have disk, you can have geo-distributed erasure coding — I'll get back to that in a moment — you can even have tape.
Plus, you have the option to use public cloud as a tier. If you have an S3-compatible target, either on-premises or hosted, that you want to write to — because maybe the economics of the cloud make sense for something — that's just one more tier of storage that we support. And you can manage it across multiple data center locations, and that can change over time. Maybe while you're first ingesting data, for the first four weeks or so, performance is the most important factor in determining where the data should be stored — both physical location and storage tier. But maybe after that month, cost is the most important factor, so you want to move the data to something more cost-effective. The way we do this is with a dynamic policy engine. In the policy engine, you define a policy and say: here's how this particular set of data should be treated. And we can do this on a per-object basis — every single object can be treated differently from a policy perspective. You don't have to do it per bucket or per container; you can do it per object, which is very, very flexible. You say: this is my policy, and this policy determines where the data is stored and for how long. You can say it's going to be stored here — let's say in our data center in New York, where performance is very important — for the first 30 days, on very, very fast disk. After 30 days, I want my copy in my much, much cheaper data center in Oregon, or in Nevada, wherever it might be. I'm going to keep it there for 30 years, and then I'm going to throw it away. But during those 30 years, I'm going to do a health check every six months to make sure the data is still okay and hasn't been changed. One important thing here: what if you change your mind? What if you acquire a company and there's a new data center? Or let's say the cost factors have changed.
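A policy of that shape — fast disk in New York for 30 days, cheap capacity in Oregon until a 30-year retention expires — could be modeled roughly like this. The rule structure, site names, and tier names here are hypothetical; this is not the actual StorageGRID policy format, just the evaluation idea.

```python
from dataclasses import dataclass

@dataclass
class PlacementRule:
    max_age_days: int   # rule applies while the object is younger than this
    site: str
    tier: str

# Hypothetical per-object lifecycle policy, loosely following the example
# in the talk: 30 days on fast disk, then cheap storage for 30 years.
POLICY = [
    PlacementRule(max_age_days=30,       site="new-york", tier="fast-disk"),
    PlacementRule(max_age_days=30 * 365, site="oregon",   tier="capacity-disk"),
]

def placement_for(age_days, policy=POLICY):
    """Return (site, tier) for an object of the given age, or None once the
    object has outlived every rule and is eligible for deletion."""
    for rule in policy:
        if age_days < rule.max_age_days:
            return (rule.site, rule.tier)
    return None  # retention expired

assert placement_for(5)   == ("new-york", "fast-disk")
assert placement_for(400) == ("oregon", "capacity-disk")
assert placement_for(31 * 365) is None
```

The key point the engine provides is that changing `POLICY` re-evaluates placement for every object automatically — nobody hand-migrates billions of objects.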
Say you now want to go to public cloud with Amazon, have it hosted, or something else. With this policy engine, once you change your policy, all that data is moved dynamically in the background, automatically. You don't have to manually move the hundreds of millions or billions of data objects you've stored; it happens automatically. I talked about data integrity. That's very important, right? At every single step — ingest, transport, replication, readout — the data is always verified. It's always correct, nothing ever changes, and if it breaks, we automatically fix it. We regenerate the object on the fly; the application never knows. Also playing into this is data availability — making sure data is always available. Even if you lose an entire data center, you want all the other data centers to still have data from which you can recreate or simply make another copy. Data availability, even if sites go down, is very important, especially if you're looking at a truly global deployment. Sometimes the internet connection to Asia goes down — can you still operate, even if the copy of the data was originally stored in Taiwan? You want to be able to do that. One of the concerns customers have — and I think one of the drivers for the adoption of OpenStack — is security: security in the cloud, and security across clouds. We offer encryption. If you're not encrypting data on the client side, we do it on our side. And if you combine them — encrypting on the client side and then encrypting again with us — your data is very well encrypted. That's a great way to look at it. And then obviously, we support the standard security and access mechanisms from S3 and CDMI, for example. And then there's the option to actually take your data and tier it to the cloud, if that's what you choose to do.
We have a lot of customers who say: you know what, ultimately I want to do public cloud, but I need to go with an on-premises infrastructure today. There might be legal reasons, might be laws, might be compliance rules, whatever it is — maybe sensitive customer data. They say: I want to do on-premises today, but I want the option to leverage the public cloud. With us, you can transparently use the public cloud to store your data. This might just be a disaster recovery copy. Maybe you can afford one big data center where you run your entire on-premises environment, but for your disaster recovery copy — instead of building or leasing a second data center and putting hundreds of servers in it — maybe you just use Amazon for that, right? So that's one option. Now, I promised I would talk a little bit about erasure coding. Erasure coding is something that's been around for a long time. If you look at RAID — RAID storage is a form of erasure coding. So we use RAID, and we use Dynamic Disk Pools; those are different ways of doing erasure coding. We also do something called geo-distributed erasure coding. The way to imagine how this works: look at the way a RAID system works. You've got a bunch of disks, and you've got a RAID controller that does all the calculations and handles data access. You spread your data and your redundancy information across a lot of disks. Geo-distributed erasure coding works exactly the same way, but across physical locations. So now you can do erasure coding across, let's say, three data centers or five data centers. If you lose an entire data center, you still have access to all of your data — but without the cost of storing full copies of your data at each site. Traditionally, what you do is store full copies in each data center.
With geo-distributed erasure coding, you just spread your data and redundancy information across data centers. If you lose one, you can recalculate and repair the data. And again, it's transparent to your application. Now, what that means is it's probably going to be a little bit slower, and your overall availability rating is going to be a little bit lower. But if you think about the cost and the impact on your environment, this can make a lot of sense. Think about it: you're storing a lot of data, and you move from full copies to geo-distributed erasure coding — which you can do automatically with our policy engine. Once you move, you take roughly a 30% hit on capacity instead of a 300% hit. That means you might suddenly have a petabyte of data storage available for the next application. That's what it really boils down to: you have a more efficient way of storing data, which means that once your data is stored that way, you can free up capacity for the new data coming in. And all of that happens automatically in the background, transparent to the application. So when you look at deployment options — I talked about software defined before, and there's this I-want-software-defined versus I-want-an-appliance kind of discussion. It's a little bit philosophical. With our product, you can actually do both: you can mix and match software defined and appliances. I talked to a customer two weeks ago — he was running a project, and we were close to a deal. He said, you know, I just invested in two petabytes of your competitor's storage. I thought I was going to roll out my applications that way, but I was wrong, and now I've made a mistake and — well, I can't give it back, right? I bought the stuff. We said, that's no problem. We run software defined; we support third-party storage.
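To see why losing a whole site doesn't lose data, here's a deliberately tiny sketch using XOR parity across three hypothetical sites. Real geo-distributed schemes use wider Reed-Solomon-style stripes (many data fragments plus a few parity fragments) to reach the roughly 30% overhead mentioned above; a 2-data-plus-1-parity stripe like this one costs 50%, but the recovery principle is the same.

```python
def xor_bytes(a, b):
    # Bytewise XOR of two equal-length fragments.
    return bytes(x ^ y for x, y in zip(a, b))

# Toy geo-distributed erasure coding: split an object across two data
# sites and keep XOR parity at a third.
obj = b"12345678"
site_a, site_b = obj[:4], obj[4:]    # data fragments at sites A and B
site_c = xor_bytes(site_a, site_b)   # parity fragment at site C

# Simulate losing site A entirely: recompute its fragment from B and C.
recovered_a = xor_bytes(site_b, site_c)
assert recovered_a == site_a
assert recovered_a + site_b == obj   # full object reassembled
```

Any single site can vanish and the remaining two reconstruct the missing fragment — which is exactly the property that replaces keeping a full copy at every site.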
You can use that storage investment you already made and just run our software on top. So you get that investment protection — no problem there. We have the 4U 60-drive appliance — the same one I talked about earlier, you know, with those cookie trays that you pull out, with the disks stacked in there like cookies. And we did something actually very neat: we put all of our software right inside that appliance, so you don't need a whole stack of servers on top. From a density perspective, you're now talking about a 4U system with 60 drives — up to six terabytes each right now — and the entire intelligence to manage all of that data: the policy engine, the data durability features, the interfaces, all inside that appliance. Again, very, very efficient, very easy to deploy. Now the question is: well, that's all very interesting, thank you for that, but can I use this for OpenStack today? Yes. We support Glance today; we support S3 and CDMI. I think most of you know that with OpenStack application stacks, you have the choice of using Swift or S3, and to some extent CDMI. So you have that choice today. And we are actually doing an implementation of the Swift API right now. We made a decision not to go with a gateway, not a bridge, not a shim, but rather to implement the Swift API natively inside our core software stack. This gives you much better integration — it's a much more efficient, much faster way of doing it — but it takes a little bit longer to build. So that's why I'm not here today telling you it's available today, but it is coming very, very soon. And if you're thinking about implementing something, a proof of concept or a pilot, let me know and we can provide the software to you.
I want to give you a very quick customer example. For those of you who come from the storage world, you're probably thinking: okay, why would I go to object storage — what about network attached storage? NetApp storage systems for NAS scale to 70 petabytes. Why would I even consider object storage, right? So this is a customer that previously used about six petabytes of our NAS storage — network attached storage, file based. What they recognized was that they were running into problems with their applications, because their intellectual property is all about their software. They're ingesting billions of files every year from all of their 19,000 retail locations and 8,000 kiosks across the globe. They're getting all of this data coming in — I'm not allowed to tell you what that data is, but they get it all coming in. They need to store it, manage it, do some cleanup work on it, make some alterations, store it in different ways. And they need to keep track of where all of these files are. For every single file, they need to store which server it's on, what the file share is, what the directory structure is, what the name is. And they need to keep this redundant in case something goes wrong. And then they just have to hope nobody goes in there and manually moves a file. What they recognized, and what they decided, was: okay, these databases are getting so unwieldy that I need to do something differently. I can't do interactive lookups in my data anymore. My customers are getting frustrated because they can't offer certain services to their customers. So they decided: we need to go to an object store. They decided they needed to be on-premises, and they implemented StorageGRID Webscale.
One of the side effects I thought was very interesting: by replacing all the path information — how many copies they had, where they were physically stored — with object IDs, they so massively shortened the rows in their database that they can now do interactive lookups. They can provide interactive services to their customers. So their reason for going to object storage was not the scalability; it was the simplicity and the automated, transparent data management that allowed them to optimize their applications. I thought that was an interesting perspective that I wanted to share. Now, I've been going through this at a hundred miles an hour, and I apologize — there was a lot of stuff I wanted to get across in this session. We have a booth here, S13, with a beautiful view of the bay. There's another session this afternoon at 4:10 with my colleague Alex McDonald and Mark Carlson from Toshiba — and I don't know if you have seen the Toshiba booth; they have an object-connected hard drive, which is kind of a cool idea. They're going to talk about Swift and extending it with S3 and CDMI, and they're going to go much more technical. What I did here is really more of an overview and introduction; they're going to go really, really deep into those protocols, what the differences are, why you would want to combine Swift and/or S3 and CDMI, and what the benefits would be. If you're more interested in erasure coding, there's a great webinar from the Storage Networking Industry Association explaining erasure coding and why you'd want to do hierarchical erasure coding, which combines local with geo-distributed. That webinar is free and available without a login, so check that out.
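The row-shortening effect that customer saw can be illustrated with a hypothetical before/after record — the column names and values below are made up for illustration, not their actual schema.

```python
import uuid

# Before: every file-tracking row carried full location details that had
# to stay in sync with the storage layout (and with the manual replica).
row_before = {
    "server": "nas-17.example.com",
    "share": "retail_ingest",
    "path": "/2015/05/kiosk-8123/",
    "filename": "txn-000981.dat",
    "replica_server": "nas-42.example.com",  # second copy, tracked by hand
}

# After: the object store owns placement and redundancy, so the database
# row shrinks to a single opaque object ID.
row_after = {"object_id": str(uuid.uuid4())}

print(len(row_before), "columns before;", len(row_after), "column after")
```

Shorter rows mean smaller indexes and faster scans — which is how a storage-layer change ended up making their interactive lookups possible again.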
And then of course, I have some product links here as well. If you have any questions, please contact me, either email or Twitter. I respond, trust me. I really do. And with this, I want to thank you very, very much for attending the session. I hope this was useful and I hope you have a great show. Thank you.