Okay, why don't we get started? Thanks for coming to the talk today. My name's Dan Green, and I'm the VP of Engineering at Scality. We're a software-defined storage company. Joining me today is Vianney Rancurel, our director of research, who is involved in the advanced development topics we have underway at Scality. Today we're going to talk about some of the challenges we've heard from end customers related to storage and OpenStack. I'll cover an introduction and some of the use cases we're hearing about, and then I'll turn it over to Vianney to talk about the specific solutions Scality has put in place to help solve those customer issues.

First off, touching on what we're seeing from customers: we've been involved in OpenStack since the Grizzly timeframe, which is when we published our first Cinder driver. Since then we've expanded into a couple of different areas that Vianney will dive into: Cinder, Swift, and some open-source work. As we talk with customers who are interested in OpenStack, we hear specific challenges they're facing as they move, typically, from an enterprise view of the world into the cloud view. We'll then talk about OpenStack and the Scality RING, where Vianney will spend most of his time, and wrap up with some Q&A.

So first off, what challenges are real customers having? Think of an OpenStack deployment: they've got thousands of VMs doing all types of jobs, and what we're hearing is that only the top 15 or 20% of those VMs are actually running what I would call tier one workloads. Those need a hot edge of storage that is either flash based or a typical fibre channel array. The majority of the other VMs fall into the tier two, tier three workload space. They have IOPS requirements, somewhere around 150 to 200 IOPS, but definitely not at the level of the tier one workloads.

There is also a significant number of different applications they're trying to run. What are the workloads? Are they running email infrastructure? Are they running various as-a-service offerings for their internal business units or for external customers? And what type of data needs to be stored? Is it a typical object store, massive amounts of data that fit well into an object REST-style interface, or is it legacy information, things that access data through file systems? Is it small or large? Are we talking gigabytes or terabytes?

All of this builds into what we're seeing from customers standing up OpenStack: the movement toward petabyte-scale storage. And when you actually move into petabyte-scale storage, cost becomes a significant factor. Standing up traditional arrays quickly drives both capital and operational expenses very high, and for many of the use cases, that 80% I talked about, you don't actually need that level of performance.

Along with that, you need to deal with how durable and available that data is. From the enterprise world, you're used to the traditional gold, silver, bronze storage levels. As we move into the cloud, how do we put those storage policies in place so the customer, whether buying from you if you're an infrastructure or solution provider, or internally in their own cloud, can say: I'm willing to spend this much on storage, my business units or my external customers will pay me this, and that drives the overall business. How much is a terabyte of data worth to the business?
The big thing to keep in mind here is that while compute is hard, scalable, reliable storage is much, much harder. Look at all the different solutions out on the show floor: lots of people are doing software-defined storage, but the traditional storage vendors are there as well, all of them working to ensure that the data written to them is reliable, durable, and available.

So what do we do? The world, from our view, is moving to these software-defined storage solutions. Whether it's a virtual-machine-based cluster, physical devices, or appliances, the ability to quickly plug in more storage is becoming key as internal infrastructure teams scale out for the new applications their business customers are requesting. Everybody is familiar with the fact that with virtualization and OpenStack you can go from weeks to get an application server up and running to hours or minutes, depending on your internal process. The same thing applies to storage. You want to be able to quickly scale out the consumption of storage for a particular business or use case, and that ties into the storage agility part of the story: giving your customers what they want at a price they find attractive, and not having them, or you, worry about things that don't add value to how you run your organization. Focus on the cool stuff, right? If you're a company selling a product, focus on new innovation rather than on how to make your infrastructure work. That's where OpenStack, and especially software-defined storage, adds a lot of value.

Today's approach, and it's shifting, but it's definitely well entrenched, is the typical proprietary hardware story. You've got arrays of some type, pick your favorite vendor. They're running iSCSI, they're running NFS, they're running over a SAN. A lot of them work best in a single-site configuration; as you start scaling out into multi-site, geo-distributed deployments, you run into synchronization and consistency problems. Really, these devices in the OpenStack world work well for the hot edge data requirements: high throughput, low latency, whether that's flash devices in the physical servers or high-speed interconnects. With this style of array, you drive a forklift in, you drop it in, you wire it up, you connect it over IP or SAN, and you've got screaming performance. That's perfect for that 20%. But when you look at what you need for the future, you've got that hot edge, but you also have a capacity tier, the tier two and tier three workloads that 80% of the application workloads really run on.

So how do you deal with that? You've got a couple of ways of accessing it. A lot of legacy applications are file-based: you write into a directory, you're using NFS mounts, for things like CAD/CAM and video editing. You've got the newer world of object-based, REST applications, which could be anything from a SaaS solution to internal email; there are lots of different ways REST is being consumed at this point. And then you have the VM-based applications, and that has two aspects.
The VMs could be the actual images that are running, but, coming from a legacy model, the actual data may also be stored in the virtual machine itself, maybe as a separate volume, depending on the internal architecture of the organization.

So what you need, or what we believe you need, is software-defined storage. You have commodity hardware from the typical vendors out there, compute to power the storage, racks full of spinning disks and a couple of SSDs, and then, using the software-defined storage application layer on top, you're actually building a highly reliable and durable system. From Scality's perspective, the real value in where the world is going is software-defined storage. As I mentioned, if you walk around the show floor today there are tons of people providing software-defined storage solutions. Each has its own nuance and its own differentiation, but it's clearly a move away from the traditional big rack of storage, or the smaller-scale, super-high-speed devices. Again, those are great for some workloads, but when you look at OpenStack in particular and how you scale out, we feel software-defined storage is the way to go.

Now, the question is why. What we're seeing, and this is something we found through some analysis, is that software-defined storage allows you to scale. This particular study looks at how the system responds as we add more workload and more clients. The green bar is the response and latency time, and the blue climbing incline is the actions per second attempted, so essentially the number of clients. The thing to notice is that, depending on the particular details of the workload, it scales roughly linearly with the number of nodes. The interesting piece is that at some point you reach resource saturation. Say you've got six nodes providing a software-defined storage solution: at some point you simply run out of the resources necessary to handle the number of clients connecting to you. That's where we see the drop-off, and all of a sudden the latency goes all over the place and the ability to respond to actions gets very erratic. In the software-defined storage world, that means it's time to expand your solution: you add more storage so you can continue to climb at that linear rate. Of course there are real-world constraints on how far you can climb on that curve, but the point is that until you hit the current physical limitations on SSD size, spinner size, the number of compute nodes, and inter-node traffic, the latency curve continues to stay flat as you add more physical servers.

So what does that mean for OpenStack? From our view, there are basically two types of storage consumed in OpenStack. One is ephemeral storage. Think about it as when you spin up a VM for a SaaS application: the VM running the actual SaaS app doesn't hold any data of its own. With an ephemeral instance, as soon as it disappears, the storage below it disappears. It's good for booting the VM and for some of the operations associated with that VM running, like spooling logs before they go out to a collection system, but it doesn't persist. As soon as that instance goes away, that information goes away.
That's where persistent storage comes in, and there we see two types. You've got block storage via Cinder, typically for storing VM volumes. As everybody should be aware, Cinder is just an API for controlling storage provided by somebody else: Scality, of course, but also NetApp, EMC, SolidFire, all of the people out here providing storage backends with Cinder drivers. It's a control API, and the actual I/O goes to an end device provided by your storage vendor. The other type is object storage, and that's via Swift. Swift is different in that it's a full stack, a full solution that provides REST-based storage. Those of you who sat in on some of the Swift talks this week know they have erasure coding coming, they do replication, they've got zoning, everything you'd expect from a full product that you typically would have seen only from the storage vendors providing Cinder drivers. So I think the key differentiation is that with Cinder you're actually talking to a product from a vendor, while with Swift you're running the Swift software itself, which you can download for free or get supported by SwiftStack or one of the other providers.

So that's, at a high level, what we're looking at for storage within OpenStack. How do we deal with that? What gives us solutions that work for Cinder and Swift? This is where, in Scality's case, we feel software-defined storage really makes a difference. You have a number of workloads, and again I want to reinforce that 80% of the workloads are not high-end, low-latency, high-throughput. You can access storage through file, through object, or through VMs, and the protocols are the standard ones: NFS, CIFS, whatever object flavor you enjoy, S3-compatible, Swift, CDMI, plus half a dozen custom ones. It all depends on what kind of investment your business users have made in their applications. And at the bottom, from a storage perspective, it doesn't matter what OpenStack runs on the compute side; we're just providing a service, and you're able to scale that service locally or geo-dispersed based on the capabilities of the storage system. As your business does well and your business units continue to thrive, it's easy enough to scale out: you drop in some new servers with some spinners and some SSDs, and you expand from one petabyte to two petabytes or beyond.

So from Scality's perspective, we think software-defined storage, as I said, is the way to go, and Scality itself, we think, is obviously a great solution; I wouldn't be up here if I didn't think that. We're already compatible with Cinder and Swift. We made some announcements earlier this week on our new Swift support, we've had a Cinder driver in since Grizzly, and Vianney will talk a little bit about a new open-source project that adds another type of Cinder solution into the mix. The key for us is how to provide this storage in a way that integrates seamlessly with OpenStack and, from our perspective, also services some of the other needs that businesses have outside of the OpenStack arena. I've got a team of four dedicated people at Scality working on this, involved in the community around Cinder and Swift and doing code reviews.
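To make the earlier point concrete, that Cinder is a control API while the actual I/O goes straight to the backend, here is a rough, hypothetical sketch of the surface a volume driver exposes. The class names, the stub backend client, and the export paths are made up purely for illustration; this is not Scality's driver code.

```python
# Hypothetical sketch of the control-plane surface a Cinder volume driver
# exposes. Cinder only orchestrates (create, delete, attach); the data path
# goes directly from the hypervisor to whatever backend the driver points at.


class ExampleBackendClient(object):
    """Stand-in for a vendor's management API (assumed, for illustration)."""

    def __init__(self):
        self.volumes = {}

    def allocate(self, name, size_gb):
        self.volumes[name] = {"size_gb": size_gb, "export": "/exports/%s" % name}

    def release(self, name):
        self.volumes.pop(name, None)


class ExampleVolumeDriver(object):
    def __init__(self, client):
        self.client = client

    def create_volume(self, volume):
        # Control plane only: ask the backend to allocate capacity.
        self.client.allocate(volume["id"], volume["size"])

    def delete_volume(self, volume):
        self.client.release(volume["id"])

    def initialize_connection(self, volume, connector):
        # Return the connection info Nova needs to attach the volume; after
        # this handshake, reads and writes bypass Cinder entirely.
        export = self.client.volumes[volume["id"]]["export"]
        return {"driver_volume_type": "local", "data": {"device_path": export}}


if __name__ == "__main__":
    driver = ExampleVolumeDriver(ExampleBackendClient())
    driver.create_volume({"id": "vol-1", "size": 10})
    print(driver.initialize_connection({"id": "vol-1"}, connector=None))
```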
We feel, both from Scality's perspective, where we obviously have business value to add, and from a community perspective, that OpenStack is really going to continue to disrupt what's happening in both the enterprise space and among service providers. So the takeaway here is: what does Scality provide? In my mind Scality provides a few things, and I'll use Amazon terminology because most people are familiar with it. We provide the equivalent of EBS. We provide ephemeral storage as well, the equivalent of the volumes that come up and go away with the instance. We provide an equivalent of an S3 interface, which can be CDMI, Swift, or an S3-compatible API. And on top of that we throw in NAS for the legacy applications out there, allowing you to service all the different needs of the applications that may want to move into an OpenStack environment.

One of the things that is very attractive about these software-defined storage solutions, versus a legacy solution or one of the higher-end solutions that is good for the 20% of workloads, is that through both software innovation and the commodity hardware basis we're able to get a very low cost per terabyte. That matters to the business when you look at how you can add value to your end customers: what does the business unit, or the software-as-a-service application user, really need to be charged when they're using a solution built on top of your infrastructure? And of course all the great benefits of OpenStack are there as well: easy automation, deployment, and monitoring, things that are making this community grow stronger and stronger.

The last thing I wanted to talk about is an interesting graph about how VM workloads scale across IO loads. One thing we're finding with software-defined storage, and why we think it's great for the 80%, is that if your workload has a few VMs hammering a storage device very hard, at thousands of IOPS, the benefits you get from software-defined storage become more challenging because you become IO-bound: there's only so much IO you can push to a single storage device backing a VM. But with scalable software architectures you can add multiple types of access, multiple connectors if you will, that allow you to service hundreds of VMs per scalable storage unit. The total IOPS load is quite high, but the individual IOPS for a particular VM is in that 150 to 200 range I talked about. So you see a pretty nice return on the cost of storage per terabyte, the number of virtual machines you can support per software-defined storage instance, and the lowered cost of managing software-defined storage versus a traditional solution. With that, I'll turn it over to Vianney to talk more about the technical side: what we're doing in Cinder, Swift, and on the open-source side.

Okay, thank you. So our first contribution is the Cinder backend; we've been there since Grizzly. We offer block storage, volume storage, through our distributed file system, so a volume is nothing but a sparse file stored on the RING. As you can see here in the Horizon GUI, a sparse file is declared as a volume: this is a mount point on the RING, and here you see two volumes, each stored as a sparse file.
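As a rough illustration of the sparse-file idea (the mount point below is a hypothetical stand-in, not the actual RING file system), a sparse file advertises its full size while consuming space only for blocks that have actually been written, which is what makes it behave like a thin-provisioned volume:

```python
# Illustrative sketch only: a sparse file as a thin-provisioned volume.
# A temp directory stands in for the storage mount point (assumed, not the
# real RING mount). Apparent size vs. allocated blocks shows the thin
# provisioning effect. st_blocks is a Unix-only stat field.
import os
import tempfile

mount = tempfile.mkdtemp(prefix="example_ring_")
path = os.path.join(mount, "vol-0001")

# "Create" a 100 GiB volume: the apparent size is 100 GiB, allocation is ~0.
with open(path, "wb") as f:
    f.truncate(100 * 1024 ** 3)

st = os.stat(path)
print("apparent size: %d GiB" % (st.st_size // 1024 ** 3))
print("actually allocated: %d MiB" % (st.st_blocks * 512 // 1024 ** 2))

# Growing the volume later is just another truncate to a larger size.
with open(path, "r+b") as f:
    f.truncate(200 * 1024 ** 3)
```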
So you can grow a single volume as much as you want, because it's basically thin provisioning.

Our second contribution is a Swift plugin. Here we don't change the container and account management, because we want to stay nearly 100% compatible, especially with Keystone; we simply replace the object component, so we sit behind the Swift proxy. The only thing we don't support from Swift is storage policies for objects; we do support them for containers, but for objects we have our own policy management, so you can decide to span the data over many rings, or over one ring stretched across many locations. We have erasure coding, which has been in production for more than two years. And the important thing is that you can store your VMs and your application data in the same ring if you want. Here's a quick example: a simple container creation through the Swift command-line tool, then we put a file, and here we can download it directly from the ring.

Our third contribution is a kernel block device that we announced yesterday. It's open source, and it can talk to any REST-based backend, so we think it's interesting for the community. What's the need? First, you can very easily consume block storage from a VM or from any other application. There are two kinds of applications inside a VM: the ones that are able to talk object natively, the best ones, and, let's say, the legacy ones, which need a data volume. You can use the block device to quickly mount and provision volumes in your VMs. It's available for other OpenStack projects if they want to hack on it. We also have a project to write another Cinder backend which will talk directly to this interface, so you will be able to access the same volume through either the file system or the kernel block device, depending on what you want to do. In particular, through the block device you can take advantage of standard Linux caching, which is very nice, and you can also plug in layers with the device mapper, such as flashcache or mirroring or other interesting features. These three components complement the vision we described at the beginning of the presentation. The newer pieces are the Swift plugin and the REST block driver; the driver is available now, but the Cinder backend based on it will arrive in Kilo.

So, who are we? We were founded in 2009; we have 19 employees, and we're based in France, the United States, and Japan, and across Europe.

Okay, so that's the quick run-through of what we're doing in Cinder and Swift. I think there are a couple of unique features there that are interesting. One, the Scality REST block driver, I think, is pretty exciting: it's block on top, which is what people are traditionally used to using, and REST out the bottom. Because it's open source, people can extend it however they want. Right now we've implemented CDMI as the interface, but it could be anything else; the architecture is set up to be extended very easily. We're using it specifically with Los Alamos National Lab: they've got a virtual exabyte GPFS volume running on top of our REST block driver, pumping data through it. So it's massive scale, very good for HPC, and there are half a dozen other great uses for something that talks REST out the bottom and block out the top.
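To picture the "block on top, REST out the bottom" idea, here is a toy sketch; the real driver is a kernel module, and the endpoint layout, chunk size, and URL scheme below are assumptions made purely for illustration, not the actual Scality interface.

```python
# Toy sketch only: map a linear block address space onto fixed-size objects
# stored behind a REST interface. Chunk size and URL layout are assumed.
import requests

CHUNK = 4 * 1024 * 1024  # carve the volume into 4 MiB objects (assumed size)


class RestBackedVolume(object):
    def __init__(self, base_url, volume_name):
        self.base = "%s/%s" % (base_url.rstrip("/"), volume_name)

    def _chunk_url(self, index):
        return "%s/chunk-%08d" % (self.base, index)

    def write(self, offset, data):
        # For simplicity, assume writes are chunk-aligned and chunk-sized;
        # a real driver would do read-modify-write for partial chunks.
        assert offset % CHUNK == 0 and len(data) == CHUNK
        requests.put(self._chunk_url(offset // CHUNK), data=data).raise_for_status()

    def read(self, offset, length):
        # Assume the read stays inside a single chunk.
        assert (offset % CHUNK) + length <= CHUNK
        resp = requests.get(self._chunk_url(offset // CHUNK))
        resp.raise_for_status()
        start = offset % CHUNK
        return resp.content[start:start + length]
```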
As Vianney pointed out, with our Swift interface we're actually keeping the core Swift code and just replacing the backend. That ensures that as Swift matures, changes, and adds new features, we'll keep working seamlessly: there's no playing catch-up with APIs and behaviors, because we're just acting as the target for the storage, the actual blob of data, and everything else Swift provides comes for free. And our Cinder drivers, I think, are quite good. As Vianney pointed out, the one that's been out there since the Grizzly timeframe is based on sparse files, so people who are used to managing file systems know how to look at it and understand it, but from a Cinder perspective it fits in just like any other driver out there. And we're looking forward, in Kilo, to having a Cinder driver for the generic Scality REST block driver, this new open-source technology. With that, we're open to any questions.

Okay, well, if there are no questions... oh, there is one there. Yeah.

No. No, the first version we implemented as a backend. There is a notion of a storage object controller, so we simply override that class. It's a very clean integration: we simply add a new way of storing the data, and we don't modify the Swift code at all. Yeah, exactly.

No, it's not upstream. It's just on GitHub for now, and if it's popular we'll try to get it upstream. That's a good point, yeah.

The block driver? For that, you can use LVM on top of the kernel block driver and you can do it. But for now we don't support that except through the block interface.

Yep, we decided not to touch that part. What you do is a pretty much standard installation where you define your container ring. Generally we put them on the same controller nodes, so you have three controller nodes, one HAProxy, and we make a small ring on SSD for containers and accounts, and then we have the Scality ring behind. It's fast enough.

Yeah, so in the Scality ring we have replication and we have erasure coding. We support up to six copies of data when you do replication, and with erasure coding it's up to 64 parts, so we can do 10 plus 14 or whatever. Most people are doing what, eight plus? Eight plus four. Unless you go geo-distributed, and then they go up to ten.

Okay, well, if there are any other questions you didn't want to ask here, we've got our OpenStack team here. Come on down and we'll answer them for you. Thanks, everybody. Thank you.