Is the audio working? Can you hear me? Perfect. Okay. Good evening, everyone. Thanks for attending this session. We have one surprise announcement today: it looks like you all got raffle tickets, so towards the end of the session there will be a drawing. Considering the attendance is very light, there is a high probability that you may win, which is good, right?

So this session is all about addressing storage management challenges using an open source SDS controller. My name is Reddy Chagam. I am the chief SDS architect in Intel's Data Center Group, specifically the storage group. I am co-presenting this session with David Barber, a Senior Program Manager at Oregon State University in the Information Services department. He is going to share his perspective on storage management pain points from the IT side, and why it makes sense to have an SDS controller, specifically an open source one.

I am going to cover a couple of things as a backdrop before I jump into the actual architecture specifics: the data explosion, what we see in the industry from an IT perspective, the specific storage pain points, and how a software-defined storage framework helps address those pain points. I will also touch on two things: the controller prototype, and the recently announced CoprHD open source project from the EMC ViPR team. We have the team here, so if you have any questions on CoprHD, feel free to ask; they will be around after the session, and you should be able to get most of your questions answered. Then we'll hear the customer perspective from David at Oregon State University, and then we'll do the summary.

If you look at the data explosion problem that IT is facing, the chart on the left-hand side shows that data is doubling every two years, but IT budgets are growing only around 2 percent a year; most of the IT customers I have been talking to have budgets that are essentially flat. So data is doubling, IT budgets are flat, and your operations team is going to carry more and more of the burden. The chart on the right-hand side shows that between 2014 and 2020, the amount of data that IT professionals end up managing grows roughly six times.

And if you look at the way storage is managed in the current data center, it's all siloed. You have probably heard about pets versus cattle. Essentially you have applications, and you dedicate storage to a specific set of applications. You optimize those applications for tight SLAs, service-level agreements, and you normally don't share. The challenge with that traditional model is that it increases cost, both capital cost and operations cost, because the storage capacity lying around can't be reused: you have dedicated it to a certain set of apps, and you can't share it. The other aspect is vendor lock-in. What happens if you run out of capacity on one frame? You typically go buy more capacity from the same storage vendor.
So you tend to get into a situation where you don't have many choices for taking advantage of the disruptive technologies shaping up in the industry, like scale-out, for example. And then the emerging workloads are fairly cloud-centric. If you look at in-memory analytics, Hadoop-type workloads, and infrastructure-as-a-service workloads, they tend to be fairly elastic in nature. You can't really size and deploy them up front. You want a model where you can grow capacity on demand; these workloads are tailored towards expanding capacity as workload requirements increase, as opposed to planning ahead, deploying, and managing it that way. So those are the three challenges facing traditional storage management, which really needs to change.

The industry response is software-defined. I'm going to focus on the software-defined framework before I jump into software-defined storage and what exactly it is. In the industry, software-defined storage means different things to different people, and at Intel we spent quite a bit of time figuring out what exactly software-defined storage is and what we believe is the right thing to do for the industry. But before I jump into software-defined storage, we have to look at it from the data center perspective. The framework we call software-defined infrastructure has compute, storage, and network elements. The bottom-most piece is a set of infrastructure resource pools. These resource pools expose a wide variety of infrastructure attributes, the way we see it: power, thermal, security, performance, latency, those types of attributes. You feed those attributes to the orchestration software, which should be intelligent enough to say, here is what my application needs, here is what my infrastructure is capable of, and then do a very intelligent mapping of infrastructure resources onto application requirements. That's what the fundamental SDI framework is all about.

If you zoom into software-defined storage, we believe the SDS framework consists of four elements. The first and foremost, the one you normally see in the industry, is decoupling the software from the underlying hardware. That abstraction is one of the key building blocks that lays the foundation for software-defined storage. Second, you should be able to aggregate storage resources from diverse vendors and providers. You may have traditional scale-up SAN appliances and NAS filers, or scale-out appliances, or standard high-volume servers with scale-out storage software overlaid on top; you should be able to aggregate those resources and consume them. Third, you should be able to provision these resources elastically: you don't have to provision up front, and you can expand based on customer requirements, workload requirements, and demand. And the last one is orchestrating those resources: if your application asks for a certain workload profile and certain performance requirements, the layer that sits between the infrastructure and the orchestration software, which we call the software-defined storage controller, should have the intelligence to allocate the right storage resources. A minimal sketch of that matching idea follows.
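Here is that sketch as a toy Python model. Every name in it (StoragePool, match_pool, the attribute keys) is invented for illustration; nothing here comes from a real controller API.

```python
# Hypothetical sketch: match an application's storage intent against the
# attributes exposed by infrastructure resource pools. All names here are
# illustrative inventions, not part of any real SDS controller.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class StoragePool:
    name: str
    media: str            # e.g. "ssd" or "hdd"
    max_iops: int         # aggregate IOPS the pool can sustain
    latency_ms: float     # typical access latency
    free_tb: float        # unallocated capacity

def match_pool(pools: List[StoragePool], *, min_iops: int,
               max_latency_ms: float, capacity_tb: float) -> Optional[StoragePool]:
    """Return a pool that satisfies the application's service-level objective."""
    candidates = [p for p in pools
                  if p.max_iops >= min_iops
                  and p.latency_ms <= max_latency_ms
                  and p.free_tb >= capacity_tb]
    # Best fit: pick the least capable pool that still qualifies, keeping
    # premium pools free for more demanding workloads.
    return min(candidates, key=lambda p: p.max_iops, default=None)

pools = [StoragePool("gold", "ssd", 100_000, 1.0, 40.0),
         StoragePool("silver", "hdd", 20_000, 8.0, 400.0)]
print(match_pool(pools, min_iops=50_000, max_latency_ms=2.0, capacity_tb=10))
```

The point is only that once pools expose comparable attributes, placement becomes a policy decision the controller can automate.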
So those four things essentially comprise the software-defined storage framework. If you click down into the architecture itself, we see three building blocks in software-defined storage. As I mentioned on the previous slide, the bottom-most piece is the storage systems, and it includes traditional as well as scale-out storage systems. The SDS controller is the layer responsible for allocating storage resources when an application makes a request. The way we see applications making requests is through service-level agreements and service-level objectives. In the traditional model of allocating storage resources, the request is simply "I want a 10-terabyte volume." Instead, you can create much richer constructs: I want this type of performance characteristic, this type of latency characteristic, this type of data lifecycle management; if my volume is idle for six months, I want it moved to a lower-cost tier. The goal is to have the application describe what it requires. If you have been following the SDN topics lately, there is something called intent-based provisioning. It's all about the intent of how you want to use the storage resources, as opposed to "here is the place where I want to create my volume, here is the block, and here is the capacity I want." We want to move away from that and have applications describe their requirements in a much richer way. That's the direction we want to go over the course of time.

Once the application describes its requirements, the SDS controller kicks in. It has visibility into all the resources, figures out the best place to carve out those resources, and hands them off; then the application and the storage system interact directly. Carving out the storage resources is what we call the control plane; once that is done, you switch over to the data plane.

We also have one additional building block called data services. Depending on whom you talk to, you will hear different descriptions of what data services mean. In this definition, a data service is anything that sits in the data plane and does some sort of data processing before the data is stored in a storage system. This can be compression, encryption, deduplication, or all kinds of analytics implementations embedded in the data services. A data service does not persist the data; it just manipulates the data in the data path. What we see is that, as an industry, by having a generic framework we can enable a rich ecosystem; I call this an app store for storage. With data services, you can bring in your own innovation: you may have a high-speed compression appliance, and you should be able to plug it in and take advantage of the rich ecosystem around it. So those three things comprise the SDS architecture.
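As a rough illustration of that "manipulate but never persist" contract, here is a small Python sketch. The filter-chaining scheme is invented for illustration; it is not a real data services API.

```python
# Illustrative sketch: a data service is a data-plane stage that transforms
# bytes in flight and never persists anything itself. The chaining scheme
# here is invented for illustration only.
import zlib
from typing import Callable, Dict

Store = Callable[[str, bytes], None]

def compression_service(next_stage: Store) -> Store:
    """Wrap the next data-plane stage with transparent compression."""
    def store(key: str, data: bytes) -> None:
        next_stage(key, zlib.compress(data))   # transform, then pass along
    return store

backend: Dict[str, bytes] = {}

def persist(key: str, data: bytes) -> None:
    backend[key] = data   # only the storage system actually persists

# Compose the data path: application -> compression -> storage system.
pipeline = compression_service(persist)
pipeline("volume-1/block-0", b"hello " * 1000)
print(len(backend["volume-1/block-0"]), "compressed bytes stored")
```

Deduplication, encryption, or analytics would slot in the same way, as additional stages chained ahead of the storage system.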
Now let's look at the specific functional components that we believe need to exist in the SDS controller. At the end of the day, it's all about automating the storage management functions. That includes everything from provisioning your storage systems, to discovering your storage system functionality (all the differentiated capabilities should be discoverable) and grouping those capabilities into logical buckets, to consuming them through data lifecycle management: you want to create volumes, you want to create shares, you want to create containers; all that good stuff lives in data lifecycle management. And then there is routine operations management, which is monitoring and maintenance. For a full-blown SDS controller, you have to fully automate each phase of the storage management lifecycle. Ideally, we would like these APIs to be interoperable so that you can plug in diverse provider solutions. That's the intent.

Considering we are at an OpenStack conference, let's look at how this works if you do it in OpenStack. This is fairly high level. You have existing OpenStack investments: Cinder, Manila, Glance for maintaining the images, and of course the object stores. The controller plays the role of automating the storage management functions by taking advantage of the building-block functionality that is already out there. The intent is not to replace those projects, but to build on the existing investments and add the storage management functions that do not exist today under the OpenStack umbrella.

So I covered the data growth challenges; I talked about what a good framework for software-defined storage could be and how the controller plays a role in intelligently allocating resources and servicing applications in a software-defined infrastructure. We have actually done a couple of things. When we presented last year, we said, let's go build a prototype. We looked at the key building-block functions that do not exist, and the customers we talked to prioritized provisioning and discovery as probably the two most important aspects to look at. For provisioning, we picked a couple of flavors and looked at both a proprietary option and an open source one. First, we have software called Virtual Storage Manager, an open source provisioning tool that Intel actually led; it's available on 01.org. We picked it to provision Ceph, the scale-out storage system for block, object, and file. Then we picked the SwiftStack Controller, which provisions Swift clusters. We used those two elements to understand how you provision through a generic framework: if you want to stand up petabytes' worth of storage very quickly, how do you do that? The intent was to figure out the best way to do it, then evolve a framework generic enough that provisioning software vendors can plug into it.
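On the consumption side, the closest existing OpenStack building block to this SLO-style model is Cinder volume types with extra specs, which the scheduler matches against back-end capabilities. Here is a minimal sketch with python-cinderclient, assuming the v2 client; the credentials, endpoint, and back-end name are placeholders for your own environment.

```python
# Sketch using python-cinderclient (v2 API): create a volume type whose
# extra specs the Cinder scheduler matches against back-end capabilities,
# then provision a volume of that type. All credentials and the back-end
# name are placeholders.
from cinderclient import client

cinder = client.Client('2', 'admin', 'password', 'demo',
                       'http://controller:5000/v2.0')  # placeholder auth

# A "gold" type pinned to a back end that reports these capabilities.
gold = cinder.volume_types.create('gold')
gold.set_keys({'volume_backend_name': 'ceph-ssd'})  # hypothetical back end

# The application asks for an intent ("gold"), not a physical location.
vol = cinder.volumes.create(size=10, name='app-data', volume_type='gold')
print(vol.id, vol.status)
```

The application names a type, not a device, which is a small step toward the intent-based provisioning described earlier.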
We also did discovery. The intent of doing discovery is to figure out the best way to inspect a storage system, look at all its functionality, and bubble it up to the data center admin so the admin can use it intelligently to compose storage pools. What is the best way to do that? What should the generic framework be? Those are the two things we did in the prototype. In fact, it's in the booth; if you're interested, stop by and take a look. What we'd ideally like to do is take this prototype, figure out the best way to move certain pieces into the native OpenStack projects, and implement them natively.

The other piece, as I mentioned at the beginning, is that EMC announced an open source controller project called CoprHD; it's spelled C-o-p-r-H-D, but pronounced "Copperhead." Essentially, it's a community-driven development model; you have the link there. We are looking for community contributions in three areas. The first is the OpenStack integration part: if you can contribute on the OpenStack side and make it better together, that would be ideal. Second, you can develop rich plugins on top of CoprHD and use them as differentiated capabilities in your own offering. And the last one is writing a native driver. Those are three places where you could immediately jump in and start contributing.

So those are the two things we did from an implementation perspective. The prototype is one aspect; we are going to move selected pieces into OpenStack, and we are also going to move selected pieces into CoprHD where it makes sense. And you can jumpstart your own contribution with a controller that's already available; you can take advantage of it. So let me switch over and hand it to David. He can talk about Oregon State University's IT challenges and why it makes sense to have an SDS controller to manage the storage infrastructure. Thank you.

Okay, my mic's on. Let me give you a little background about Oregon State University so you understand how we intersect with the storage management problem and our particular challenges. When you think about a university, you typically think about the place you went to, and you may think it's not that large. But to give you some numbers on Oregon State University: we have an operating budget of just under a billion dollars a year, and we educate about 30,000 students. In addition to that direct engagement with students, we also engage with about two million other Oregonians who take advantage of extension programs or other informal education we offer. What is characteristic of Oregon State University is that we're what's called a land-grant institution, which means we have a very broad mission to take the research and knowledge we develop and engage with the economy of the state of Oregon. So we are not only an education provider; we are actually in a lot of other vertical markets. We have a College of Pharmacy, for example, that is involved in drug discovery projects. We own 10,000 acres of forest land. We own ships. We own farms. We have locations in just about every county across Oregon.
So we're not only an educational entity, but a research entity with a very diverse and extensive geographic footprint and fairly sizable operations. In trying to support that as the central Information Services (information technology) organization at Oregon State University, we maintain an extensive infrastructure: multiple data centers and a diverse virtualization and storage environment that includes everything from SANs to the scale-out storage items and the other things you see there. Our challenge is supporting a very diverse set of workloads on campus, extending from all the traditional transaction-driven administrative applications (our ERP system, for example) to faculty and administrators doing analytics projects. We have a lot of virtualization, and we're expanding virtual desktops very rapidly.

To highlight the one that's really driving storage growth: it's the research space. Every year we receive just under $300 million in various forms of grants and contracts to do research work. As a result of that activity, we have quite a few faculty involved in research technologies that generate immense amounts of data, and these technologies are improving constantly. For example, if you've seen a gene sequencer before, they have historically been fairly large units found in a few specialized labs. They're getting to the point where, in the next few years, they are expected to be the size of a USB memory stick and to be available in just about every lab doing biology work. So where you see numbers like 20 terabytes per hour out of current gene-sequencing technology, imagine those devices populating every biology lab across campus. And beyond that one discipline, we're doing a lot with drones; just about everybody at the university who could put a drone in the air is thinking about how to do that, whether they want to fly over forests, the ocean, or agricultural fields, with the sensor equipment and imagery devices that can be attached to drones. So we've got people really engaging with and embracing the Internet of Things and the sensor platforms coming along with it. And while they're generating increasingly massive amounts of data, they don't have any forecast of what that looks like going forward. They aren't people we can go to and say, tell us how much data you're going to generate next year, or how much you need for your application. Right now they're in the business of generating as much data as possible to figure out how much they actually need to answer certain kinds of problems. So we're really driving a lot of data consumption. And there is the long-term digital archive: for scholarly purposes we sometimes want to preserve all the data that's generated, and we also run into compliance and other legal requirements for retaining data, so we have to preserve some portion of this for the long term.
So as we look at that situation on campus and try to make sense of it from an IT technology strategy point of view, we're looking at that growing storage. But one of the things we've realized and committed to is that our core operating budget for IT is going to remain flat, so we have to manage that storage and its growing volume while holding constant the number of staff involved in managing it. The trend Reddy showed earlier, the increase in the amount of storage each administrator has had to manage over the years, is only going to continue, and we're committed to continuing to grow the amount of storage any individual staff member can manage.

We're looking at some other key aspects of our strategy as well. Oregon State University has a very big commitment to open source solutions wherever we can use them; we prefer them and try to be open where possible. You may recognize my colleague Lance Albertson from the Open Source Lab; that's one example of the university's commitment to open source and why we believe open source technology is very important. So we're not only going to manage more storage with fewer tools and fewer staff; we want to do it in an open source manner wherever possible. We also don't expect the storage industry to come at us in the near term with some single application, single solution, or single vendor's product that does everything we need. Not only will we have to manage a growing amount of storage with very few people, we will continue to have a diversity of tools and pieces within that storage environment, because we're not going to be able to just choose the one Lego system of storage, if you will, that does everything for everybody, leaving us with only one vendor relationship. And strategically, having only one system or one vendor relationship is not to our advantage.

Looking forward in terms of our plans and expectations, we're moving to a pay-as-you-grow storage environment. We've done a lot of work, thanks to Reddy and others, to get the economics right in our storage solutions. We've gotten away from having to make large one-time capex investments in order to get the cost per unit of storage where we need it to be. So we've got more of a slowly increasing storage cost curve that lets us incrementally sell storage capacity to the faculty at fairly low rates, competitive with other solutions they can find. We're playing with OpenStack and Swift; we also use NetApp, and we have LeftHand and some other tools as well. We continue to optimize for diverse workloads: as we create our central IaaS cloud facility, we're attaching those workloads to different policies and SLA expectations. And we've been working with Intel on the SDS controller. We were very excited when we heard about CoprHD as a project, because we had anticipated that some of that functionality might only become available to us in open source after several years of continual effort and development work.
But now it looks like we may be able to move more aggressively and have that feature set for managing our storage much sooner. So we're excited about the possibility of getting our hands on it, doing some testing in our environment, and looking at the challenges and the process of integrating it with some of the systems we have.

Thanks, David. So as you heard from David, your data center infrastructure is not uniform: you're going to have more than one storage system deployed. Your growth is going to come at a clip that isn't sustainable with flat head count, so you are essentially looking at how to take advantage of the software-defined storage wave, specifically with a controller that can manage storage resources while giving you the flexibility to add capacity on demand and, at the same time, control OPEX costs. That's where we see software-defined storage evolving over the next few years. We believe the SDS framework is extremely critical to getting there, and the best way to do that is open source, right? CoprHD is the starting point, and I strongly encourage you to participate in the community as well as contribute.

Okay, with that, I'm done with the presentation. So, questions? Do you agree with the software-defined storage definition and framework? Do you believe the controller can help if there is an open source implementation? I'm interested in your feedback. And by the way, only those who ask questions are entered in the raffle; that's the caveat. So, thoughts, questions?

The question was: in the environment we have at Oregon State University, how would we leverage the controller and CoprHD? I think that's why the next step for us is testing it and seeing how it fits into our environment, because we haven't deployed anything like that to date, and I don't want to say too much until we actually have our hands on it and try to use it.

Yeah. I have a reasonable amount of exposure to CoprHD, both the internals and the functionality. If you step back and look at your infrastructure, say you have a VMware environment, a Microsoft environment, an OpenStack environment, and a few storage systems; in David's IT environment, that's NetApp, Ceph, Swift, ScaleIO, and other flavors of storage back ends. What the controller gives you is a way of managing all those storage resources uniformly, so you can consume them from different orchestration stacks. Functionally, that is available today. The question is how you take that, try it out in your production environment, and feed back to the community: by doing this, am I really able to benefit? Here are the gaps, and we can work through the community to enhance the product. But the goal is to automate the entire sequence of steps, provisioning your storage systems, expanding capacity, and handling the diverse workloads coming into the environment, student workloads, research workloads, so that you have auto-provisioning within the controller and don't have to do a whole lot of OPEX-heavy manual work.
You don't have more than one or two staff resources, and if you keep expanding capacity, that's where the controller will help you. You can increase capacity with a few clicks. You can retire capacity. You can look at the health of the storage systems and whether your workloads are meeting their SLAs or not. That's where we want to go. CoprHD essentially gives you a few years of work up front. By the way, this is actually production software: the ViPR Controller is the product EMC has spent quite a bit of effort developing over the past five years or so. So you get to take that and jumpstart. It doesn't mean you have all of these automated steps; it may be around 70 percent. But the goal is to build the remaining functions to give you that entire end-to-end automated storage management functionality so your OPEX costs go down.

The next question was how we plan to integrate CoprHD into an OpenStack implementation. There are multiple choices, which is good in my view. You can take CoprHD today and plug it in as a Cinder driver or a Manila driver; you will be able to manage the storage resources, and everything works within the OpenStack environment (see the driver sketch below). That's one model. The model we are looking at is: plugging in gets you a jumpstart, but if you really want to automate a lot of storage functions, how do we do that? We are going through the process of finding the best way to enable integration into OpenStack while also getting the benefits of automated storage management functions that do not exist in OpenStack today: storage system provisioning, capacity expansion, virtual pool health, service-level objectives as a way to describe your application workloads, and policies to manage data movement and data lifecycle. Those are the pieces we want to take advantage of, and we are trying to figure out the best way to integrate them.

So here is the link; you should be able to get to most of what you're looking for. The source code is not posted yet; it will be posted in the June timeframe, as officially announced. But you will get a glimpse of the license, and you can subscribe to the mailing list, get notifications, and all of that. I think the second question was how you get to CoprHD and see what's going on; you can use that web link for that. The other part was what happens once it is open sourced. One of the things the community is going to work on: there is a plan for us to publish a few things we have in mind, a laundry list of features we would like to enable. But the overarching intent, if you look back at the slide about the storage management functions and how you automate them, is that we are going to look at all the gaps and prioritize them. The highly impactful ones are what you'll see on the list that the community together can develop and enhance the software around. So, the simple answer is you will see a roadmap; but it's not the EMC team, the Intel team, or the Oregon State University team coming together and saying, here is a roadmap and we are going to do it. It's a community effort. We want the community to come back and say, here is what we should do to address the storage management pain points.
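To make the "plug it in as a Cinder driver" model concrete, here is a hedged skeleton; it is not CoprHD's actual code. The method names (create_volume, delete_volume, get_volume_stats) follow Cinder's volume driver interface, but ControllerClient and its REST endpoints are hypothetical stand-ins.

```python
# Hedged skeleton of the "controller as a Cinder driver" model. The method
# names follow Cinder's volume driver interface; ControllerClient and its
# REST paths are hypothetical, not CoprHD's real API.
import requests

class ControllerClient:
    """Minimal REST client for an SDS controller; endpoints are illustrative."""
    def __init__(self, endpoint, token):
        self.endpoint, self.token = endpoint, token

    def request(self, method, path, **kwargs):
        r = requests.request(method, self.endpoint + path,
                             headers={'X-Auth-Token': self.token}, **kwargs)
        r.raise_for_status()
        return r.json() if r.content else None

class SDSControllerDriver(object):
    """Delegates Cinder volume operations to the SDS controller."""
    def __init__(self, client):
        self.client = client

    def create_volume(self, volume):
        # The controller picks a back end that meets the service level tied
        # to the volume type; Cinder never talks to the array directly.
        return self.client.request('POST', '/volumes', json={
            'name': volume['name'],
            'size_gb': volume['size'],
            'service_level': volume.get('volume_type') or 'default'})

    def delete_volume(self, volume):
        self.client.request('DELETE', '/volumes/%s' % volume['id'])

    def get_volume_stats(self, refresh=False):
        # Surface the controller's virtual-pool capabilities so the Cinder
        # scheduler can match volume-type extra specs against them.
        return {'volume_backend_name': 'sds_controller',
                'storage_protocol': 'iSCSI',
                'total_capacity_gb': 0, 'free_capacity_gb': 0}
```

The design point is that Cinder sees one back end, while the controller fans requests out across whatever storage systems it manages.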
That goes back to the question of how you know it is really addressing the pain points, and that's where your input and feedback are really helpful. We are almost out of time; maybe one more question. There is one. Good. Thank you. Thanks for coming. We have the CoprHD team here. Do you want to stand up so people know who you are? Feel free to talk to them; you can ask them all kinds of questions, and they should be able to answer. They are the experts. So thank you.

All right. I've got it; I know they should be the ones drawing it. The winning number is 6213158. 6213158. No? We've got a winner here. We have a winner. Oh, man. I guess you can do the honors. Okay, there we go. Enjoy. Congratulations.