Good morning everyone. My name is Star, and I'm going to be joined by Abhijith Gholay to jointly make this presentation. We're going to talk about the markets, the technology, where we go from logical block addresses, touch upon intelligent disks, and quickly talk about a technology that we announced earlier this week. You will all have seen this picture: there's a huge growth in data. IDC has predicted that the market will grow from about 1.1 zettabytes to something like 11 zettabytes in the enterprise. The person who talked before me said that anyone can predict a number, but most of us are going to be wrong. It's very hard to project what that number is going to be; all we can say is that it's an order of magnitude greater than anything we can imagine this data size becoming. Now, this is a huge opportunity for companies like us; we get to sell a lot of storage. But while we do that, we also see a few challenges that could inhibit delivering a solution to this marketplace. The first challenge: if you look at IT spending growth, it has been pretty much flat and is projected to stay flat for the next few years. We're going down from about six and a half percent to two percent year-over-year growth in IT spending. Storage spending has had its ups and downs, but we see it also staying pretty much flat. So we have this data explosion, and we have this challenge of keeping the cost low. The second challenge: if you look at the HDD capacity trend between 1997 and 2007, we saw capacity grow from 20 gigabytes all the way to 1.5 terabytes; that's roughly 100x growth in capacity. Now plot the same thing over the ten-year period from 2007 or 2008 to 2017 or 2018. Capacity grew from 1.5 terabytes to about 8 terabytes here in 2015, and we see that number getting to something like 12, perhaps 15, by 2018. From a capacity standpoint, that's only about 10x. So what does that tell us? The key information is that the dollar-per-gigabyte drop in pricing from 1997 to 2008 was significant, but we don't see the same trend continuing. So we have this huge data explosion, IT spending remaining flat, and disk drive technology growing, but not at the pace we need to get better economics. The other challenge we see is on the media side: the disk is going from traditional perpendicular magnetic recording to shingled magnetic recording, which brings its own challenges, meaning the old software stack has to change as well to take advantage of the technology. And the last thing we see is on the workload side. So far the traditional workloads deployed in the enterprise have typically leveraged a block or file infrastructure. As we go forward, the trend is changing a little: the emerging applications tend to deploy a new kind of storage infrastructure, driven by object storage.
So we're moving from scale-up infrastructures for the classic applications to a new kind of storage infrastructure with scale-out architectures. With the data explosion happening, IT spending being flat, the dollar-per-gigabyte trend coming down but not as significantly as it used to, and new kinds of applications emerging, the question is: can traditional storage address these challenges? With that, I'll hand it over to Abhijith. So, good morning. My name is Abhijith Gholay, and I'm going to attempt to make a case for more intelligent storage. I'm going to begin with the conjecture that storage devices are evolving. But if you look at the disk drive, it's difficult to tell what exactly is evolving. Certainly they have been growing in capacity by leaps and bounds over the past few years, but apart from that, what they do has not seen much change. So, possibly similar to the evolution of life, the pace of evolution of disk technology is imperceptibly slow, and that's why it's very difficult to tell what is actually changing. But if you look carefully, in a logical sense you might be able to discern at least two eras of disk storage evolution. Now bear with me; I'm going to stretch this metaphor a little. For life there was an early era, when multicellular life started on the planet, about 500 million years ago. Most of that life began in the oceans; the diversity of life was limited and the location was fixed, confined to the oceans. The early era for disk storage started with a fixed location for data: it was identified physically by cylinder, head, and sector. We may call this the location-centric era. For life, the early era lasted about 250 million years, and then began the middle era. In this era we saw a lot more diversity: massive reptiles like dinosaurs walked the planet, and life actually moved from sea to land, so it was available in both places. For disk storage, the middle era began with the LBA, the logical block address. Fortunately, it didn't take 250 million years; the pace of evolution for disk storage is about 10 million times faster. The logical block address let the disk drive decide where to put the data physically on the platter, so we may call this the location-agnostic era. With the introduction of the LBA, over the last 25 years or so disk drives have thrived, and they're available in all sorts of capacities, large and small, different speeds and feeds, with hard drive as well as SSD technologies. For life, after another 250 million years of the middle era came the new era, the Cenozoic era. This was the age of mammals, and also the age in which intelligent animals like primates and humans evolved. For disk storage, maybe now is the time to create something more intelligent; we may be at the dawn of a new era for disk storage, an intelligent era. And we believe that with the key value interface we might be creating something akin to the opposable thumb for the primates. Just as the primates grew significantly in their intelligence with the opposable thumb, it's possible that the key value interface is endowing the disk drive with a data awareness that will lead to more intelligent storage. So the question is: why now, and why the key value interface?
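To make the contrast concrete, here is a minimal sketch of the two access paradigms; the classes and methods are purely illustrative, not an actual drive API.

```python
# Illustrative only: a toy model of the two access paradigms. With
# LBA, the host addresses anonymous fixed-size blocks and carries all
# the meaning itself; with key-value, the drive stores named,
# variable-sized objects and can reason about them.

class LBADrive:
    """Location-agnostic era: the drive sees anonymous 4 KiB blocks."""
    def __init__(self, num_blocks, block_size=4096):
        self.blocks = [None] * num_blocks
        self.block_size = block_size

    def write(self, lba, data):
        assert len(data) == self.block_size
        self.blocks[lba] = data          # drive has no idea what this is

class KVDrive:
    """Intelligent era: the drive stores named objects of any size."""
    def __init__(self):
        self.store = {}

    def put(self, key: bytes, value: bytes):
        self.store[key] = value          # placement is the drive's business

    def get(self, key: bytes) -> bytes:
        return self.store[key]

    def keys_with_prefix(self, prefix: bytes):
        # The "query the drive like a database" capability: the drive
        # itself can enumerate related objects.
        return [k for k in self.store if k.startswith(prefix)]

drive = KVDrive()
drive.put(b"photos/2015/pic1.jpg", b"...jpeg bytes...")
drive.put(b"photos/2015/pic2.jpg", b"...jpeg bytes...")
print(drive.keys_with_prefix(b"photos/2015/"))
```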
I think right now there's a confluence of multiple factors that is creating a time for change. The first factor, of course, is Moore's law. This year is the 50th anniversary of Moore's law, and as you're well aware, it has led to an exponential growth in computing power available cheaply. The second factor is that while Moore's law has increased CPU performance, disk storage performance has not kept pace, and there is a huge gap between the performance of the CPU and the performance of storage. Even though capacity growth has kept up, performance is still lagging. The third factor is that to keep up this capacity growth, we are inventing new media technologies, both in magnetic recording and in flash, and LBAs are not the most efficient access method for these media technologies. The fourth factor is that as drives get bigger and bigger, the rebuild time for RAID or replication or any of these recovery mechanisms is becoming very, very large, and that is becoming a serious management challenge. And finally, the fifth factor is that this huge explosion in unstructured data is creating a serious challenge for data governance, and any method by which we can classify the data as it gets created will make this issue a little easier to deal with. With these five factors, I think there is enough of a case to create a new approach to disk storage. Going back to Moore's law: if you look at the impact it has had on the data center versus personal devices, the impact is quite different. In the data center we have stayed with a client-server architecture; there has been no significant change other than making the server CPUs bigger and bigger. On the personal devices, we are making smaller, faster, thinner, and smarter devices. So it may be time to borrow a page from our friends in the personal device industry and change the architecture of our data centers. Instead of having a few large CPUs controlling storage, how about we create lots of low-power CPUs and distribute them with the storage, inside the disk drive? What does that bring us? It creates a match between the compute speed and the storage speed. We go from large CPUs that are very fast but spend most of their time waiting for storage, to something much better matched: CPUs inside the storage that are not massive compute resources but are not waiting on the disk drive all the time. And I think the key value interface is the ideal interface for such intelligent disk storage. One of the main benefits of having a key value interface to storage is what I mentioned earlier, the data awareness. Now you can query the disk drive almost like a database, and that can have very powerful ramifications. So to summarize, what are we getting from these new kinds of storage devices? First, a lot of compute efficiency, because we no longer have big server CPUs waiting for storage. Second, we are enabling the transition to the new media technologies that are not optimal for LBA. Third, we are enabling faster recovery from disk errors and failures: instead of having to do a RAID rebuild or replication of the whole disk drive, we can do granular recovery of lost objects, which is a much faster, fine-grained way to recover from errors. And finally, with the query capability and the metadata associated with the key value interface, we may be able to enable much better data governance, because as the data gets created, the disk drive already knows what type of data it is.
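The granular-recovery point can be made concrete with a toy sketch. This is not a real implementation; the replication scheme and names are assumptions, chosen only to show that a failure requires re-copying the lost objects rather than rebuilding the drive's whole raw capacity.

```python
# Toy sketch: with a key-value view, recovery copies only the objects
# that lived on the failed drive, not every block on it.
class Drive:
    def __init__(self, name):
        self.name, self.store = name, {}
    def put(self, key, value):
        self.store[key] = value
    def get(self, key):
        return self.store[key]

def recover_lost_objects(failed, drives, replica_map):
    """replica_map: key -> list of Drives holding a copy of that object."""
    survivors = [d for d in drives if d is not failed]
    for key, holders in replica_map.items():
        if failed in holders:
            holders.remove(failed)
            source = holders[0]                       # any surviving copy
            target = min(survivors, key=lambda d: len(d.store))
            target.put(key, source.get(key))          # one object, not one disk
            holders.append(target)
```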
So with that, I want to introduce the technology that Toshiba announced this week. We are calling it the key value drive, and it's the industry's first integrated compute, high-performance SSD, and HDD storage combined in a single 3.5-inch form factor. With this device we can not only provide a key value interface, we can actually run applications on the drive itself. You can imagine, sometime in the future, data analytics being run on the drive itself, so you don't have to pull the data out of the drive to process it: the drive is already aware of what's stored in there, and you can process that data in place. Somebody earlier mentioned the problem of data gravity, and that is exactly what gets addressed by putting intelligence in the drive. We are actually demoing this technology in our booth today; we've been demoing it for the past week. What we are demoing is a chassis containing 12 of these drives. Eight of them are running a Ceph cluster inside a 1U enclosure, and that Ceph cluster is being used as a block storage device using RBD. We are demoing a Windows VM running off that RBD, with a PowerPoint presentation running in the VM. We are also showing a performance demo that shows what this kind of drive can do. So if you haven't done so already, please visit our booth and check this technology out for yourself. Thank you, and I will take any questions you may have.
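Because the demo exposes standard Ceph interfaces, a client would talk to a cluster of these drives the same way it talks to any Ceph cluster. A minimal sketch using the stock python-rbd bindings; the pool and image names are made up for illustration, and it assumes a reachable cluster with python3-rados and python3-rbd installed.

```python
# Create and use an RBD image on the cluster, as in the booth demo
# where a VM boots off an RBD-backed block device.
import rados
import rbd

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
try:
    ioctx = cluster.open_ioctx('rbd')                 # default RBD pool
    try:
        rbd.RBD().create(ioctx, 'vm-disk', 20 * 1024**3)  # 20 GiB image
        image = rbd.Image(ioctx, 'vm-disk')
        try:
            data = b'hello from a key value drive cluster'
            image.write(data, 0)                      # write at offset 0
            print(image.read(0, len(data)))           # read it back
        finally:
            image.close()
    finally:
        ioctx.close()
finally:
    cluster.shutdown()
```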
Could you please come to the mic? Thanks for the session. The question is, will this be made available to the public cloud, like AWS or Microsoft? Since the world is moving in that direction, where do you see this being used in a big way, or the right way? Certainly. We would want it to be used everywhere, and with things like IoT and the much larger volumes of data foreseen, I think it makes sense to use this kind of drive going forward. As I mentioned, we think eventually this is the direction storage should head. We just announced this technology this week, so we'll be talking to quite a few customers, and the public cloud could certainly be one of the potential targets for this drive. Thank you. Hi. You mentioned integrated compute capability in the drives themselves. What is the runtime on that? What kind of help do I need to encapsulate my program so that I can run it on the drive? It is actually running Linux. Okay, so as long as it can run on Linux. Is it 64-bit ARM? It is a 64-bit processor. ARM, I guess? We are not revealing that right now. Okay. Are you releasing an SDK for it now? Not yet, but yes, that's part of our plan; sometime in the future we will be doing that as well. Okay. And the drives you are releasing now, are you releasing a full portfolio including SSDs and hard drives, or only the hard drives? The drive being demonstrated in our booth currently has two 2.5-inch hard drives integrated, and today the maximum capacity on the drive is about three terabytes. The roadmap grows from three to whatever the next logical number is, so you can see it as an evolution where we take existing 2.5-inch drive technologies onto this drive. In terms of SSDs, we have two M.2 SSDs on this drive. The M.2 SSD is primarily used for metadata, and we have two of them so that we can keep a mirrored copy of the metadata on the other SSD. Whatever balance capacity is available on the SSDs can be used for read caching as well. And it has a 64-bit, four-core CPU integrated. So, did I get this correctly: in a 3.5-inch form factor, we have two 2.5-inch drives, two M.2 SSDs, and a processor? Correct. Okay, it's in our booth. It starts to make more sense now. Okay. So would you say this is a microserver? Well, the connotation of microserver is that you can run any application on it, including traditional business applications, and we have stayed away from that positioning. Essentially, you can use these drives for running any storage software. Today we are running Ceph; we can run Swift; we can run any kind of storage software on this drive. The thought process is that having these as intelligent drives would also allow us to run other applications, like perhaps Hadoop, that support a key value interface. Anything that supports key value, you can run on the device. But if you're looking for general-purpose compute, this is not it. That's our positioning, but the market can use it any which way it wants, right? You mentioned Hadoop, and I think it's a good idea anyway, if it has enough compute capacity in there. The compute capacity is a one-gigahertz, four-core, 64-bit processor running Debian Linux. Okay, thanks. Thank you.
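Since the drive runs Linux and is data-aware, one can imagine the kind of job that would ship to the drive instead of pulling data out. A purely hypothetical sketch; the store argument stands in for the drive's key-value namespace, and nothing here is a real on-drive API.

```python
# Hypothetical sketch of "compute where the data lives": a small job
# that would run on the drive's own Linux, scanning keys locally and
# returning only a summary, so the raw objects never cross the wire.

def object_size_histogram(store, prefix):
    """Bucket object sizes under a key prefix; return counts only."""
    buckets = {"<4K": 0, "4K-1M": 0, ">1M": 0}
    for key, value in store.items():
        if not key.startswith(prefix):
            continue
        size = len(value)
        if size < 4 * 1024:
            buckets["<4K"] += 1
        elif size <= 1024 * 1024:
            buckets["4K-1M"] += 1
        else:
            buckets[">1M"] += 1
    return buckets

# The host receives just this small dict, not the objects themselves.
print(object_size_histogram(
    {b"logs/a": b"x" * 100, b"logs/b": b"x" * 500_000, b"tmp/c": b"x"},
    b"logs/",
))
```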
So this product is in a way unique, but I have to ask: how does it differentiate itself against competitors in the market, like the Seagate Kinetic? So the question is how we differentiate this product against our competition. We're not going to talk about our competition, but I can tell you what we have done differently here. The approach we took is that we looked at the complete market, and we saw there are opportunities in the cold data archival tier where certain products make more sense. And we thought there's an opportunity in the middle: there is a high-performance market that is gravitating towards SSDs, and there is a low-cost capacity tier that is going towards nearline hard drives and technologies like Ethernet drives. So there's a middle market that needs performance and capacity but isn't willing to spend SSD-level dollars, and that's where we think this product fits. That's one target market we see. The other thing we see is that as the unstructured data market explodes in the enterprise, one kind of storage that is going to become common is object storage. Object storage, in terms of footprint, is going to be a lot larger than the block or file infrastructure you see in enterprises today. So our thought process was: if we deliver an object storage solution, and then allow a little bit of processing power so that you can run block and file system software on top of it, enterprises have the opportunity to take advantage of a low-cost object storage infrastructure and build their block and file infrastructures on top of it, so there are some cost optimizations. One of the trends we see is that budgets are not growing, and drive prices are coming down but not as fast as we want. So we see an opportunity there; that's where we wanted to build a product, and that's what is reflected in this key value drive solution. What kind of performance can I expect from those drives? So the question is, what's the performance on these drives? You can visit our booth and we can demonstrate, at 4K, 64K, and 1 megabyte I/O sizes, what kind of throughput you see. Today, with 1 megabyte I/Os, we are demonstrating about 110 megabytes per second on these drives. And the nice thing about the key value drive is that you're no longer a block device, so you can go down to, say, 8 bytes and see what kind of IOPS you can generate; we do about 10,000 to 11,000 IOPS at an 8-byte I/O size. It's running live in our booth, so please do come by. And how much do they scale when you add more of those drives? Where is the limit of the scaling? These are Ethernet drives, and the current chassis we are using is a 1U, 12-drive chassis. So we can have 12 of these drives, you can stack the chassis up, and the drives are independent: you're not ganging drives together to get performance, and each drive delivers whatever it can. The scaling is not dependent on the drive itself; it's on the software that runs on the drive or on top of the drives. But Ethernet is Ethernet, right? That's correct. So we're limited by Ethernet, by the network, if we're using Swift or Ceph or something like that. Right, but in the case of Swift, you have the client sending data to the drives, and it can send data to as many drives as Swift can support. So the limit is Swift's, okay? Thank you. And then from an Ethernet infrastructure standpoint, the chassis you see depend on the workloads you go after. There are chassis that give you 40 gig on the front and about 24 gig inside, in terms of the 12 drives being dual-ported, so there you're seeing 1:1 or better, no oversubscription. If you look at capacity-based infrastructures for different workloads, you can have 1:2 or 1:3 oversubscription on Ethernet as well, where you have 20 gig coming into a box with 60 drives that are not all performing at the level you might want. But if that's what your workload demands, you should be fine with that infrastructure too. This is absolutely a new market, and we are seeing a lot of interest from OEMs and system vendors in building Ethernet chassis, so you're going to see new kinds of solutions coming out, catering to different workloads in terms of performance. Thank you. Thank you.
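The chassis arithmetic just quoted is easy to sanity-check; a quick back-of-the-envelope sketch using only the figures mentioned in the answer:

```python
# Back-of-the-envelope check of the chassis numbers quoted above.
drives_per_chassis = 12
drive_link_gbps = 1          # 1 GbE per drive port
ports_per_drive = 2          # drives are dual-ported
front_end_gbps = 40          # 40 GbE uplink on the chassis

internal_gbps = drives_per_chassis * drive_link_gbps * ports_per_drive
print(internal_gbps)                      # 24 Gb/s inside vs 40 Gb/s out
print(front_end_gbps / internal_gbps)     # ~1.67, i.e. no oversubscription

# A 1 GbE port tops out around 115 MB/s of payload, which is why each
# drive pairs two HDDs with SSD cache: to keep that wire saturated.
mb_per_sec_per_drive = 110                # demoed at 1 MiB object size
print(drives_per_chassis * mb_per_sec_per_drive)  # ~1.3 GB/s per 1U chassis
```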
Sorry, sorry. So yesterday I actually visited your booth and saw the demo, and it was cool. My quick question is about the cost, from a bill-of-materials perspective. I saw that you display 48 terabytes; I think it's 2U or 1U? 1U. Yeah, those are 4-terabyte drives, 12 of them. Compared to the BOM cost for you to install a quad-core, one-gigahertz processor and the SSDs in each drive to make that 48 terabytes, versus, let's say, 48 terabytes of commodity hard drives plus a commodity Intel Xeon-based server with some DRAM: do you really think you can beat that price? I think it's an exercise you should do yourself; we've done it. No, but I don't know your expectations about your cost, right? That's why I'm asking. Understood. So the question is, what is the TCO on this? Yes, we have done that work ourselves, and what it points to is that it's to an extent workload dependent. Today there are two challenges we are trying to address. The first: let's say you take a Xeon box and throw in 60 drives. The dollar per gigabyte on that box is going to be pretty low, and we think we can address that price point. Now, if you go from 60 drives to, say, 80 or a hundred drives, the comparison could be a little skewed, because it depends on what number you're looking at and what throughput you need. But the biggest question we have is the one thing we talked about: the failure domain. With 60 drives on one server, if you don't care about the data residing on it, and you lose that box and you're okay with that, fine. But the moment you have object storage, you're looking at three replicas; now one replica is gone, so what are you going to do about it? You're going to try to rebalance, and if you're rebalancing, what's the impact on your performance, and how much loss of performance can you tolerate? That's the fundamental question you want to answer. If the answer is that you can deal with that performance hit, then you're looking for a really cost-optimized solution driven purely by capacity. But if you're looking at a different kind of failure domain, and your data is important, then we make a lot more sense, and for that feature you can pay a premium in dollars per gigabyte to us. Okay. Thank you.
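The failure-domain trade-off in that answer is easy to quantify; a rough sketch, using the drive counts and sizes mentioned in the discussion:

```python
# Rough sketch of the failure-domain argument: lose one 60-drive Xeon
# server versus lose one 4 TB Ethernet drive. With replicated object
# storage, everything on the failed unit must be re-replicated
# somewhere else in the cluster.
drive_tb = 4

server_drives = 60
server_loss_tb = server_drives * drive_tb     # 240 TB to re-replicate
single_drive_loss_tb = drive_tb               # 4 TB to re-replicate

print(server_loss_tb / single_drive_loss_tb)  # 60x more rebalance traffic,
                                              # and a correspondingly bigger
                                              # performance hit during recovery
```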
Thinking about how to put this kind of technology into the context of a large-scale storage environment: first, it seems like it's well overdue that somebody did something like this. But second, there are a bunch of details, and there's a whole lot of inertia in the industry towards block storage, towards RAID, towards the things people understand for doing resilience. And I guess the question I have for you guys is: how prepared are you to answer the arguments that are going to be pushed against you from all sides in trying to get this kind of thing adopted? I mean, I think that's why the analogy to evolution. Right. It's inevitable; it'll happen sometime. Absolutely. And some other things: for example, you didn't use exactly these words, but you said, when an object's not there, it's not a system failure, it's a cache miss. That's the analogy I think is really appropriate here. But that implies a higher-level scheme for integrating each device into an overall resilient platform. Have you got that story straight yet, do you think? It's definitely something we are aware of and working towards resolving. But this is what scale-out object storage is all about, right? It now needs to be aware of devices like these and take advantage of the intelligence that can reside there. On our side, we are trying to work with all the ISVs and ecosystem partners: we want to share with them what we have and how they can build a better infrastructure, and work with them to get this out to the market at some point in time. Do you think that if you had an existing Ceph cluster, you could just start adding some of these nodes to the end of it and incorporate them into the same cluster? I'm just thinking about the migration path, if this looked like a good idea for the future. Well, technically it may be possible, but chances are that the version of Ceph that runs optimally on the drives might be different. If you run the same version on the drive, maybe you could; but for optimal behavior you're probably going to run a different version of Ceph. So, a question about integration with Ceph. You said that you can run Ceph on your disk. Today we have two options, journaling and caching, to take advantage of SSDs. Are you doing some kind of special customization to run Ceph on your disk, or how do you deal with the performance issues we usually have with Ceph? Good question. Yes, we are looking at a modified, or a new, version of Ceph that will use this key value interface. We are aware that the Ceph community is working on a new storage backend that takes advantage of the Kinetic protocol and things like that, and that will be the version better suited to our drive, because it will use our key value software running on the drive, which optimally uses both the spinning media and the flash. To add to what Abhijith just mentioned: we support the Kinetic APIs both inside the drive and outside the drive. What I mean by that is you can have a Ceph cluster outside talking Kinetic APIs to the drive; the OSD doesn't have to be inside the drive, it can be outside and still talk to the drives using Kinetic APIs. The other option is running the OSD on the drive, and then there's no change: it still talks the same Kinetic APIs. So if I run the OSD on the drive, I don't need to account for all the resources that Ceph usually consumes? For instance, one of the problems I have when I do sizing for Ceph is that I need to keep in mind one CPU per OSD, and at least one gigabyte of RAM per terabyte. Those are the kinds of resources I'm saving? And you need an SSD for journaling, right? We have all of that covered inside, yes. It's all covered. Great, thanks.
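The sizing rule of thumb the questioner mentions makes the saving concrete; a quick sketch applying that conventional Ceph guidance to one 12-drive, 48 TB chassis of these drives:

```python
# What the questioner is saving: conventional Ceph sizing guidance
# (~1 CPU core per OSD, ~1 GB RAM per TB) applied to a single
# 12-drive, 48 TB chassis.
osds = 12                 # one OSD per drive
tb_total = 12 * 4         # 48 TB raw, per the booth demo

host_cores_needed = osds * 1        # 12 cores on a conventional server
host_ram_gb_needed = tb_total * 1   # ~48 GB of server RAM

# With the OSD running on the drive itself, those resources (plus the
# SSD for journaling) are already provisioned inside each device.
print(host_cores_needed, host_ram_gb_needed)
```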
Any more questions? Are there any other use cases for this drive, or is it just customized for Ceph? Is Ceph just one of the initial implementations? So the question is, are there any other use cases. The reason we decided to announce this as a technology and not as a product at this point is to put it out there; we have some ideas where we want it to go. That's one of the reasons we are doing Ceph OSD demos on it, and we think Hadoop with key value could take advantage of it. But we really don't know beyond that what the market is and how it could be deployed, and this is probably the best place for us to come and talk about this technology and see how developers can take it forward into newer application areas we haven't thought about. I just want to confirm my understanding from an architecture standpoint. Each of these hard drives needs one network address, I assume an IP address. And it has a throughput capability of only about 110 megabytes per second, and a capacity limited to today's biggest hard drive sizes, 4 terabytes, 6 terabytes? Yes, what we are demonstrating today is 4 TB, and it could be 6, it could be 8 in the future. These are the drives we can do today, right? Correct. But each drive, from an architecture standpoint, has Ethernet on the front end. An Ethernet network with a one-gigabit interface can deliver about 115 megabytes per second, and we have matched that: to meet the Ethernet throughput capability, we have two hard drives plus SSD and compute, in order to saturate that wire. And the drive is dual-ported, which means that while I'm handling front-end data, I can also do peer-to-peer and move data through the back end as well. So we have tried to architect it to make sure we saturate the gigabit Ethernet wire as much as possible; all the speeds and feeds are matched. So in other words, for any workload that needs a much larger data throughput, we have to break it down into pieces across each of the IP addresses? Correct. And because we are Ethernet, nothing limits this; what we announced right now is a technology, not a product, so you can think of this as going from one gigabit to 10 gigabit to 25, 40, 100, what have you, in the future. It's basically taking advantage of Ethernet, and you're putting together an infrastructure that can saturate the back end and the front end and make sure they are balanced. And for this kind of IP-based hard drive, what other storage services can the drive provide to the application, like encryption? We can do all of that, yes. When we designed this product, we looked at some of those requirements, encryption being one, compression being another. So we made sure there is enough processing power for all of those without impacting the performance you see: it happens under the hood, and there is no impact on performance. Thank you. All right, thank you. I think we are running short of time, so we'll be at the booth; please come by and we can definitely talk. Kiyoko, are you ready to run the app? I'm ready to run the app. All right, all right.