Live from San Francisco, California, it's theCUBE at VMworld 2014, brought to you by VMware, Cisco, EMC, HP, and Nutanix. Welcome back to San Francisco, everybody. This is Dave Vellante and this is theCUBE. We're live at VMworld 2014. This is our second day. We'll be here all day today and tomorrow, Wednesday. We're in Moscone South, stop in. We're in the lobby, just on the right-hand side. VMware, as always, has set us up with this really awesome configuration. It's probably our best Cube day in terms of setup and branding and signage and space. It's really good flow, so thank you to VMware. So we have been digging into flash in a big way and talking a lot about storage architectures, and we've got a special guest here today. Siamak Nazari is here. He is an HP Fellow and the architect of 3PAR, a product set that we've been following for quite some time. In fact, when 3PAR was being acquired in that bidding war with Dell, we were here at VMworld, predicting what the next round of bids would be. That must have actually been a bit of a fun time for you. Yeah, that was such a roller coaster ride for about two weeks. But a good roller coaster ride, you know? Every morning it was a new high, so yeah, there were no lows then. That should happen to everybody. So anyway, welcome back to theCUBE. It's good to have you again. We spent some time last week in Boston. Thank you guys for coming out. We did the analyst briefing. It was good. I thought there was a lot of interaction. There were probably almost 30 analysts there. That's right, that's right. Really good discussions. So I want to sort of start with the big question that everybody has always had. 3PAR, the architecture: you were instrumental in making that happen. And we've talked about this, you and I, in the past. When thin provisioning was coming into vogue, you guys had sort of perfected this notion. And everybody else said, oh, we have that too. They came along and they bolted it on. Everybody said, well, no, no, we're great. We're just the same as 3PAR. But it turned out it probably wasn't so much the case. It really was sort of a check-off item for them. The architectures really were fundamental and made a difference. Now we come to flash. HP said, well, we don't need to acquire an all-flash array company. We've got the architecture for that. And everybody said, oh, come on, this is a bolt-on. Why was that not the case? People thought it was, and it appears it wasn't. Why did people miss that fundamental point? So I think a lot of it has to do with the mindset of the software that generally gets developed for firmware-based appliances, right? They tend to have a very fixed architecture. They don't really follow a good software design model with well-defined interfaces between layers. They don't define the virtualization layer, the RAID layer, the scheduling layer, the memory management layer. It's all sort of mishmashed into one infrastructure. And that had been sort of the model prior to 3PAR. I think people think about 3PAR as having, you know, innovated in wide striping and sort of, you know, having all the system resources in use at the same time. But some of the innovation is also around thinking about the software as a server operating system, right? So when you think about something like Windows NT, it's been around for how many years now? You know, it was on a single-socket, single-core, you know, CPU at the time.
And years later, it's running on, you know, eight sockets, you know, 20 cores per socket. And nobody comes up and says, well, it was designed for a single-core system, right? The fact is that if you set it up correctly from the beginning, and if you architect it correctly from the beginning, then it can evolve as the hardware evolves, right? And so that was the mindset. It was really a bunch of server designers and operating system designers from Sun that started the company, right? And from the very beginning, we just had that mindset of designing a server operating system, not so much a storage operating system. Storage happens to be a personality of the operating system itself. Really, that's the best way to think about it. So how come Sun could never get storage right, but the Sun guys, once they popped out of Sun, could? Because they kept trying to purchase the stuff rather than developing it. I think that's part of the problem. I think they had, you know, a lot of people working on the OS, and on the storage side they sort of kept chasing the dream of buying something ready-made. I think that was part of the problem. It's amazing, actually, when you look at the history of this business. I mean, I used to have conversations with McNealy all the time and say, you've got to get your storage act together. Right. Yeah, well, we're going to buy this company, you're right. That's right. And IBM, same thing, you know? That's right. After IBM, you know, sort of got kicked in the knees by EMC in those years, you know, I used to talk to guys like Bill Zeitler and say, well, you've got to get storage right. What did they do? They outsourced it to Mylex and, you know, LSI, et cetera. Exactly. So like you said, it's a mindset, but it's interesting that you're saying Windows Server is actually architected very well. Exactly. Which is probably a bunch of ex-Digital guys. Exactly, exactly. Not unlike HP, actually. It turns out Digital, like you said, is part of HP now. Right, right. But they had that systems-level expertise. Exactly. And you're talking about, you know, a componentized architecture. Exactly. But so now, carry it through to help me understand the answer to my question, which is, how does that relate to flash? You're saying that you were able to then sort of morph the system so it looked like a born-in-flash architecture, right? Right, so the best way to look at it is that, you know, we have the logical disk layer, and then we have sort of the physical disk layer, and then we have the virtualization layer, right? So what we did, we actually took, you know, a systemic approach and said, okay, what would flash mean to the physical disk layer? Well, things have to be, you know, much lower latency. So we looked at how we actually DMA the data back and forth, and we made a bunch of optimizations there. The logical disk layer had to be changed because now we realized that, you know, we had optimized it for disks, where it was hopping, you know, from disk to disk every 128 kilobytes. Well, you know, that's because you wanted the sequential I/O to hit the same disk, right? It turns out that sequential I/O jumping from disk to disk doesn't really have a penalty for SSDs, because they don't have a seek penalty. And it turns out that layer was designed so it could actually go to much smaller step sizes of 32 kilobytes. So we can now hop from drive to drive every 32 kilobytes.
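To make that striping point concrete, here is a minimal sketch, with assumed names and a simple round-robin layout rather than 3PAR's actual code, of how a logical-disk layer could map a logical byte offset onto many drives, with the step size as the only knob that changes between spinning disk and SSD:

```python
# Illustrative sketch of wide striping across many drives: a logical disk
# layer maps each fixed-size "step" of the logical address space to the
# next drive in round-robin order. Shrinking the step from 128 KB (HDD)
# to 32 KB (SSD) makes I/O hop drives more often, which costs nothing
# once there is no seek penalty. Names and layout here are hypothetical.

KB = 1024

def map_offset(logical_offset: int, num_drives: int, step_bytes: int):
    """Return (drive_index, offset_within_drive) for a logical byte offset."""
    step = logical_offset // step_bytes          # which step of the volume
    drive = step % num_drives                    # round-robin across drives
    stripe_row = step // num_drives              # full rows preceding this step
    offset_in_drive = stripe_row * step_bytes + (logical_offset % step_bytes)
    return drive, offset_in_drive

if __name__ == "__main__":
    # A 256 KB sequential read touches 2 drives with 128 KB steps,
    # but 8 drives with 32 KB steps: more parallelism per request.
    for step in (128 * KB, 32 * KB):
        drives = {map_offset(off, num_drives=16, step_bytes=step)[0]
                  for off in range(0, 256 * KB, 4 * KB)}
        print(f"step={step // KB}KB -> drives touched: {sorted(drives)}")
```

With a 128 KB step, a 256 KB sequential read stays on two drives; with a 32 KB step the same read fans out across eight, which only pays off once seeks are free.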
Then you go to the virtual volume layer, and it's like, okay, what do I do with the virtual volume layer? And we had to do a bunch of optimization having to do with remote DMA. In fact, there were some features in the ASIC that we had not turned on because we never really needed the performance. This is sort of going back to designing the architecture early on, thinking that we would need these if the performance curve got there. And that's exactly what we did. We had not needed it, but now flash comes into play, we go and turn that feature on, and we actually pick up a 30% latency improvement on I/O just because we had never really needed that particular I/O path before. So it really is sort of going through every single layer and saying, what does this really mean architecture-wise? And if you put it all back together, then you have an architecture that is much more optimized than it was. When did you start this? Was it in '98? '99. It was '99. That's right. Okay. So in 1999, you weren't thinking about flash, I presume, right? No, nobody really knew exactly. I mean, we thought, how could we build the fastest possible system assuming there are thousands of drives behind us, right? Now you don't need thousands of drives behind you. Maybe you need 20 of them to actually get the same performance you got years ago from thousands of drives. But the point is that we built a system that was highly configurable and well layered, so you could actually go and apply optimizations at different layers without necessarily worrying about other layers. So I'll give an example. When the nearline drives came out, because those drives are not particularly reliable, we had to add RAID 6 to our system, right? Well, it turns out that because RAID 6 was so isolated from the virtual volume layer, you just added it, and then all of a sudden the system could actually migrate data back and forth between RAID 5 and RAID 6. Almost like a service. As a service, exactly, right? And this was one of those things. And the other nice thing is that it could actually convert these things from fat to thin to fat, RAID 1 to RAID 5 to RAID 6, from FC to NL. And all those things are because each layer has a well-defined API when it talks to the other layer. It's like a service, right? Okay, well, you know, you're changing the quality of service behind it, but since the API doesn't really change from layer to layer, you can actually innovate within each layer without necessarily impacting the entire system or breaking the entire model. Okay, so we're here at VMworld. We get inundated, of course, with software-defined everything. That's right. It's the hot new trend. Yeah, you guys use an ASIC inside your system. You said something last week that was interesting: every time we build a new system, we look at it and ask, okay, do we still need to do the ASIC? It's not like we like building ASICs. I wonder if you could sort of recount that conversation. Why don't you like doing ASICs, and why do you do ASICs? Right, so ASICs are quite expensive to build, right? And mistakes are really costly, right? I mean, a recompile in a sort of operating system environment is a 15-minute exercise, right? But in an ASIC world, it's a six-month exercise and costs a million to two million dollars, right? So every time we do an ASIC, we step back and say, what are the properties of the ASIC that we actually need?
And can some other off-the-shelf component, maybe a CPU, maybe some off-the-shelf engine that you can buy from a third party, do all the functions that we have? And the answer is yes, you could, but it turns out the system becomes so complicated and the boards become so large that it becomes cost-prohibitive. And the vendors that we go after keep changing their minds about whether they want to be in that business or not, right? For instance, we looked at some of the features in Intel parts, we looked at some of the features in some of the off-the-shelf CPUs, and they really have not been, you know, constant enough for us to be able to depend on them, right? And so we always come back and say, well, we really need these features, we really can't get them in a package small enough and cost-effective enough, so we have to actually go back and build them. And there's a risk mitigation aspect of that, too. Exactly. They might not be there. Exactly. There's a programming model that is well developed, we understand it well, and they have to really fit that programming model. It's a storage device, after all. You know, the way I describe a storage device, it's kind of an idiot savant box, right? You know, lots of I/O bandwidth, very little computing need, because it's not really operating on the I/O. So, you know, other people that build this stuff, they kind of build a balanced system between I/O and processing. You need very little processing; you need a lot more I/O bandwidth. So in a sense, it has some very specific properties that are very difficult to get from off-the-shelf stuff. That's interesting. I mean, you're right. For decades, and it even continues, that notion of balance between I/O and processing has been fundamental to the architecture of these subsystems. That's right. You're just saying you broke that model. Exactly, exactly. Because we are much more of an I/O processing engine. We are not really interested in the contents of the bits that are going by, right? There is very little processing that gets done on that data. So in fact, when you look at the actual memory bandwidth you need, you know, we need a lot more memory bandwidth than you can get from a normal two-socket system. You know, if you need that memory bandwidth, you have to go to a four-socket system to get it. But then you have a lot more compute power than you really need. So the CPUs are sitting there idle, and you're just putting them there for the memory bandwidth. Well, it turns out that because our ASIC has memory, you actually get the memory bandwidth necessary without having the extra compute sitting there doing nothing. So in the early days, you did a lot of things to sort of address the performance drawbacks of spinning disk, like your wide striping, and you dealt with the efficiency issue with things like thin provisioning. And then flash comes along and you're saying you had to do a lot of work. That's right. You know, it took a long time and some resources to actually get there, but the architecture accommodated that. Now, last summer you came out with your first instantiation of the all-flash array. That's right. And relative to where you are today, and even at the time, myself and others said, wow, it's kind of expensive, it's kind of okay at this, okay at that, it's okay at latency, okay at IOPS, you know, et cetera, et cetera, et cetera.
Sort of middle of the road, and the pricing, or the cost, was okay. Sure. And now all of a sudden, a year later, you're sub-$2 a gigabyte, you're best in class in the industry on latency, et cetera. So what happened? So a lot of it has to do with the fact that we take a very systems approach to the design, right? And that's an advantage that we have over a lot of other people, where they buy off-the-shelf hardware, or they have lots of hardware but they have very little software, right? There seems to be a design model where people buy just off-the-shelf hardware, a lot of their I/O stack actually comes from the Linux community, and they buy a lot of software in the middle, right? Since we have a systems approach up and down, we can actually talk to the firmware vendors on the HBA side and the firmware vendors on the drive side. We've actually been developing stuff with the drive vendors on one end and the HBA vendors on the other end. So we just continuously look up and down the stack. I told you that we were looking at everything up and down the stack; that stack extends to the actual partners that work with us on the drive side and on the front-end HBA side. And that's really been our approach, right? Every time we put out a product, we run a series of benchmarks and tests. We look at what the opportunities are for us to improve. And that's really the continuation of that process. We're almost like a Toyota model. Every year we actually look at the stuff that we could do better, and we go and modify it and make it better. And we're not done. I think there is a lot of stuff that is left for us to do. And it has been interesting to actually step back and peel it open and say, where can I actually modify stuff to make it better? So can we talk specifically about the cost reductions? Obviously there's MLC, there's higher-density devices, there's other data reduction technologies. Can you talk specifics? Sure, so the perfect example: initially the cost was, you know, prohibitively high, partly because we were using really expensive enterprise-grade MLC devices, right? Now we have lots of telemetry information coming back from the systems in the field, and we understand exactly how they're being utilized. And so we went to commercial-grade MLC, and this is something that we did with our partners. We looked at what we call write amplification that is happening in these devices and what we can do about that. It turns out if you look at eMLC versus cMLC, the biggest difference is really just the amount of capacity that is set aside for over-provisioning of the drive, so the drive can effectively do wear leveling on the device itself, right? It turns out our wide striping actually gives us an opportunity here, because what happens is, you know, we actually set aside a portion of each drive as a spare. So if a drive fails, we are reconstructing to all the other drives. It's kind of unlike sort of the old RAID model, where you have a single drive that is set aside as a hot spare, and if something fails, you just rebuild drive to drive. We actually rebuild drive to spares, and the spares are distributed across the system. But it turns out the spares aren't being continuously used. They're just essentially sitting there idle, right? So we talked to our vendors and said, you know, what if we tell you that space is not being used currently, so just use it for your wear leveling, right?
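A rough sketch of that distributed, adaptive sparing idea follows, with all numbers and names assumed purely for illustration rather than taken from the product:

```python
# Illustrative sketch of distributed ("adaptive") sparing: instead of one
# dedicated hot-spare drive, a slice of every drive is reserved as spare
# space. During normal operation those idle spare chunklets are lent to
# each SSD as extra over-provisioning for wear leveling; on a failure they
# are reclaimed and the rebuild is spread across all surviving drives.
# All figures below are assumptions for illustration only.

DRIVES = 16
SPARE_FRACTION = 1 / DRIVES   # reserve roughly one drive's worth, spread out
FACTORY_OP = 0.07             # over-provisioning a commercial-grade SSD ships with

def overprovisioning(spare_lent: bool) -> float:
    """Effective over-provisioning ratio seen by each drive's flash controller."""
    extra = SPARE_FRACTION if spare_lent else 0.0
    return FACTORY_OP + extra

def rebuild_targets(failed_drive: int):
    """On failure, reconstruct onto the reclaimed spare space of every survivor."""
    return [d for d in range(DRIVES) if d != failed_drive]

print(f"normal operation : ~{overprovisioning(True):.0%} over-provisioning per drive")
print(f"during a rebuild : ~{overprovisioning(False):.0%} (spare space reclaimed)")
print(f"drive 3 fails    : rebuild spread across drives {rebuild_targets(3)}")
```

The point the sketch tries to show is that during normal operation each drive sees noticeably more over-provisioning than it shipped with, and only a rebuild temporarily takes that space back.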
So they are commercial-grade drives, but they actually have enterprise-grade wear-leveling space available to them during normal operations. Only during a failure do we tell the drive, I need that space back on a temporary basis so we can reconstruct. And then once the reconstruction is finished and the dead drive is replaced, we actually put the data back, and that space is then, you know, given back to the drive for its wear leveling. So this actually provides us two opportunities here. Number one, the performance of the drive is much higher, because the drive doesn't have to work as hard to find empty space to actually land the data in. And number two, it actually reduces the wear on the drive. And we can actually see this, because we are actually measuring the internals of the drive and sort of the write amplification that is happening with and without this adaptive sparing technology that we have. And so that's something that has allowed us to go from these really expensive drives, you know, with lots of experience talking to the vendors, and now we're able to actually provide sort of this, you know, quantum drop in pricing for the customers. Okay. And then you've obviously added some new data reduction technology. That's right. So deduplication is what we are adding. And again, with deduplication, one of the things is that we now have lots of data about who uses VMware versus other workloads. And so we now have a sense of what sort of data reduction we would expect and what that would do to the amount of writes that happen to these drives. So we're comfortable that there will be a significant drop in the writes to the drives, and therefore, over the lifespan of these drives, we're comfortable to actually warranty them for five years for the customers. And that technology is something you guys developed inside of HP, obviously. That's right, absolutely. Is it something that came out of HP Labs? So we actually had a lot of conversations with them. There are bits and pieces of the algorithm that are borrowed, but a lot of their algorithms are designed around streaming data. Ours is sort of random-access block data. So great for backup, but not necessarily for this. Exactly. There are bits and pieces that you could actually use, having to do with their chunking, but there are bits and pieces that you really can't use. Plus, we have our own ASIC, and we're actually using the CRC engine from the ASIC, so we do the inline computation of the CRC, where they use the actual CPU to compute the checksums. So there are some advantages for us, because we can actually do that inline using the ASIC. So what do you see as the future of spinning disk? As somebody who's a technologist, you think about these things, and obviously you've got to implement and get products to work. I know you spent a lot of time with your team making that happen. But if you step back and look at the future, the impact of flash, its disruptive nature, the share shift that's going on, what do you see as the future for spinning disk? So I think it will have a tail like tape. I mean, there was a time when backup to disk showed up, and that all of a sudden changed the model, and a lot of people started backing up to disk, right? As opposed to backing up to tape. But tape is still there. There are lots of reasons why. I mean, the density of tape is still quite high, right? And the dollar per gigabyte of tape is still much better than the dollar per gigabyte of disk.
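As a brief aside on the deduplication point from a moment ago, here is a minimal sketch of inline, fixed-block dedup keyed by a content fingerprint. The block size, the hash (SHA-256 here), and the class names are assumptions for illustration; as described above, the product computes its fingerprints with the ASIC's CRC engine rather than on the CPU.

```python
# Minimal sketch of inline, fixed-block deduplication: each incoming block is
# fingerprinted, and blocks whose fingerprint has been seen before are stored
# as references instead of new media writes. Structure, names, and the choice
# of SHA-256 are illustrative assumptions, not the actual 3PAR design.

import hashlib

BLOCK = 16 * 1024  # fixed dedup granularity (assumed)

class DedupStore:
    def __init__(self):
        self.blocks = {}       # fingerprint -> stored physical block
        self.volume = []       # logical block -> fingerprint (one volume)
        self.writes_saved = 0

    def write(self, data: bytes):
        for i in range(0, len(data), BLOCK):
            chunk = data[i:i + BLOCK]
            fp = hashlib.sha256(chunk).hexdigest()
            if fp in self.blocks:
                self.writes_saved += 1    # duplicate: no media write needed
            else:
                self.blocks[fp] = chunk   # unique block: write it once
            self.volume.append(fp)

store = DedupStore()
store.write(b"A" * BLOCK * 4 + b"B" * BLOCK * 2)  # e.g. cloned VM images dedup well
print(f"logical blocks: {len(store.volume)}, unique: {len(store.blocks)}, "
      f"writes avoided: {store.writes_saved}")
```

The fewer physical writes a workload generates after dedup, the less the flash wears, which is the same reasoning behind the five-year warranty mentioned above.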
So I think I see a really long tail for drives getting replaced with SSDs, but there will be a clear class of applications that will be switching to SSDs. And I think you can think of your databases, sort of your standard Oracle databases, converting to SSDs and not looking back. All right, so we're out of time, Siamak, but I wonder if you could just sort of bottom-line it for us: what should we be paying attention to in terms of what's going on in your world with 3PAR, with the products that you're developing? What should observers be looking for? Sort of indicators of success, or things that we should be paying attention to? So it really is a question of, are you able to take the architecture and keep adding interesting new features or interesting new functionalities or personas? So for instance, one of the things we're working on is the ability to provide file services from the block store, from inside the block store, as opposed to some sort of hardware add-on. So the question is, can we sort of take this modularized design and keep adding to it and extending it, and have it morph to be able to solve problems that in the past you would have needed two or three different sets of solutions to actually solve? That's really our benchmark for success. If we can sort of consolidate it all into a single platform and actually have it meet a whole lot of different needs as opposed to a single-purpose need, great. All right, Siamak Nazari, thanks very much for coming on theCUBE. It's great to have you. Thank you. All right, keep it right there, everybody. We'll be back with our next guest. This is theCUBE. We're live from VMworld 2014. We'll be right back.