Okay, we're back live at siliconangle.tv with theCUBE, here at Strata Conference. It's day two for us, day one for the actual conference. Keynotes are opening up. People are flowing through, and all the luminary guests and keynote speakers are doing their thing. The crowd is packed and the show is completely sold out. The expo pass is sold out, and we're excited to bring you live coverage all day long, eight hours a day, with interviews and commentary from siliconangle.tv, siliconangle.com and wikibon.org, our research team. And we're going to break it down. There's a ton of live streams all over the internet, with O'Reilly putting out all the keynotes and all the sessions. They have their news desk, which is a CUBE-like function that they do for O'Reilly, with book-type interviews, how-tos and some commentary from O'Reilly. But here you're going to get the independent, hard-hitting knowledge, pure knowledge, here on theCUBE, and we're excited to bring that to you. I just want to say to the folks out there that you can find us on Twitter, where the hashtag is Strata Conference. That's where you can find all the commentary and conversation, and of course siliconangle.tv can bring you the coverage live. Silicon Valley is known for predicting the future, for inventing the future, and Strata is really the place where big data has taken center stage, and entrepreneurs from all around the world, not just Silicon Valley, are here in force, really getting their hands on big data. We would not be able to bring you this coverage if it wasn't for advertisers, and we are now 100% advertising supported with theCUBE. We want to thank our supporters who have put up advertising, and you'll see videos throughout the day. This is a first for theCUBE and we're excited by that, and we want to thank Cloudera. We want to thank Digital Reasoning, MapR, and did I forget one? 1010data. 1010data, there it is. So those are the guys. You're going to see videos throughout the day, and we thank them, they're big supporters.
They allow us to do our thing all day long and I'm excited. I'm John Furrier, the founder of siliconangle.com, and I'm joined by my co-host. Hi everybody, I'm Dave Vellante of Wikibon.org and, as John said, this is day two for us, day one really of the kickoff of the Strata Conference, and we've got a great guest here today. We're talking about one of the hottest spaces in the storage business. Storage is of course the linchpin of big data. Scott Dietzen is the CEO of Pure Storage. Scott, welcome to theCUBE. Thank you for having me. So you are in the flash space. It doesn't get any hotter than that. Everybody's jumping in, even the big whales, so let's get into it. We're here at Strata. We want to talk about the big data angle, but maybe we can start off by just talking about, you know, flash. Almost overnight, it's just taking the storage world by storm. Maybe for those who aren't, you know, so familiar with it, what's the big driver for flash? Why is it all of a sudden so hot? Well, as you're considering whether a data workload is a good fit for flash, the first thing to look at is how much random IO you are doing inside of this workload. Because flash is two to three times faster than disk on sequential workloads, but it's two orders of magnitude, think 100x, faster on random IO. So if you look at workloads like online transaction processing in a data store, if you virtualize your databases, if you're doing complex analytics in the Hadoop family, if you're relying on HBase or planning to rely on HBase, all those workloads tend to be random IO intensive, and your CPU is just horribly mismatched with your hard drive for random IO. From the CPU's perspective, hard drives today look slower than tape did 15 years ago on those random IOs. So your CPUs are a lot more efficient if they're not sitting there waiting and context swapping until this disk comes back eons later from the CPU's perspective. So that's where I would start.
You know, is it a random-IO-intensive workload? The other thing to look at is deduplication in particular. There are new techniques out, and Pure has been pioneering this, that allow us to reduce a data set in real time. So we can get four to five X data reduction on typical structured database workloads, even more for virtualization workloads. And four X is significant because at four X we can deliver MLC at or below the price point of hard drives. So when you can get flash down below the price point of hard drives, it's dramatically more space and power efficient as well as being two to three X faster sequentially and a hundred X faster for random work. So MLC is the less expensive flash, right? I want to say cheaper because, you know. But it's also less reliable, but you've got more capacity to deal with things like wear leveling, is that right? Is that the concept there? It is. So multi-level cell flash tends to have more variable performance, right? Especially if you're mixing reads and writes on the same drive. And it also tends to support fewer write cycles. So flash wears out over time. You get so many writes. But these problems are absolutely solvable in software. Much like we did with disks, right? What storage arrays traditionally did was meld a bunch of semi-reliable hard drives into a much more reliable, more performant whole. We think that same work is being done now around flash. I'm old enough to remember those hard drives weren't that reliable. It is. And flash is solid state, right? Without mechanical moving parts, there's a lot less to wear out. And it's actually very predictable. We have managed to work around every flash issue we've seen in our two and a half years of existence. You know, we've had several dozen customers that have been running workloads for two years. We have literally never had a flash device fail in a way that we couldn't automatically recover from.
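[Editor's note: a minimal sketch of the pricing arithmetic described above, in which data reduction divides the raw cost of flash per gigabyte. The dollar figures are illustrative placeholders, not numbers from the interview. Only the four X reduction ratio comes from the conversation, and the function names are hypothetical.]

```python
def effective_cost_per_gb(raw_cost_per_gb: float, reduction_ratio: float) -> float:
    """Cost per logical gigabyte once data reduction is applied.

    A reduction ratio of 4.0 means four logical gigabytes fit in one
    physical gigabyte of flash.
    """
    if reduction_ratio <= 0:
        raise ValueError("reduction_ratio must be positive")
    return raw_cost_per_gb / reduction_ratio


def breakeven_reduction(flash_raw_cost: float, disk_cost: float) -> float:
    """Reduction ratio at which flash matches the disk price per gigabyte."""
    if disk_cost <= 0:
        raise ValueError("disk_cost must be positive")
    return flash_raw_cost / disk_cost


# Placeholder prices (assumptions, in dollars per raw gigabyte).
MLC_FLASH_RAW = 8.0
PERFORMANCE_DISK = 2.0

flash_at_4x = effective_cost_per_gb(MLC_FLASH_RAW, 4.0)
needed_ratio = breakeven_reduction(MLC_FLASH_RAW, PERFORMANCE_DISK)

print(f"Flash at 4x reduction: ${flash_at_4x:.2f} per logical GB")
print(f"Ratio needed to match disk: {needed_ratio:.1f}x")
```

[With these placeholder prices, four X reduction brings flash to parity with disk, and any higher ratio puts it below. The real break-even point depends on actual media prices, which the interview does not give.]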
So you've got, like, the CPU running at Usain Bolt speed, and then you've got the disk drive going really slowly, like it's got cement on the shoes, okay? And so it's the only mechanical piece that's left in the computer architecture, and essentially, you're talking about doing away with that for all high performance activity, or most high performance activity. I think that is the key. What you want to do is get mechanical storage out of the performance path. No one is saying disk doesn't belong in the data center. Disk absolutely is in the data center for capacity workloads and archiving and backup. And I think it's going to continue to do heavy lifting like video streaming and compressed file system workloads, like we see in MapReduce workloads, because the flash cost is not that dramatically different there. But again, from the CPU's perspective, going to flash is like going down the street to a library. Waiting for disk is like getting in a canoe and trying to paddle across the ocean. So John, you remember Data Domain? You know, talking about Silicon Valley startups, Data Domain, hot startup, they had the bumper sticker saying, tape sucks. And what I'm hearing is disk sucks, so we need to really address that problem. Well, and in fairness, right? I mean, Data Domain absolutely used that "tape sucks" line, but what Data Domain helped do was redefine the role of tape. Tape got pushed further back into offsite storage and long-term archiving, and disk was used for the critical path. And the thing we loved about the Data Domain model is they used data reduction to make the media people wanted, hard drives, affordable, right? Delivered at or below the price point of tape. And that's what we're striving to do with Pure Storage for flash. Flash is the media people want, because it's so much denser, faster, and more power efficient, but it suffers from being too expensive. We can use data reduction to get the price point below hard drives.
And then why wouldn't you use flash for all these workloads? Scott, you've been a very successful entrepreneur, a rock star, some say, in the Valley. I hate that word, but you know, you've been very successful, and you've been in the business talking to customers. You've been involved in megatrends before. So obviously flash is a megatrend, there's no real debate about that. The question I have for you is, as you're out in the landscape doing this work, what are you seeing in terms of where the market is? On a scale of one to 10, 10 being fully mature, one being just getting started. Are people truly embracing flash? We know that they're using it. Fusion-io went public, there's a lot of big clients, but for the average IT shop and big IT shop, I mean, how far has it gone, how far will it go, and where is it in terms of adoption, fully from a system standpoint? So in the data center, flash is still at its very early stages. You know, with the Fusion-io IPO last year, we got some visibility into some large consumer website deployments. Google's rumored to be using it for their instant search, for example. And you know, obviously we know Facebook and Apple are deploying it at scale. But you know, we think the critical model is to get, again, the flash price point down, and we think all the 15K drives in the world are ultimately going to get replaced with flash. And that's a total of about 10 exabytes a year. The consumer flash market is already 20 exabytes a year. And by the way, that's where the demand seems to start, right? People love their phones, they love their tablets, the performance and the power savings are very visible to them, and the reliability is there for those workloads. Why not get this stuff into the data center if it can be made cost effective? From an evolution standpoint, Dave likes to talk about S-curves and all that good stuff. But really, you know, it's getting out there, so it's still in the early days.
From an architectural standpoint, where people are actually using flash, what is the status of that kind of engineering? And like you said, you have a background in systems and systems chops, as we were talking about before. How much work needs to get done from an engineering standpoint to come in and make flash and commodity hardware, quite frankly, really rock and roll? I can have four cores, I can have master slave, I can redefine the architecture for IO, for database work, unstructured, structured. So how much more work is there to do? What's your view on that? Again, I think we're at the very early stages, right? So we're delivering flash now in form factors, you know, with Fusion-io's cards, for example, and their competitors there. You can insert a Fusion-io card into your server, and for software that's been designed to use the server as the store of record, you know, it's an easy fit. The problem for a lot of system software that's out there, database and virtualization solutions, is that it's really designed to leverage shared storage for high availability, performance management, consistent snapshots and so on. These are things that, you know, really are very hard to accomplish in the server tier unless you spend a lot of time on distributed systems expertise to get there. So, you know, we think there's gonna be a huge push for flash in traditional servers, and there are two form factors. The one that Pure is doing is all flash, using data reduction to get the cost down. The alternative form factor is where you mix flash and traditional hard drives into a single device. You know, we jokingly call that hierarchical storage management because that's what it is, right? The challenge with mixing flash and disk is you never know what you're gonna get.
You issue an IO request, sometimes you're fortunate enough to hit the flash cache, and other times you've got to wait on disk and you see a one or two order of magnitude latency spike. That's no fun up the stack. But, you know, there's one other point I wanna make. I'm really excited about flash getting broadly into the server. One would expect that Intel over time is gonna put flash on the motherboard, and we're gonna have very fast interconnects between the CPU complex and flash. As that happens, I think much of the system software in the world is going to evolve to take advantage of that local flash cache. I still think the store of record is gonna mostly be in shared storage, because for flash, the economics run in reverse. Flash is actually cheaper as a network appliance than it is as a server answer. I think what you just said was so exciting, because for people who might not understand that, unpacking it, as Intel moves to this new architecture, a whole new wave of innovation will be created around that. I mean, being an operating system geek myself and a data geek now, I mean, it's just limitless. I mean, so what possibilities? Take your Pure Storage CEO hat off for a minute and talk as the entrepreneur. Looking forward, what does that enable? I mean, we're talking about a completely new redefinition of what an operating system is, how to construct systems with data, here at big data. What kind of opportunities are you seeing where, as an entrepreneur, you go, man, when we get to the top of this mountain, we're gonna look out on the vista and see valleys of great opportunity? What are you gonna see? It's a fabulous question. The thing that makes the Valley so exciting, especially right now, is that we've talked a little bit about big data, we've talked about flash, those are two of the dislocators that are sweeping through, but there are a bunch of others, right? And we have cloud and virtualization, making everything elastic.
One of the things that's extremely exciting to me is that once you have an elastic data store, there's no reason not to mix online and analytic workloads together, because if you need to scale them, you just add CPU and storage and the whole system just goes. So you're not in this situation where current customers would love to do certain analytic things, but they can't afford to because their batch cycles just don't give them enough time to run these things. If it's just incremental scale, you just decide what your budget is and what you can accomplish in that budget, and you can change the budget. I think the other profound change is mobility, and tablets sort of have become the client of record, replacing PCs for all but the most information-intensive users. So you look at that, it's a clean sweep, right? The client changes, the server environment changes because of cloud, storage changes because of flash and big data. So it's going to be very tough to be an incumbent in the coming years. What do you think about networking? Because there's all this talk about converged networking, and obviously storage is in that equation as well as compute and server, but the network now is the bottleneck, and so there's really kind of new bottlenecks shifting with this innovation. It's like jacking up one side of the car and then the other side needs to be jacked up. So, changing all the tires, on the networking side there's talk about virtualized networks. What's your view on that? I mean, it's still kind of an open book, and I know a lot of investors are looking at things like OpenFlow and other techniques, because if you have unlimited compute and access to multiple data sets at the same time, and kind of the things that you're doing, the network's the bottleneck. So what's your view on that? So I think I'll take issue with the network being the bottleneck. By far the slowest thing in the data center is mechanical disk, right? And it's consuming 40% of the data center's power.
If you look at a typical latency curve, 90 plus percent of the latency is waiting on that disk head to spin. I would say on the network side, we are seeing profound advances, right? I was talking to the Mellanox CEO recently. He talked about his customers jumping from 1 GigE straight to 40 GigE, and that's a huge shift up the curve. And with Romley CPUs coming out from Intel and much more scalable bus technology interconnecting that, I think we're gonna be able to take advantage of this new networking technology. And the good news for us is all this pressure from CPU and network advancements puts that much more pressure on the storage, which is going to be flash. Okay, I would have to kind of agree with you there, but I don't want to kind of argue that point. But assuming that flash replaces all the mechanical parts, maybe we can talk about that in a few years. But you've got a long way to go. You have a lot of headroom. You said it's early on in flash. Obviously you have a big vision with Pure Storage, and that's a good market share if you can replace all the moving disk. On your go-to-market as a business, you've got to kind of tackle some blocking and tackling stuff. As you go into your customer base, what are the biggest problems that you can tackle right now? Is it the dedupe? Is it that environment? Specifically, is it compression? You've got to start somewhere. So what's the entry beachhead that you guys are nailing down right now? One of the most crucial things for a startup is honing a repeatable sale. And by repeatable sale, I mean that you can quickly identify a customer pain point and recognize if you've got a solution that will fix it for that customer. So you can then arm your channel partners and your sales force with the information they need to identify great candidates for your product, and ones that are gonna be a fit, right?
Because it's really expensive to engage with a customer that your technology doesn't prove a fit for. I mean, startups make more mistakes by saying yes to customers than by saying no, right? If you can walk away because something isn't a perfect fit, you're better off finding those other customers that are that ideal fit. So for us, we've really worked hard, by prospecting over two years of our early access customer program, to hone the repeatable sale. And for us, it's structured data workloads. So, you know, traditional SQL databases, because we get great data reduction on them, on the order of four to five X data reduction. That allows us to deliver flash below the price point of hard drives. The other workload is server virtualization. Server virtualization has even more redundancy and compressibility in the data set, which allows us to get, you know, in some cases, eight to 10 X data reduction. So we can go in at half the price of what customers are spending on their traditional hard drive storage today. And that's without any flash caches. That's without counting short stroking on the disk. And that's without counting any of the power and space savings that come out of flash. I love those kinds of value propositions, right? When a competitor would have to do more than discount their product to free, they'd have to write the customer a check for it to be in the customer's interest not to take your product. So I wonder if we could talk about the market because, as I've been saying, it's hot. Let's talk about that a little bit. Valuations are, you know, going through the roof. They're mostly private companies. You know, obviously STEC is public and there's a couple of others, but you saw some acquisitions earlier. Is the market starting to figure out the difference between sort of, you know, the analog of a disk drive supplier and, say, a storage systems company? In other words, you've got guys like STEC that were going through the roof.
You look at the chart and it looks like, you know, 1999 all over again, but it was just a couple of years ago with all the sort of EMC action, and now that's sort of settled down a little bit. At the same time, you've got startups coming out like SolidFire, who don't even have a product out yet, and you know, a lot of buzz, a lot of talk. You guys, you know, smoking hot. So are the valuations fair? Of course, you're the CEO. You're going to say, yeah, they're a little undervalued, but let's talk about the market size. So talk a little bit about, you know, the size of the market that you guys are going after, because as we all know, the size of the market really makes the valuations attractive, right? Indeed. So, you know, the really fun bit as an entrepreneur is finding a big market that's about to experience a dislocator and getting a great team together. The scale of this market is very large. Just the performance storage market, this is, you know, where some application is waiting on the result, and so it's crucial to do it as quickly as you can. That market is $15 to $20 billion of annual spend. You know, and it currently is predominantly going to buy hard drives today. We don't think those customers are getting enough for their money, and, you know, that's what creates this opportunity to come in and sell flash. I would say this market has changed pretty dramatically. Gartner has the all-flash market growing to on the order of $3 billion by 2015. Literally a year ago when you talked to analysts, they didn't believe there was going to be an all-flash market. They thought the market, you know, was going to be flash caches added in front of disk tiers. But I think it's literally been the demonstration of being able to do this data reduction and use MLC to get the price point of flash down below disk that's changed the way people are thinking about the market. So, okay, sorry.
I was just going to say, certainly, you know, those in the space doing M&A understand the difference between the technology providers, I mean, we work with STEC, for example, they're one of our technology suppliers, and those like us that are systems companies, right? We predominantly take commodity, off-the-shelf hardware components. We're not trying to innovate in hardware. We're relying on STEC, we're relying on Samsung or other suppliers for that technology. And we focus most of our energy on software, which is frankly what the storage business at the systems level is, a software business. And Fusion-io would say the same, right? I mean, they see themselves as a software innovator. I'm sure, you know, SolidFire would say the same. So, I want to test those numbers a little bit, because Wikibon published, and Mark, if you could bring that up, the Flash Memory Summit roundup. I'm looking at table one, actually. So what David Floyer did, David did this, and we were talking about this a little earlier, was basically break down the market in terms of flash storage on servers, which would be the Fusion-io piece, flash-only arrays, which would be you guys and some others, and then traditional storage arrays, mainly SATA. And he broke down the storage capacity, you know, by 2015. How much of the capacity is gonna go on each of those? The relative pricing and the percent of enterprise spend. And you can see he's got, you know, a small piece, 3%, going to flash storage on server, but it's expensive. It's, you know, 12X a SATA drive, so it's gonna be 20% of the spend. For you guys, he's got the fat middle. That's the big part of the marketplace. 11% of the capacity, and because it's 6X the price of spinning SATA, it's 35% of the market. Now, that equates, if you assume a $35 billion market, to about 10, 12 billion. Okay, so significantly larger than what Gartner forecast.
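[Editor's note: a minimal sketch of the market arithmetic quoted above, showing how a capacity share and a relative price turn into a share of spend. The 3% and 11% capacity shares, the 12X and 6X price multiples, and the $35 billion total are the figures cited in the conversation. The 86% share for traditional SATA arrays is an assumption, simply the remaining capacity, and the names used here are hypothetical.]

```python
def spend_shares(segments: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Convert (capacity share, relative price) pairs into spend shares.

    Each segment's spend is proportional to capacity share times relative
    price, normalized so that all spend shares sum to 1.
    """
    weighted = {name: cap * price for name, (cap, price) in segments.items()}
    total = sum(weighted.values())
    return {name: value / total for name, value in weighted.items()}


segments = {
    "flash_on_server": (0.03, 12.0),    # 3% of capacity at 12X SATA pricing
    "flash_only_arrays": (0.11, 6.0),   # 11% of capacity at 6X SATA pricing
    "traditional_sata": (0.86, 1.0),    # assumption: the remaining capacity
}

shares = spend_shares(segments)
TOTAL_MARKET_BILLION = 35.0  # the "assume a $35 billion market" figure
flash_array_spend = shares["flash_only_arrays"] * TOTAL_MARKET_BILLION

for name, share in shares.items():
    print(f"{name}: {share:.1%} of spend")
print(f"Flash-only arrays: about ${flash_array_spend:.1f} billion")
```

[Under that assumption the spend shares come out close to the quoted 20% and 35%, and the flash-only array segment lands at roughly $12 billion, in line with the "10, 12 billion" estimate in the conversation.]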
So I think the Floyer scenario is that prices are coming down faster, and then, to your point, why wait? And if you can get flash in your PC like I have, you don't have a disk drive in there. And then the disk drive, as you were saying, Data Domain changed the whole use case for tape. You're trying to change the use case for disk, kind of pushing SATA to the bit bucket, so. Exactly right. So it's a matter of timing. It's not an if, it's a when. Gartner has 3 billion by 2015. I don't know, David's numbers are significantly larger than that. So, you know, we'll see. We'll come back in a couple of years and talk about it. You know, I can tell you as an entrepreneur, it's like the Grand Canyon. I've been there about a dozen times. And every time I go, it's bigger than I remember it, right? So, you know, this market opportunity, from an entrepreneur's perspective, feels unbounded in terms of the demand we're seeing out in the marketplace. So it's going to be a really exciting time. We're going to see a bunch of fiercely contested, you know, innovations and startups going up against the big guys. And it's, you know, it's what Silicon Valley's all about. So how do you differentiate from all the other sort of flash-only guys out there? What's your bumper sticker there? Well, so let me talk about differentiation along two dimensions. Most of the time, we end up competing with traditional disk in our customer engagements, right? So there, you know, we're taking all flash and we're putting it up against a legacy disk solution that's got a flash cache in it. We argue that the big problem with that is the variable response time because, you know, every time you fall through the cache, you've got to wait on disk. And especially as you start putting 7,200 RPM SATA drives back there, you may be waiting a very long time to get your data back when it doesn't fit in that flash cache.
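[Editor's note: a minimal sketch of the variable response time argument above, modeling a hybrid array as a flash cache in front of disk. The latency values are illustrative assumptions, not measurements from the interview, and the function names are hypothetical.]

```python
def mean_latency_ms(hit_rate: float, flash_ms: float, disk_ms: float) -> float:
    """Average read latency for a flash cache in front of disk.

    Hits are served at flash latency and misses fall through to disk.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * flash_ms + (1.0 - hit_rate) * disk_ms


def miss_penalty(flash_ms: float, disk_ms: float) -> float:
    """How many times slower a cache miss is than a cache hit."""
    return disk_ms / flash_ms


# Assumed latencies: 0.5 ms for flash, 10 ms for a 7,200 RPM SATA drive.
FLASH_MS = 0.5
DISK_MS = 10.0

for hit_rate in (0.80, 0.90, 0.99, 1.00):
    average = mean_latency_ms(hit_rate, FLASH_MS, DISK_MS)
    print(f"hit rate {hit_rate:.0%}: mean latency {average:.2f} ms")
print(f"a miss is {miss_penalty(FLASH_MS, DISK_MS):.0f}x slower than a hit")
```

[The average hides the point being made: even at a high hit rate, each individual miss still pays the full disk latency, so response time stays unpredictable. A hit rate of 100% corresponds to the all-flash case.]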
But by taking price off the table, right, if we can deliver that all flash at or below the price point of just the hard drives without the flash cache, then it truly becomes a no-brainer. So if you flip the equation over and then you look at, you know, the other companies innovating around the flash area, for us, the differentiators are that we've pushed the cost curve more than anyone else, I believe, on two fronts. One, we try to stay at the leading edge of, you know, the lowest cost MLC that we can get. That means it's the least reliable in terms of how many times you can write it. But we've been able to use our software techniques to deliver, you know, on-par reliability for that more economical flash in the box. And then the other thing is, we've demonstrated data reduction that, you know, delivers results in 500 microseconds to a millisecond. I don't believe there's any other technology out there where those results are demonstrable, right? Our competition generally turns off whatever data reduction technologies they advertise when they're benchmarking the product. We don't. When we put up numbers, you know, 200,000 IOPS, that's in conjunction with deduplication and compression going on. Very efficient, no latency there. Yeah, and there's an existence proof now that you can do this really fast, because this has been the holy grail of primary storage for, you know, on the order of 10 years. And now we can show that with flash we can do it. Scott, great to have you on. You can come on theCUBE anytime. You'll see us around. We're going to be also at Hadoop Summit. We'll also be at the HBaseCon that Cloudera's putting on, the HBase conference, as well as EMC World, VMworld, all the normal other tech shows. So, pure knowledge from the CEO of Pure Storage. Thanks so much for coming on theCUBE. Final parting question: take a personal look at what's going on right now and share it with the folks out there watching.
In your mind's eye, share with them what's happening right now in this tech business, and then talk about what will be different next year at Strata when we come back. As an entrepreneur, as an investor, as a CEO, as a person, what's happening right now in the tech scene and what's going to be different next year? Well, let me use that to focus a little bit specifically on big data. I'm a board member at Cloudera. I'm blessed to have that opportunity. It's another big market and another incredibly gifted technical team, which makes it hugely fun for me. I'm really excited about the emergence of HBase. I think we are going to look back and see a lot more big data go online and support incremental update and evolution. HBase as a technology allows you to do a lot more things with big data. So for example, you can do a big hosted SaaS deployment because the changes are incremental. It's not like you're trying to do these huge batch cycles and bring yet another copy of the online data in for analytic processing. And it lets you mix more of the online workload in against the same big data infrastructure. I would say that the other trend I'm really excited about is big data ISVs. I'm especially partial to Hadoop, given the Cloudera connection, but we're seeing a bunch of software vendors and SaaS vendors offering value-added solutions on top and making big data analytics that much more accessible to a broader population of users that don't wanna have their own MapReduce programmers on staff to write these things. So I think we'll look back at this year at the next Strata and see that success. And John, Dave, I would just like to thank you very much. Always love to come on theCUBE. You're great, and you're more than welcome to guest write on SiliconANGLE anytime you want. And of course, if you're at an event where we're here, you're always welcome to come on theCUBE. Love the knowledge and experience, and thanks for sharing that with everybody. Appreciate it.
Yeah, Scott, thank you. Appreciate you coming on.