In 2010, Wikibon predicted that the all-flash data center was coming. The forecast at the time was that consumer flash memory volumes would drive prices of enterprise flash down faster than those of high-spin-speed hard disks, and that by mid-decade buyers would opt for flash over 15K HDDs for virtually all active data. That call was pretty much dead on, and the percentage of flash in the data center continues to grow faster than that of spinning disk. Now, the analyst that made this forecast was David Floyer, and he's with me today along with Renen Hallak, who is the founder and CEO of VAST Data. They're going to discuss these trends and what it means for the future of data in the data center. Gentlemen, welcome to the program. Thanks for coming on.

Great to be here.

Thank you for having me.

You're very welcome. Now, David, let's start with you. You've been looking at this for over a decade, and frankly, your predictions have caused some friction in the marketplace. But where do you see things today?

Well, what I was forecasting was based on the fact that the key driver in any technology is volume. Volume reduces the cost over time, and the volume comes from the consumers. So flash has been driven over the years, initially by the iPod Nano in 2006, where Steve Jobs did a great job with Samsung in introducing large volumes of flash, and then by the iPhone in 2008. Since then, all of mobile has been flash, and mobile has been taking a greater and greater share. To begin with, the PC lagged, but now over 90% of PCs ship with flash. So flash has taken over the consumer market very aggressively, and that has driven down the cost of flash much, much faster than the declining market of HDD.

OK. Now, Renen, I wonder if we can come to you. I want you to talk about the innovations that you're doing, but before we get there, talk about why you started VAST.

Sure.
So it was five years ago, and it was basically to kill the hard drive. I think what David is saying resonates very, very well. In fact, if you look at our original presentation for VAST Data, it showed flash and tape. There was no hard drive in the middle. And we said 10 years from now — and this was five years ago, so even the dates match up pretty well — we're not going to have hard drives anymore. Any piece of information that needs to be accessible at all will be on flash, and anything that is dormant and never gets read will be on tape.

So OK, we're entering this kind of new phase now, which is being driven by QLC. David, maybe you can give us a quick — what is QLC? Give us the bumper sticker there.

There's 3D NAND, which is the thing that's growing very, very fast, and it's growing on several dimensions. One dimension is the number of layers. Another dimension is the size of each of those cells. And the third dimension is the number of bits per cell, which for QLC is four bits per cell. Those three dimensions have all been improving, and the result is that more and more data can be stored on a whole wafer, or on the chip that comes from that wafer. So QLC is the latest set of 3D NAND flash that's coming off the lines at the moment.

OK, so my understanding is that there are new architectures entering the data center space that can take advantage of QLC — enter VAST. So that's a nice setup for you. Maybe before we get into the architecture, can you talk a little bit more about the company? Maybe not everybody's familiar with VAST. You shared why you started it, but what can you tell us about the business performance? Any metrics you can share would be great.

Sure. So the company, as I said, is five years old — about 170, 180 people today. We started selling product just around two years ago and have just hit $150 million in run rate. That's with eight salespeople.
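The density math David sketches — layers, cell geometry, and bits per cell all multiplying together — can be illustrated with a quick calculation. The layer counts below are illustrative examples, not specific vendor figures.

```python
# Rough 3D NAND density math, per the three scaling dimensions described:
# layer count, cells per unit area, and bits per cell. Numbers are
# illustrative, not vendor specs.

BITS_PER_CELL = {"SLC": 1, "MLC": 2, "TLC": 3, "QLC": 4, "PLC": 5}

def relative_density(layers: int, cell_type: str, cells_per_layer: float = 1.0) -> float:
    """Relative bits per unit of wafer area: layers x cells x bits/cell."""
    return layers * cells_per_layer * BITS_PER_CELL[cell_type]

# Moving from (say) 64-layer TLC to 176-layer QLC multiplies density
# by roughly 3.7x even with unchanged cell geometry.
old = relative_density(64, "TLC")
new = relative_density(176, "QLC")
print(f"density gain: {new / old:.2f}x")  # -> 3.67x
```

Because all three dimensions compound, each generation stores substantially more data per wafer — which is what drags the cost per bit down.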
And so, as you can imagine, there's a lot of demand for flash all the way down the stack, in the way that David predicted.

Wow, OK. So I think you've got product-market fit, right? And now you're going to scale. I would imagine you're going to go after escape velocity and build your moat. Now, a lot of that is product. Product and sales — those are the two golden pillars. But David, when you think back to your early forecast last decade, it was really about block storage. That was really what was under attack. Fusion-io kind of got it started with Facebook; they were trying to solve their MySQL database performance problems. And then we saw Pure Storage hit escape velocity. They drove a truck through EMC's Symmetrix HDD-based install base, which precipitated the acquisition of XtremIO by EMC — something Renen knows a little bit about, having led development of the product. But flash was late to the NAS party. Renen, let me start with you. Why is that? And what is the relevance of QLC in that regard?

So storage has always looked like a pyramid. You have your block devices up at the top, then your NAS underneath, and today you have object down at the bottom of that pyramid. The pyramid basically represents capacity, and the y-axis is price-performance. And so if you could only serve a small subset of the capacity, you would go for block — that was the subset that needed high performance. But as you go to QLC, and PLC will soon follow, the price of all-flash systems goes down to a point where it can compete on the lower ends of that pyramid, and the capacity grows to a point where there's enough flash to support those workloads. So now, with QLC and a lot of innovation that goes with it, it makes sense to build an all-flash NAS and object store.

Yeah, okay. And David, you and I have talked about the volumes.
And Renen sort of just alluded to that — the higher volumes of NAS, not to mention the fact that NAS is hard. File is difficult. But that's another piece of the equation here, isn't it?

Absolutely. NAS is difficult. It's very large scale — you're talking about petabytes of data, you're talking about very important data, and you're talking about data which is at the moment very difficult to manage. It takes a lot of people to manage it, it takes a lot of resources, and it takes up a lot of space as well. So of all those issues with NAS, complexity is probably the biggest single problem.

So maybe we can geek out a little bit here — you guys go at it. But Renen, talk about the VAST architecture. I presume it was built from the ground up for flash, since you were trying to kill HDD. What else do we need to know?

It was built for flash. It was also built for XPoint, which is a new technology that came out from Intel and Micron about three years ago. XPoint is basically another level of persistent media, above flash and below RAM. But what we really set out to do, as I said, is to kill the hard drive, and for that, what you need is price parity. Of course, flash and hard drives are not at price parity today; as David said, they probably will be a few years from now. And so we wanted to jump-start that, to accelerate it. We spent a lot of time building a new type of architecture, with a lot of new metadata structures and algorithms on top, to bring that effective price down to a point where it's competitive today — and in fact, was two years ago. The way we did it was by going out to talk to these vendors: Intel with 3D XPoint and QLC flash, Mellanox with NVMe over Fabrics and very fast Ethernet networks.
And we took those building blocks and we thought: how can we use these to build a completely different type of architecture — one that doesn't just take flash one level down the stack, but actually allows us to break that pyramid, to collapse it down and build a single system that is as fast as your fastest all-flash block device, or faster, but as affordable as your hard-drive-based archives? Once that happens, you don't need to think about storage anymore. You have a single system that's big enough and cheap enough to throw everything at it, and it's fast enough that everything is accessible at sub-millisecond latencies.

The way the architecture is built is pretty much the opposite of the way scale-out storage has been done. It's not based on shared-nothing, the way XtremIO was, the way Isilon is, the way Hadoop and the Google File System are. We're basing it on a concept called disaggregated, shared-everything. What that means is that we have the media on one set of devices and the logic running in containers — just software — and you can scale each of those independently. So you can scale capacity independently from performance, and you have this shared metadata space that all of the containers can see. The containers don't actually have to talk to each other in the synchronous path. That means it's much more scalable — you can go up to hundreds of thousands of nodes rather than just a few dozen. It's much more resilient — you can have all of the containers fail and you still haven't lost any data. And it's much easier to use, to David's point about complexity.

Thank you for that. You mentioned up front that you not only built for flash but built for XPoint, so you're using XPoint today. That's interesting. There has always been this sort of debate about XPoint: it's less expensive than DRAM — or maybe I have that wrong — but it's persistent, and it's okay.
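The disaggregated, shared-everything idea can be sketched in a few lines: stateless logic containers all see one shared pool of media and metadata, so capacity and performance scale independently. Everything here is a hypothetical illustration of the concept, not VAST's actual implementation or API.

```python
# Illustrative sketch of disaggregated, shared-everything scaling:
# stateless compute containers share one global media/metadata pool,
# so capacity (enclosures) and performance (containers) grow separately.
# This models the idea only; names and structures are invented.

class SharedPool:
    """Global state every container can see; no container owns a shard."""
    def __init__(self):
        self.enclosures = []   # capacity side: media devices
        self.metadata = {}     # shared metadata space

    def add_enclosure(self, tb: int):
        self.enclosures.append(tb)

    @property
    def capacity_tb(self) -> int:
        return sum(self.enclosures)

class Container:
    """Stateless logic; reads and writes the shared pool over the fabric."""
    def __init__(self, pool: SharedPool):
        self.pool = pool

    def write(self, key, value):
        # no container-to-container chatter in the synchronous path
        self.pool.metadata[key] = value

pool = SharedPool()
pool.add_enclosure(500)                             # grow capacity alone
containers = [Container(pool) for _ in range(4)]    # grow performance alone
containers[0].write("a", 1)
print(containers[3].pool.metadata["a"])             # any container sees all data
```

Contrast with shared-nothing, where each node owns a data shard: there, losing nodes loses data, and every node must coordinate with its peers. Here, containers hold no state, so all of them can fail without data loss.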
But it's more expensive than flash, and it was sort of thought of as a fence-sitter because it didn't have the volume. But you're using it today successfully. That's interesting.

We're using it to offset the deficiencies of the low-cost flash. The nice thing about QLC, and PLC, is that you get the same levels of read performance as you would from high-end flash. The only differences between high-cost and low-cost flash today are in write cycles and write performance, and XPoint helps us offset both of those. We use it as a large write buffer and as a large metadata store. That allows us not just to arrange the information in a very large persistent write buffer before we need to place it on the low-cost flash; it also allows us to develop new types of metadata structures and algorithms that let us make better use of the low-cost flash and reduce the effective price down even lower than the raw capacity.

Very cool. David, what are your thoughts on the architecture? Give us the independent view.

I think it's a brilliant architecture. I'd like to just go one step down, to the network side of things. The whole use of NVMe over Fabrics allows the users and all of the servers to get at any data across this whole network directly. So you've got great performance right across the stack. And then the other thing is that by using RDMA for NAS, you're able, if you need to, to get down to the data in microseconds. Overall, that's a thousand times faster than any HDD system could manage. So this architecture really allows an any-to-any, simple, single level of storage, which is so much easier to think about, architect, use, and manage. It's just so much simpler.

I don't know if there's an answer to this question, but if you had to pick one thing, Renen, that you were really dogmatic about and bet on from an architectural standpoint, what would that be?
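The write-buffer role described above — absorbing small writes in fast persistent memory, then migrating full, sequential stripes to low-cost flash — can be sketched as follows. This is a toy model of the general buffering technique, not VAST's actual code.

```python
# Toy sketch of a persistent write buffer in front of low-cost flash:
# small incoming writes accumulate in fast media, and only full stripes
# are written to QLC as large sequential I/Os, sparing it write cycles.
# Illustrative only — not an actual storage-system implementation.

class WriteBuffer:
    def __init__(self, stripe_size: int):
        self.stripe_size = stripe_size
        self.pending = []        # lands in fast persistent memory first
        self.flash_stripes = []  # large sequential writes to QLC flash

    def write(self, block):
        self.pending.append(block)
        if len(self.pending) >= self.stripe_size:
            # one big sequential write instead of many small random ones
            self.flash_stripes.append(self.pending)
            self.pending = []

buf = WriteBuffer(stripe_size=4)
for b in range(10):
    buf.write(b)
print(len(buf.flash_stripes), buf.pending)  # 2 full stripes flushed; [8, 9] buffered
```

Because the buffer is persistent, nothing is lost on power failure, and the flash only ever sees large, wear-friendly writes — which is how a low-endurance medium like QLC can be made to last.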
I think what we bet on in the early days is that the pyramid doesn't work anymore and tiering doesn't work anymore. In fact, we stole Johnson & Johnson's tagline, "no more tiers" — only it's not spelled the same way. The reason for that is not storage; it's the applications. As we move more and more to applications that are machine-based — and machines are now not just generating the data, they're also reading it, analyzing it, and providing insights for humans to consume — the workloads have changed dramatically. The one thing we saw is that you can't choose which pieces of information need to be accessible anymore. These new algorithms, especially around AI and machine learning and deep learning, need fast access to the entirety of the dataset, and they want to read it over and over and over again in order to generate those insights. That was the driving force behind us building this new type of architecture, and we see every single day, when we talk to customers, how the old architectures simply break down in the face of these new applications.

Very cool. Speaking of customers, I wonder if you could talk about use cases and customers in this NAS arena. Maybe you could add some color there.

Sure. Our customers are large in data. We start at half a petabyte and grow into the exabyte range. The system likes to be big — it scales super-linearly. If you go from a hundred nodes to a thousand nodes, you get more than 10X in performance, capacity efficiency, resilience, et cetera. So that's where we thrive, and those workloads today are mainly analytics workloads, although not entirely. If you look at it geographically, we have a lot of life sciences in Boston — research institutes, medical imaging, genomics, universities, pharmaceutical companies.
Here in New York, we have a lot of financials — hedge funds analyzing everything from satellite imagery to trade data to Twitter feeds. Out in California, a lot of AI — autonomous vehicles — as well as media and entertainment; both the generation of films, like animation, and content distribution are being done on top of VAST.

Great, thank you. David, when you look at the forecasts you've made over the years — and Renen, I imagine they match nicely with your assumptions, so OK, I get that — not everybody agrees, David. Certainly the HDD guys don't agree, though they're obviously fighting to hang on after an awesome 50-year run. As well, there are others doing hybrids and the like who challenge your assumptions. We don't have a dog in this fight; we just want the truth and try to do our best to report it. But let me start with this. One of the things I've seen is that you're comparing deduped and compressed flash with raw HDD. Is that true or false?

In terms of the fundamentals of the forecast, it's false. What I'm taking is the Newegg price. I did it this morning: I looked up a two-terabyte NAS disk drive and it was $54, and if you look at the cost of NAND for two terabytes, it's about $200. So it's a four-to-one ratio, and that's down from what people saw last year, which was five or six, and every year that ratio has been coming down. But on raw cost, the HDD is still cheaper.

So Renen, one of the other things that Floyer has said is that because of the advantages of flash — not only performance but also data sharing, et cetera, which really drives other factors like TCO — it doesn't have to be at parity in order for customers to consume it. I certainly saw that on my laptop. I could have got more storage, and it could have been cheaper per bit, but I took the flash — no problem.
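David's back-of-envelope can be checked directly from the figures he quotes: a $54 two-terabyte NAS hard drive against roughly $200 of NAND for the same capacity.

```python
# David's Newegg back-of-envelope: raw $/TB for 2 TB of HDD vs ~2 TB of NAND.
hdd_price, nand_price, capacity_tb = 54.0, 200.0, 2.0

hdd_per_tb = hdd_price / capacity_tb    # $27/TB
nand_per_tb = nand_price / capacity_tb  # $100/TB
ratio = nand_per_tb / hdd_per_tb
print(f"NAND/HDD price ratio: {ratio:.1f}:1")  # -> 3.7:1, the "four to one" cited
```

At these prices the raw ratio is about 3.7:1 — in line with the roughly four-to-one he describes, down from the five or six of the prior year.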
That was an intelligence test. But what are you seeing from customers? And by the way, Floyer, I think, is forecasting that by 2026 there will actually be a raw-to-raw crossover — so then it's game over. But what are you seeing in terms of what customers are telling you, or any evidence you have, that it doesn't even have to be at parity — that customers actually get more value from flash even if it's more expensive?

Yeah, in the enterprise space, customers aren't buying raw flash. They're buying storage systems. So even if the raw numbers, flash versus hard drive, are still not there, there are a lot of things that can be done at the system level to equalize the two. In fact, a lot of our IP is based on that. We're taking flash that today is, as David said, more expensive than hard drives, but at the system level it doesn't remain more expensive. The reason is that storage systems waste space. They waste it on metadata; they waste it on redundancy. We built our new metadata structures such that everything lives in XPoint and is so much smaller, because of the way XPoint is accessible at byte-level granularity. We built our erasure codes in a way where you can sustain 10, 20, 30 drive failures but only pay one or two percent in overhead. We built our data reduction mechanisms such that they can reduce data even if the application has already compressed it and already deduplicated it. So there's a lot of innovation that can happen at the software level, as part of this new disaggregated, shared-everything architecture, that allows us to bridge that cost gap today without customers having to do fancy TCO calculations. And of course, as prices of flash continue declining over the next few years, all of those advantages remain, and that will just widen the gap between hard drives and flash. There really is no advantage to hard drives once the price thing is solved.

Thank you.
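The low-overhead erasure-code claim is just stripe arithmetic: with very wide stripes, even many parity strips cost only a percent or two of raw capacity. The stripe widths below are illustrative examples, not VAST's published geometry.

```python
# Wide-stripe erasure-code overhead: with d data strips and p parity strips,
# capacity overhead is p / (d + p), and the stripe survives p drive failures.
# Widths here are illustrative, not any vendor's actual geometry.

def parity_overhead(data_strips: int, parity_strips: int) -> float:
    return parity_strips / (data_strips + parity_strips)

# A narrow RAID-6-style 8+2 stripe pays 20% for two-failure protection...
narrow = parity_overhead(8, 2)
# ...while a very wide 374+10 stripe pays ~2.6% to survive ten failures.
wide = parity_overhead(374, 10)
print(f"narrow: {narrow:.1%}, wide: {wide:.1%}")
```

Wide stripes like this are only practical when every node can see every drive — which is exactly what the shared-everything design provides. In a shared-nothing cluster, a stripe is limited by what one node owns.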
So David, the other thing I've seen around these forecasts is the comment that you can't really data-reduce hard disk effectively — and I understand why: the overhead — whereas with flash you can use all kinds of data reduction techniques without affecting performance, or it's not even noticeable. But the cloud guys do it upstream, others do it upstream. What's your comment on that?

Yes, if you take sequential data and you do a lot of work up front, you can write it out in very big blocks, and that's the perfect way of handling it sequentially. The challenge for the HDD people is that if they go for that sort of sequential application, the cheapest way of doing it is to use tape — which comes back to the point that the two things that are going to remain are tape and flash. So that part of the HDD market, in my assertion, will go towards tape and tape libraries, and those are selling very well at the moment.

Yeah, I mean, the economics of tape are really attractive. I've said this many times: the marketing of tape is lacking. I'd like to see better thinking around how it could play, because I think customers have this perception of tape, but there's actually a lot of value there. I want to carry on —

Can I make a little point there?

Yeah.

In the same way that VAST have created an architecture for flash, there's an opportunity out there for the tape people, with flash, to make an architecture that allows you to take that workload and really lower the price enormously.

You've called it FLAPE. There are some interesting metadata opportunities there, but we won't go into that. And then, David, I want to ask you about NAND shortages. We saw this in 2016 and 2017, and a lot of people are saying there are NAND shortages again, so that's the flaw in your forecast. You're assuming prices of flash continue to come down faster than those of HDD, but the shortages of NAND could be problematic. What do you say to that?
Well, I've looked at that in some detail, and one of the important things is what's happening in the flash market: YMTC, a Chinese company, has introduced a lot more volume into the market. They're making 100,000 wafers a month this year — around 6% to 8% of the NAND market this year. As a result, Samsung, Micron, Intel, Hynix — they're all increasing their volumes of NAND, so they're all investing. So I don't see that NAND itself is going to be a problem. There is certainly a shortage of the processor chips which drive the intelligence in the NAND itself, but that's a problem for everybody. That's a problem for cars; it's a problem for disk drives.

You could argue that's going to create an oversupply, potentially. But let's not go there. You know what? At the end of the day, it comes back to the customer. All this stuff is interesting — I love talking about the architecture — but it's really all about customer value. And so, Renen, I want you to sort of close there. What should customers be paying attention to, and what should observers of VAST Data really watch as indicators of progress for you guys — milestones and things in the market that we should be paying attention to? But start with the customers. What's your advice to them?

Sure. For any customer that I talk to, I always ask the same thing: imagine where you'll be five years from now, because you're making an investment now that is at least five years long. In our case, we guarantee the lifespan of the devices for a decade, so you know it's going to be there for you. And imagine what is going to happen over those next five years. What we're seeing at most customers is that they have a lot of dormant data, and with the advances in analytics and AI, they want to make use of that data.
They want to turn it from a cost center into a profit center — to gain insight from that data and improve their business based on the information they have, the same way the hyperscalers are doing. In order to do that, you need one thing: fast access to all of that information. Once you have that, you have the foundation to step into this next-generation world where you can actually make money off of your information. And the best way to get very, very fast access to all of your information is to put it on fast media like flash and XPoint.

If I can give one example: hedge funds. Hedge funds do a lot of back-testing on VAST, and what makes sense for them is to test as far back as they possibly can — but because of storage limitations, they can't do that. The other thing that's important to them is a real-time experience: being able to run those simulations in a few minutes, not as a batch process overnight. But because of storage limitations, they can't do that either. The third thing is that if you have many different applications and many different users on the same system, they usually step on each other's toes. The VAST architecture solves those three problems. It allows a lot of information, very fast access and processing, and amazing quality of service, where different users of the system don't even notice that somebody else is accessing the same piece of information. So hedge funds are one example, but any of these verticals that make use of a lot of information will benefit from this architecture and this system. And if it doesn't cost any more, there's really no reason to delay the transition to all-flash.

Excellent — very clear thinking. Thanks for laying it out. And what about how we should judge you? What are the things that we should watch?

I think the most important way to judge us is to look at customer adoption.
What we're seeing, and what we're showing investors, is a very high net dollar retention number. What that means, basically, is: if a customer buys a piece of kit today, how much more will they buy over the next year, over the next two years? We're seeing them buy more than three times more within a year of the initial purchase, and we see more than 90% of them buying more within that first year. That, to me, indicates that we're solving a real problem, and that they're making strategic decisions to stop buying any other type of storage system and just put everything on VAST. Over the next few years, we're going to expand beyond just storage services and provide a full stack for these AI applications. We'll expand into other areas of infrastructure and develop the best possible vertically integrated system to allow those new applications to thrive.

Nice. Yeah, investors love that lifetime-value story — if you can get it above 3X of the customer acquisition cost, that's an IPO on the way. But guys, hey, thanks so much for coming on theCUBE. We had a great conversation, and I really appreciate your time.

Thank you.

Thank you.

All right, thanks for watching, everybody. This is Dave Vellante for theCUBE. We'll see you next time.