And welcome back to theCUBE's coverage. We're back in the press room now; we were at the lakehouse earlier. Databricks Data + AI Summit, it's been a great event. Packed house, 12,000 people on site, 75,000 watching online. Databricks is really taking the industry to the next level of evolution, going beyond data. Data and AI is the story, data plus AI. It's enabling not only new infrastructure and data infrastructure models, but new products, new ways to think about how to create value in applications.

Speaking of data, our next guest is Jeff Denworth, co-founder and CMO of Vast Data, who does a lot of work with storage and how data is thought about, and is in the middle of rethinking that as his company progressively makes new things happen with the future of data. Thanks for coming on theCUBE.

Thanks, it's been a long time.

I think you were on theCUBE in 2003. You're an OG CUBE alumni.

Yeah, 2003... 2013. 2013, I'm sorry, I'm really dating myself. 2013. I don't even know who was president then, I guess it might've been the-

Yeah, it definitely wasn't Trump, because I remember that day very vividly. I was in Seattle for a CNCF event. But you know, one of the things that's interesting, because you and I have talked about this before, is that during that wave of big data, remember the Hadoop wave, what made Databricks really successful was that Spark made it easier and cloud was emerging, and that created a whole generation of value. And we saw the same old storage: you store the data, it's in the cloud, you've got on-premises. But now it seems like a whole other replatforming of the infrastructure is happening, a new, younger generation of talent coming in with AI. People are rethinking the future of their data and they're thinking more about platforms. What's your view on that? You guys are doing a lot of work here, thinking about how storage converts to a data platform. What's your thoughts?
Yeah, so Vast is a company that was rooted in building next-generation distributed systems architecture. In the early days we studied organizations like Facebook and Google, and it was less around big data in the way that you and I used to speak about it and more around all the real unstructured data that informed some of these really rich models, right? Data that goes well beyond the numbers and tables you put in a classic data warehouse. And so as we think about that, what we see is that at the most foundational level there are new solutions needed to help people take all these massive GPU systems that are being built and actually apply them in anger to large data reservoirs filled with data that doesn't have any structure or schema in the ways that business systems would expect. Just a few weeks ago, we announced that we were the first enterprise distributed storage systems company to be supported for NVIDIA's DGX SuperPOD architecture. SuperPODs, if you don't know, are the biggest and the baddest of the NVIDIA machines being built. Those are the ones being deployed by the people building the most popular foundation models in the market today. And that requires a very specialized type of infrastructure that up until now has been the domain of the large hyperscalers. We think there's a huge opportunity to go democratize that for the masses.

Yeah, and I love that democratization message, but we're hearing more and more of it. We heard on stage that MosaicML has a lot of GPUs. Databricks bought those guys for $1.3 billion. So you're seeing this bolt-on of GPUs, but now you've got cloud-scale enablement too. You've got the bolt-on with GPU support, you've got on-premises and hybrid cloud booming, open source is booming. All of that's coming together, but data is the key; there's no AI without the right data inputs.
And that came through clearly in the keynote here; it's kind of an epiphany here at Databricks. I heard the same thing at the MongoDB event with developers, and we've heard the same from others — Google, Snowflake, even Amazon all say the same thing: data input drives the value. And then there are property issues, privacy. This is the important part of the data. Can you explain how you see the role of data in the AI equation?

Sure. We're dealing predominantly with global organizations. There's this conversation in the market around data gravity, but there are also new terms and new concepts coming up around things like federated learning. Essentially, if I'm building some sort of healthcare patient-data AI system, I may not be able to pull that data out of the different sovereign data centers it might sit in. But if you can just pull the weights out and create some sort of accumulated learning from all the different sites you've got your data strewn across, then interesting things can happen. You mentioned cloud earlier — I think about cloud more as a distributed computing model than as "this guy has to go to this CSP or that CSP." Our realization is that everything is strewn everywhere, and there's very little interconnectedness between all of these different systems, both from a processing perspective and a data perspective. So a data platform first has to stitch all of this together into one unified, cohesive view, and then allow you to go run models that are completely distributed across the world.

What are some of the infrastructure requirements to run these modern deep learning workloads? Because that's clearly coming. What's the minimum requirement, and what's the ideal infrastructure to run these modern deep learning workloads?

So it's very clear that you need some sort of really strong horsepower with respect to AI processors.
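The federated learning idea Jeff describes — local training at each sovereign site, with only model weights leaving, never the raw data — can be sketched as federated averaging (FedAvg). This is a minimal illustration, not Vast's implementation; the linear model, site sizes, and function names are all assumptions for the example.

```python
# Minimal federated averaging sketch: each "hospital" trains locally on
# its private data; only weights cross site boundaries, and the server
# averages them weighted by sample count.
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One site's local training: a few gradient-descent steps on a
    linear model, touching only that site's private data."""
    w = weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)  # mean-squared-error gradient
        w -= lr * grad
    return w

def federated_average(site_weights, site_sizes):
    """Aggregate: weight each site's model by how much data it holds."""
    total = sum(site_sizes)
    return sum(w * (n / total) for w, n in zip(site_weights, site_sizes))

# Three sites, each with private samples drawn from the same ground truth.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
sites = []
for n in (50, 80, 120):
    X = rng.normal(size=(n, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=n)
    sites.append((X, y))

w_global = np.zeros(2)
for _ in range(20):  # communication rounds: only weights move
    updates = [local_update(w_global, X, y) for X, y in sites]
    w_global = federated_average(updates, [len(y) for _, y in sites])

print(w_global)  # converges near [2.0, -1.0] without pooling any raw data
```

The "accumulated learning" Jeff mentions is exactly the averaged `w_global`: each round improves the shared model even though no site ever sees another site's records.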
Most commonly that's NVIDIA — there are a few other players in the space — but the prescriptions these organizations are making are for, essentially, supercomputers. You can either consume these in the cloud as a service or deploy them on premises, but you first need some big machinery, and then big machinery gets coupled with big data. When I say big data, I don't mean a 50-terabyte Hadoop data set that you're sitting on. I'm talking about — we're working with customers, we have one project right now where the customer the other week forecasted 10 exabytes of data that they're going to process on. So with respect to the data paradigm, if you have machines that can actually process at this level of intensity, our observation — and I'd be curious to hear what you think, John — is that people are now starting to architect strategies to collect tons more data, because for the first time ever they can actually go and process it.

I was just talking to a venture capitalist who came in earlier, a big-time investor, successful in the field, and all those data hoarders are looking good right now. Hoarding the data and storing the data, which by the way gets cheaper and cheaper, is looking good, because now a lot of the LLMs and these foundation models can take advantage of that data like they've never seen before. And that came out loud and clear in the keynote today: historical data, not just real-time data, is now just as important as real-time data, because you can now mix and match with it more effectively. This is developing on data. This is a trend that's actually not hard to do now, and democratizing that is the killer feature. We're seeing that play out today at this event.

Yeah, I think about it as — essentially all AI, machine learning, deep learning is just applied statistics, right?
Statistical models get better as you add more and more data to the training set, assuming it's of good enough quality. But the streaming component of this is more a reflection of how you and I think, right? We're not scheduling something at eight o'clock tonight to say "go think" — go interact with the world and maybe make some realizations after that. You and I are talking; language is our API. We're trading information in real time, and you're correlating it against things you already know. That's the way computers should work, right? You should have data that just flows in, and then the system understands how to process it in real time.

That's why I wanted you to come on theCUBE, because that's exactly the point I think you have great insight into. Ali Ghodsi talked about it today in the keynote. They're breaking down the notion of the data warehouse, they're killing this war on formats, they're unifying things, but at the end of the day you've still got to store stuff, right? And you've got storage out there, you've got databases, you've got application companies, all these little silos. So what you're referring to is that you've got to bring it together and make it fluid and more interactive. Whatever you want to call it, it's different, but it all starts with the data: it's stored somewhere. Either data has landed somewhere and it's stored, or it has to move somewhere, okay? So you come from a storage perspective — what needs to change? The future of AI is data, and data is stored. So is it a storage solution on steroids? But it's not a storage solution, it's not a storage platform, it's a data platform that happens to have storage. Or am I thinking about it the wrong way? What's your perspective? These silos are going to be gone; it's not just a storage company or an app company, it's going to be a data company. What is that future? Because storage is important, and you guys know a lot about that.
Well, I think customers are looking for something more integrated. Storage systems are dumb; applications have no memory. And so if you think about it, there are kind of two elements. One, there's a huge division in the market right now between products and services that were designed for the era of big data and what is needed to make that next step into the era where we can actually process natural data — data that doesn't fit in a classic data warehouse, right? That's where the last mile of robotics lives, that's where the last mile of automation is, and probably the biggest benefit to humanity. But there you need support for unstructured data, you need support for thousands of GPUs built on top of parallel computers that stretch all the way around the world, unifying the semantic layer with the unstructured layer. And so if you have that, then you think about it and say, okay, who's in this space? We see very few full-stack data platform companies out there, and nobody that's addressing deep learning end to end. Of the players that are out there, some come from the engine level and start to build around that; some come from the data warehouse level and start to build around that. We came at it from the storage perspective first, thinking about and understanding that applied-statistics dynamic I talked about earlier. I said, okay, if you can solve for access, at any level of scale, to this rich set of data that people haven't really figured out a great way to process across the enterprise — if you can solve for that, then you just build on top of that capability to get better and better.

Yeah, and I think that's an interesting point, because the number one thing that comes up when I talk about large language models and foundation models and AI is: who do I work with? Which companies do I work with on that? And then, where do I run it? And then the third question: how much does it cost?
And does it perform well? So at the end of the day we're still going to have the performance challenges.

Sure.

Databricks has a verification kind of feature with their GPUs — it's been authorized, it runs faster. They're just trying to see more of these SLAs, policies, quality of service. Performance is a huge factor in all this.

It is, it is. And you know, if you think about putting together stacks of a bunch of different technologies and expecting it all to work together, our thesis around APIs and integrated services is that it steps the combined solution down to the lowest common denominator. That's why we started from the bottom and have been working our way up. Performance is critical, but it's not just performance at the efficiency level; it's also scale in ways that I think people don't fully appreciate. And so we have a kind of unique perspective into the future, because we're working with those hyperscalers that are building the largest of the foundation models, and if we can figure out how to get those things to scale to five to ten thousand GPUs and hundreds of petabytes of infrastructure, then the hope and the expectation is that we can go apply that out to the rest of the enterprise, which is lagging those companies by a few years.

So you guys are a storage company, a storage firm that's now a platform. You have to be a platform to survive.

There are a few things that we haven't yet announced about what we offer, and that'll be coming.

Okay, all right, so what can we expect next? Coming soon, what's the...

About a month from now, right? So August 1st, we're basically going to unveil what we've been working on for seven years. It's kind of like the Amazon flywheel strategy, if you know it, where they get something on the ground, start making money off of it, and then keep innovating. We've had a core product architecture that we've been thinking about.
Internally we call it a thinking machine — which isn't a name we came up with, but we love it. And now the components are all coming into place where we can show the world what we were really thinking about, so we're super excited about that.

Well, right now the market is hungry for solutions that can get the most out of the data. I love the data-plus-AI democratization, you mentioned that, that's right on point. I think the enterprises have unique needs, right? And the developers are going to have theirs. And right now, if you look at the audience of developers, the demographic has shifted a lot younger; on the analytics side, they're a little bit older. You've got this kind of collision between, as I call them, the old-school analytics data people and the young generation that's like, "clear all that out of my way, just take care of it for me." And the rise of applications — we're seeing data applications emerging, and they've got to run somewhere, they've got to be stored somewhere. So I'm looking forward to hearing more.

Yeah, we think data needs to be a lot smarter than it is today. You talk a little about the data developer, and we see a huge opportunity and a great future for that concept applied.

You're a co-founder, and you've been in the business for a while, so you know the enterprise needs. All the co-founders out there watching, the entrepreneurs in this world, they're all literally pumped up. They're super excited — so much opportunity. And a ton of white space too. As these platforms emerge, there's no one platform that's going to rule the world; you're going to have a combination of platforms, and it's winner-take-most. There's so much white space, so many opportunities to build apps. What advice do you have for them out there watching?

Well, I think the thing that we think about is not focusing on the problems that have been solved for today.
There's a lot of sex appeal in the marketplace right now, and the easy thing is to go chase the shiny object. But the realization we had, even before we started the company, is that the shiny object has largely been solved by people that have far more resources than you do. So you have to look a little bit farther into the future. You have to bring real invention, because uninventive companies tend to die on the vine. And then you get to work like hell.

And I think one of the things that I like about this Mosaic acquisition is that the team is strong; they have great chops in training, which is really hard to do — hard to get your hands on GPUs. And you've got to have a good team, a great, fast-moving team.

Well, I think overpaying is relative, right? We encounter a lot of customers that are looking to do things like apply LLMs for, let's say, chat bots or things like this. And there definitely is a notion of good enough. And so as stuff gets pushed into open source, it gets pushed out into the market, and people realize, okay, if I'm not trying to have my financial-document-scanning app also write a website for me, maybe this model that was trained with a hundred million parameters is good enough. But the question becomes, what do we think about, and what do we talk about, five years from now? And is it large language models? I'm not sure that's the...

I think it's going to be abstracted away. I think it's data. I mean, at the end of the day, operational data should be invisible. Apps should be running on data, data infrastructure should be scaling. I mean...

Well, we'll be talking about data for sure. But...

It's in your name.

We'll have to change it to whatever's next.
No, but you know, today these models are basically just parroting out what they learned on some Twitter scrape or something like that. But think about it: five years from now, you'll have compute horsepower and data sets that are five, ten, twenty times more than what we have today. We're looking forward to a world where you actually have machines that can start to reason and understand the fundamentals of what they're talking about.

And that's exciting. Well, I'm looking forward to what you guys are doing. I like what I'm hearing; I've heard a little bit about platforms, and we'll get more in August. Thanks for coming on theCUBE. Jeff Denworth here on theCUBE, breaking down the future of data: stored, but it's going to be intelligent, it's going to be reasoned on, it's going to be applied in applications. theCUBE, bringing you all the data here. We're in the press room now; we were in the lakehouse earlier, day one of Databricks' Data + AI Summit. I'm John Furrier with Rob Strechay, bringing you all the action. We'll be right back with our next guest after this short break.