Hello, welcome to theCUBE's live coverage here in the Lakehouse for Databricks event. AI is the big focus, data plus AI. The generative AI generation is here. We're covering it like a blanket, this is theCUBE, go to SiliconANGLE.com. I'm John Furrier, your host, with Sanjeev Mohan, part of theCUBE community, industry analyst and expert in data. Sanjeev Mohan with SanjMo, that's the name of your firm. Great to see you again. Likewise, I saw you last week at MongoDB and I saw Dave at Snowflake Summit. I was tempted to just leave the headphones on, you know? I really appreciate you on the Snowflake side. We loved the commentary, it was really first rate. Here, it's a different story. Very different. What's the difference between Snowflake and Databricks? You've been to both shows, I've been only here. I had a little FOMO not going to Snowflake, I'll admit, it looked pretty good. What's the difference between Snowflake and Databricks? Ultimately, the business requirements are the same, which is a unified, consolidated way to access data, but the ways they are approaching it are totally different. Snowflake is primarily about data warehouse, data cloud, applications. Here, the message is data and AI, as the conference name says. So, unification is extremely important here. You see unification of data and AI, you see unification of data and metadata, you see unification of table formats, you see unification through Lakehouse Federation, where you can now go access Postgres, Snowflake, Redshift, and you see unification in Unity Catalog, where you can now access your models, your tables, your reports, your notebooks. So, that's the big difference I see here. So you think the big theme here at Databricks Summit is that their unification message is number one? Correct, yeah. Okay, so, you have the unification. Yeah. They're also talking about democratization. I guess that's a high-level way to look at unification. How does democratization fit in? 
How does unification feed into democratization? So, you know, it's very interesting, we were talking to Ali Ghodsi, and he was saying if he were to do the whole of Databricks from scratch, he would start with data governance. And so Unity Catalog came a bit late in the game. Last year was the first time we saw it. So, that democratization is now possible because of Unity Catalog, because it's becoming the single pane of glass for all personas to unify. So, let me ask you a question. So, there have been a lot of players in the catalog space. There's Informatica, you've got Alation doing well with federated cataloging, which won a three-peat as Snowflake Partner of the Year and the first Partner of the Year award here in governance at Databricks, in their first award year. So, what's the difference between all these other solutions and what's native in the platform? Because Databricks is a platform now. Yes. That's the pitch, everyone's a platform. Correct. What does that mean for the customer? I have all these other choices too. Right. So, let's take an example, Instacart. Instacart is a very big Snowflake customer and it's a very big Databricks customer. Now, when you have a situation like that, you cannot really use the specialized catalog of a single platform. You need something that is cross-platform. But because each platform now has its own catalog, you can do a lot of pushdown. And you can take advantage of this hybrid model where you have Alation, Collibra, Informatica at the top level and then you push down. Sanjeev, first of all, I love having you on theCUBE because, one, it all started with me and you talking about this at KubeCon in Amsterdam. Correct. We started, that was the beginning of, when we first coined the term data developer. Right. And it wasn't me, the report, or you. We're doing it in the open, open source. This rise of the data developer is clear to me. I see it very clearly that this new persona is going to be a developer. Yes. 
Look at the makeup of the people here. Younger demographics. Right. AI is attracting the young developers, not just the crypto people going over to AI, but the real next-generation application developers. So this idea of AI native in applications, as well as having the operational infrastructure to be horizontally scalable, is emerging. What is your view? How do you see this? You've seen this movie before, the classic database market share graphs, all this other stuff. This is not about databases, this is a whole other game. How do you frame the market with this idea that in the future developers are going to be programming with data, or data working with the developers? You know, if you ask me what was my most surprising discovery at Data and AI Summit, something that caught me completely by surprise is the English SDK for Apache Spark. I did not see that coming. So there's amazing stuff that I've been exposed to, LakehouseIQ is another one, the UniForm format. But to give any developer the ability to code in English, and have that get translated into PySpark, is absolutely brilliant. It just expands the number of people, like democratization, people who can now program on data. And you see the rise here, too. Another thing that took me by surprise was the rise of their marketplace and the application layer emerging. I mean, we have two startups here that are building applications, data applications. One actually coined the term LLM engineering as a discipline. So you start to see LLMs becoming an input. Different size models, different native, bigger, smaller. So mixing and matching of models is coming. And very much part of this. So the rise of a marketplace, the rise of applications is a sign of an ecosystem. Platforms have to enable things to grow. If you have a platform, you have to enable that. And that's not about the database or whatever. It's the connective tissue, it's the software, and the data. 
And so if you have the data and you have the software, that's the new model. So in the LLM space, there are three ways that you can use it. You can prompt it, which is what everybody is doing. You can fine-tune it, or you can retrain it. So prompting, we've kind of figured out: with retrieval-augmented generation, RAG, we can now even customize it. Fine-tuning is a very difficult exercise, but Databricks got to it through Dolly. So you can change the model weights and you can fine-tune it. Training, even of a small form factor language model, is a very big challenge. The MosaicML purchase is to solve that training. Are we going to get to a stage where we'll all be training LLMs? We don't know. I think MosaicML was an interesting acquisition because obviously they're aligned with the vision of Databricks. But also they were doing a round of funding, and there's a huge CapEx opportunity for them. They've got to spend more on GPUs. They already have a lot of GPUs. So training and inference are like the yin and yang in this business now. How do you look at training versus inference? Which one's harder? It seems that training is more CapEx oriented with GPUs. Or is it inference, or vice versa? What's the difference between training and inference in your mind? Yeah, so training is not for everybody. For training, you really need AI skills. For inference, you don't. So the majority of people are going to use an LLM for code generation or for summarizing documents. So inference is going to be massive. There is a cost to inference as well, because it takes a lot more electricity to infer from an LLM than it does to run a Google search. Yeah, yeah. But training is a totally different ball game. The price of GPUs is going to go down. I'm pretty sure there'll be more companies that will come up. You saw Mistral AI in France got 113 million. Seed round. Seed round? That's ridiculous. That's not a seed round. That would have been a C or maybe a B round, and now it's a seed. 
With like four people at a month-old company, because everybody's trying to make training become a commodity, and I'm sure we will get there. Amazon has a lot of GPU power too. They want to run all their workloads on Amazon. So we're in a builder mode though. I want to get your thoughts on this because we talked about this before. I want to table it again. You've got builders who are building apps and LLMs and AI and playing with data. Then you've got to run it. You've got to run the infrastructure. So you have two markets developing. One's a little heavy on the developer side right now, or re-architecting, and the other one's: what do you run it on? How do you run it? You know, so this question came up even in the last one, where Dave learned about build, and actually, I don't know, maybe it was at MongoDB, the build and deploy. And the statement I made then is that we should not think of these as two different spaces. You cannot build something. The problem is, if build is very easy and cheap, then we pay the price down the road when we have to deploy it. So how would you grade Databricks in this event? Give them a grade, a letter grade. I would say, I would give them an A. You give them an A? Yeah, yeah. Wow, a lot of A's going around. Is that? Well, Mongo got a B plus. So Mongo actually had a one-day event. So they compressed their two to three days into one day. So some things got left out, but Databricks has gone all out. In fact, people tell me here, the vendors tell me the traffic at this event is back to the pre-COVID days. Yeah. 12,000 people. So on the A, what's the merit on that A? Content, traffic, booth, positioning. I wish, actually one thing that stands out for me when I look at what Databricks is doing, they have stuff that is already working. They may announce new stuff, but they can show it to you. The demos were good. Yeah, so excellent demos, great engineers. So the content is fantastic. 
If I were to say where it could have been an A plus, I think they're overly engineering focused, to the point where, with something like LakehouseIQ, they think AI will infer all the relations, all the context, all the metadata. But that's not true. You need the business aspect, and that I did not see. Yeah, and to me, I was just talking to folks from AtScale here. They have a semantic layer. I think the big weakness I would say for Databricks is they don't have a good semantic layer story. Yes, correct. And I think to your point about it, they think AI will do it. I think that's a blind spot for them. So for that, I give them an A minus. I give them an A on all marks because on the demos, they're doing everything right. Databricks right now, first of all, they're making a statement by having Moscone North and South, 12,000 people. It was packed yesterday, so a legit crowd. Chase Center, big money spending. So they're making a statement with the spend. But their event is about developers. It's all about learning. So every single big company, VMware started out this way, Amazon started out this way. Their early conferences, before they really grew, were about showing how it works. Correct. Very education oriented. Right. And their demos were A plus. They had great demos. They can show it. I love the unification. Ali Ghodsi saying, we're going to put an end to the format wars. Yep. Boom. Yeah. That was a mic drop moment. Your reaction to that? Yeah. The demo they did was that somebody ran a query in BigQuery on Google Cloud, and BigQuery only understands Iceberg. It returned the results from what was actually a Delta table. It's like, this is phenomenal. Yeah. So the whole world's on there. I love the voice. English language is now the query engine. Yeah, the new programming language. The new programming language. Any other observations you want to share that you saw here? So one thing that there's a lot of attention being paid to is security. 
They are pushing down a lot of things like row-level security and column-level security. So Unity Catalog now has that built in natively. Also, Databricks invested in Immuta. They bought Okera. So there's a lot of emphasis on making sure that not only can you build apps and deploy them, but they're secure and governed. Yeah. That compliance piece is huge. Yeah. Because you want to be scaling up, not slowing things down. Yes. Compliance and governance always seem to be a drag. Yes. Slows things down. Right. You think Databricks has got that right? They're getting there. So there was another thing that I really liked, which was Federation, Lakehouse Federation, where you can now access external properties through Unity Catalog, and eventually they're going to push their security and governance down to those external products as well. Thank you. Great to have you on. Thank you so much. I want to ask you a final question. What are you looking at right now from a research perspective? We've been to all the shows. You've seen all of them. Been to all the shows together. Snowflake, Mongo. You've got companies like Vast Data emerging really quickly. Correct. So like we covered them yesterday in an interview. Yeah. Storage layer moving up. Yes. You've got a lot of moving parts in this industry. It's not yesterday's database industry. What is your research premise right now? How are you looking at the market? What's the puzzle that you're putting together? Can you share what you're working on? Yeah. One of the things that I have not seen enough attention paid to is the idea of a data product. A data product is where we are building an outcome. It's still a report or an ML model or a notebook or a materialized view. So physically it's not different, but the difference is you start with the business, you put in product management discipline, and you have a sense of accountability. You have an owner. So that, and if you combine that with an LLM, then you get native English-language access. 
You have a data developer. Yeah, absolutely. Data developer, next big thing. Okay, we're on theCUBE. We're bringing all the data to you. Sanjeev Mohan, leading industry analyst here, part of our analyst circuit here. We had Tony Baer and Doug Henschen earlier. The analysts are all weighing in with positive grades for Databricks. So of course we like their story. We love the open source. We love data plus AI. We're video plus AI here at theCUBE. Go to siliconangle.com, theCUBE.net to see all the action and coverage. I'm John Furrier, host of theCUBE. We'll be back with more coverage from the Lakehouse after this short break.