Welcome to Amsterdam and KubeCon CloudNativeCon 2023! Join John Furrier, Savannah Peterson, Rob Strechay, and UPScott as theCUBE covers the largest conference on Kubernetes, cloud native, and open source technologies, together with developers, engineers, and IT leaders from around the globe. Live coverage of KubeCon CloudNativeCon 2023 is made possible by the support of Red Hat, the CNCF, and its ecosystem partners. Welcome back, everyone, to theCUBE's live coverage here in Amsterdam at KubeCon CloudNativeCon Europe 2023. I'm John Furrier, host of theCUBE, with Savannah Peterson; we've got a lot of hosts, a lot of guests, a lot of great panels. This one is about the role of data and AI in DevOps, and a lot of changes. We have Sanjeev Mohan, principal analyst, who's been tracking the data space, and we have two great guests here, Matt Butcher and Justin Cormack, back again, experts. Guys, data in DevOps, AI, the future. ChatGPT is taking over the world. There's a company here that says they're the ChatGPT of DevOps: provision some Kubernetes. I mean, automation is, in a way, data. Yeah, it is, and configuration is data, and using generation to process data and configure data is an obvious target. We've seen quite a few things in the last few weeks with people looking at that space. Sanjeev, I want to bring you in, because you've been covering this space aggressively, Supercloud, on our event. You've got the data fabric. Data will be a disrupting enabler in the world with AI. Machine learning has been around; everyone's like, we've been doing machine learning for a long time, which is true. AI is just a reflection of that next evolution. What's the AI impact in your mind? What do you guys think about this AI strategy for DevOps? It's not ChatGPT, but it is intelligence, some automation there. What's the role of data in DevOps? So I would say data and AI. There are two major shifts that I see going on.
The first thing with data is something that's very different from what we've been doing all this time. We've been talking about big data forever now, for the last 10, 15 years. But it's turning out that the data that you need, like you mentioned, configuration is data, a lot of the data we need is not really that big. Your entire data set may be huge, but you really care about the freshest, the most recent data. If that data is not too big, I can train an AI model and start doing some sort of search or analysis, and I can do it on my laptop. Yeah, and we also kind of forget that in the 15, 20 years of big data, our computers have got much bigger, so the data we're operating on has gotten relatively smaller. I have seen benchmarks where a database runs at the same speed, if not faster, on an M1 or M2 Apple Mac than on a server. So if the data size is not too big and the hardware is so advanced, maybe I should do my compute in an engine that everybody has, which is the browser. How cool is that? It is cool. I think we're seeing more and more hardware for processing data hitting laptops and phones, because it's being driven by use cases like image processing of photos on the phone. It's done in hardware, and then that hardware becomes available for all sorts of applications. Right. And I think the rise of real concern about privacy and personally identifiable information is going to drive that forward, because a lot of the things we care about knowing also require data that maybe I don't want to send somewhere else to be stored in perpetuity. So to be able to take smaller, well-defined sets of data and work on those locally, I think that's going to fill a big need for what we want.
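The point about only needing the freshest slice of data can be sketched in a few lines. The event records below are purely illustrative; the idea is that filtering to a recent window first makes the working set small enough to analyze on a laptop:

```python
from datetime import datetime, timedelta
from statistics import mean, stdev

# Hypothetical event records: (timestamp, latency_ms). A month of hourly
# samples stands in for a "huge" data set; the values are made up.
now = datetime(2023, 4, 18)
events = [
    (now - timedelta(hours=h), 100 + (h % 7) * 5)
    for h in range(24 * 30)
]

# Keep only the freshest slice, the last 24 hours, before analyzing.
cutoff = now - timedelta(hours=24)
recent = [latency for ts, latency in events if ts > cutoff]

print(f"samples kept: {len(recent)} of {len(events)}")
print(f"mean latency: {mean(recent):.1f} ms, stdev: {stdev(recent):.1f} ms")
```

The full data set stays wherever it lives; only the recent window, here 24 of 720 samples, is ever loaded for analysis or model training.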
So let me throw something out at you guys and get your reaction, because Dave Vellante and I have been talking about this supercloud concept for about a year and a half, but it's become more multi-cloud. Let's pull that aside for a second. If the developer experience is more productive, more compatible, run code anywhere, write once, run anywhere, reusing code, you can almost imagine better apps coming out of that. So just assume that for a second. With the cloud now in its teenage years, you've got more innovation that's not like your classic ISV. Look at Snowflake and Databricks. They're building on top of the hyperscalers without spending a dime of capex. No hardware required. Now you're running apps in the browser with Wasm, and you have data and AI coming. The next-level applications that come out are going to look completely different. If you assume that, what does it look like? And how does it work? What do you guys think? Yeah, I think that's the thing we have witnessed begin to unfold over the last five to seven years. The cloud started largely as a simulation of the data center, which itself started as a simulation of stacking a bunch of computers under your desk and plugging them all together. Now that we've seen a generation of services evolve that were built for the cloud, it's time to start thinking, okay, what's the next thing? How do we build out from there? AI and machine learning is a good example, because now we're starting to understand that when we can lease somebody else's computing power and store vast amounts of data there, we can build things very differently than when we have to pay for the hardware ourselves. And then we start saying, well, wait, some of this we have to move back closer to the user. And that means we have to rethink a lot of the architecture we're building.
I think it's been really interesting. It's been a decade of containers this year since Docker launched. And I remember a lot of the questions early on were like, oh, are people going to run a database in a container? And then it turned out that suddenly that wasn't the important question. The question was, oh, I'm moving all my databases to a cloud provider, because I don't want to run a database at all; there's no upside. All that could happen is I could lose my data. There's just downside. And now we're seeing a kind of third phase, which I think is the really interesting thing for this conversation: an explosion of new tools around data and new ways of doing things. We're seeing SQLite and DuckDB for local data processing. We're seeing things like CockroachDB, new databases that are distributed from the start. We're seeing a huge amount of innovation in the data space. The vast majority of apps now care about data. If you look at the twelve-factor app thing from the beginning of containers, the factor about data is the one that doesn't really apply anymore, because "never store data anywhere in your application" doesn't make sense for an application that's all about data and data processing. All the other eleven factors still look kind of okay, but those data architectures have made those two jumps: one to the cloud, and one to, okay, data is so important, we're going to have to think about this again now. It's a next-gen conversation. What do you have? Yeah, I see two super interesting trends, things that are happening on the front end and the back end.
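The in-process model behind tools like SQLite and DuckDB, where the database engine runs inside your application with no server to operate, is easy to demonstrate with Python's bundled sqlite3 module. The table and rows here are illustrative:

```python
import sqlite3

# An in-process database: no server to run, the engine lives inside the app.
# (DuckDB follows the same model for analytical workloads; sqlite3 ships
# with Python's standard library.)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("eu", 120.0), ("eu", 80.0), ("us", 250.0)],  # illustrative rows
)

# The analytical query runs locally, right next to the application's data.
rows = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('eu', 200.0), ('us', 250.0)]
```

This is exactly the "local data processing" pattern mentioned above: the application opens a file (or memory) and queries it directly, with no operational downside of running a database service.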
So the front end is how data consumers are getting the data they want to make business decisions. Today we spend a lot of time creating these reports and dashboards, and something is never right. You always need that extra piece of information. But with AI coming into the picture now, what if, as a business user whose needs IT doesn't anticipate, I can ask a question of a model that's trained on my institutional data? So reports, dashboards, apps, I think in a few years they'll be gone. We'll have this interactive Q&A kind of thing. That's the front end. Now at the back end, something even more interesting is happening. You know how we changed the whole application space with microservices? I think that's coming to data, in my opinion. In what way? It's called data products. And this year on theCUBE I mentioned that I see data products as a sort of self-contained, tangible piece of code and data together. So a data product is like API access to some pieces of data. That becomes like a microservice, to your point, Justin. That is how we are going to co-locate data and apps together. Who consumes the product, the developer or the application? So it could be any data consumer. It could even be a DevOps engineer. A DevOps engineer wants to configure a system, or wants to monitor or test it. Instead of going directly to the database and writing a SQL statement, they go to a data product. A data product could be for a business user, like a supply chain product for a retailer. There are data products for people to look at, but there are also data products that feed back into systems themselves. So a product that optimizes your pricing is a data product. But it doesn't feed back to a person who changes the price. It changes the price for you, optimizing your pricing directly, because you want to have that level of control and feedback built in.
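A minimal sketch of the data-product idea described here: code and data packaged together behind an API, so consumers never write SQL against the raw tables. All names (`PricingDataProduct`, `optimize`, the price values) are hypothetical illustrations of the self-adjusting pricing example above, not any real system:

```python
from dataclasses import dataclass

@dataclass
class PricingDataProduct:
    """Hypothetical data product: it owns its data and exposes APIs.

    Consumers, whether a business user's tool, an application, or a
    DevOps engineer, go through these methods, not the database.
    """
    prices: dict  # sku -> current price (the product's private data)

    def get_price(self, sku: str) -> float:
        # Read API for any data consumer.
        return self.prices[sku]

    def optimize(self, sku: str, demand_factor: float) -> float:
        # Feedback API: the product adjusts the price itself, rather than
        # handing a report to a person who changes it manually.
        self.prices[sku] = round(self.prices[sku] * demand_factor, 2)
        return self.prices[sku]

product = PricingDataProduct(prices={"sku-1": 10.00})
product.optimize("sku-1", 1.1)      # demand is up; the product raises the price
print(product.get_price("sku-1"))   # 11.0
```

The microservice analogy holds: each data product is a self-contained unit with a contract, and the underlying storage can change without breaking its consumers.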
You've got observability, you've got data. That's going to change with AI. Generated code is going to come in. I think machine learning is the interesting new variable in this equation. For a long time, going straight to one of the points you made, when we were thinking about how do I organize my data and then how do I query my data, organization was all about formatting everything in the right way. We were breaking all the pieces apart, putting each one in its little box, shoving it into a database, and then indexing it. But it was all very oriented around the data types and the coder's ability to structure that data into little chunks that the computer understood. Then SQL was all about saying, here's a language that I as a human can learn to try and make some sense out of the data in there. Well, the cool thing about where ML is going is that both pieces of that equation are starting to look different. We've learned to feed data into the model and have it figure out what the relationships are. And then the query language becomes chat, right? That's the prompt. That's the spoken word. This is what I wanted to get to. Thanks for bringing that up. That's exactly the point. I couldn't figure it out, but you just said it. Data on data. So organizing data was a decision not made by developers. That's a fact. Database chosen for you. Infrastructure chosen for you. It was a storage story. How do you flip that script? What's in it for the developer? How does a developer develop with data? And how do you organize the data on behalf of developer productivity? Because the prompt thing is showing us that data is interacting with data. That's like code. Yeah, it is like code. You have to start doing code-like things with it. I mean, there are very different flows involved once you've got data.
You're building models and you're testing them and you're refining them. There's a whole separate set of processes that look like what you do with code, but they're also different in a sense. You've got to think more like a scientist: is the world still what it was when I built my model last week, or has something changed? And this is now ongoing. There's more cars and more colors. But this is happening right now. This is new. This is a new phenomenon. We're living through yet another one of those change moments. And, I mean, we've always kind of had the notion of a model, I think, but we were very stringent about it. The DDL for a database is a model, but it's primitive, and it's very slow moving. It doesn't change. You don't change it very often, because you're turning all the knobs and dials to tweak it. Now, with machine learning, we have to build models and then trust that the model is going to... Object store, unstructured data, is growing up into a monster, right? I mean, that's what's happening here. Right. And that was Justin's point, really: we lose the tight, fine-grained control over what the model looks like, but now we're starting to see the power of using an ML model. And this is the challenge. Trust is the issue we don't have solved. Yeah, and that's what I mean by experiments: is this working for me now? You no longer have the "I wrote this, I understand every line of code in this." It's, "I fed it this data and it told me this was the answer today," and I want to be very sure that the kind of answers it's giving are still right. And that is not doing... The SBOM is what? Yeah.
Drop the S-bomb on the show now. Had to get it out there. Drinking game. We'll take a shot. It's using the ML to do the work, but not just to create the app or the model, also to retire it. In the data world, we've been really bad at that. On the application side, we write the code, we version-control it, and we treat it like a product. We get rid of it when it's not used. On the data side, we just create a new data model. There's a new request, and the DBA or the developer or the designer says, sure, I'll create you a new data model, and that data model sits forever. The machine learning model sits forever. And this guy has run off to do it. All right, so I'm going to ask you a question. Yeah. First, we agree that data for developers is now upon us. That's going to be figured out; it's early days. No one has an answer. That's job one. ChatGPT: this is a question for the industry analyst who's got all the bases covered. What companies are going to be disrupted by ChatGPT? If that runs its course, name the companies. Is it Snowflake? Is it Teradata? Who is getting disrupted by ChatGPT, if you assume developers are going to flock to a model where you've got data fusion, for lack of a better word, or data that's more robust and programmable? So John, depending on who you talk to, you'll find that ChatGPT either produces pristine code or screws it up, just like a human. So my point is that I don't see a company like Snowflake, or the database companies, getting disrupted at this point. Maybe they'll change, but to me, ChatGPT is an amazing way to get started, not to finish the job. For that, you still need expertise. Well, I would push back; I think Snowflake could be disrupted, because the lock-in point you brought up about WebAssembly is interesting. Data can't be locked in. If you lock in data, you lose all the benefits of machine learning.
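The retirement point, treating data models and ML models like products that get versioned and removed when unused, could be sketched like this. The registry and its 90-day idle policy are hypothetical illustrations, not any particular tool:

```python
from datetime import datetime, timedelta

class ModelRegistry:
    """Hypothetical sketch: version models like products, retire stale ones."""

    def __init__(self):
        self.models = {}  # name -> {"version": int, "last_used": datetime}

    def register(self, name: str, version: int, now: datetime):
        self.models[name] = {"version": version, "last_used": now}

    def use(self, name: str, now: datetime):
        # Every consumption updates last_used, so usage is tracked.
        self.models[name]["last_used"] = now

    def retire_stale(self, now: datetime, max_idle_days: int = 90):
        # Remove models nobody has touched, instead of letting them
        # sit forever the way data models traditionally do.
        stale = [name for name, m in self.models.items()
                 if now - m["last_used"] > timedelta(days=max_idle_days)]
        for name in stale:
            del self.models[name]
        return stale

reg = ModelRegistry()
t0 = datetime(2023, 1, 1)
reg.register("churn-model", 1, t0)
reg.register("pricing-model", 3, t0)
reg.use("pricing-model", t0 + timedelta(days=100))   # still in active use
retired = reg.retire_stale(now=t0 + timedelta(days=120))
print(retired)  # ['churn-model']
```

The point is the lifecycle, not the bookkeeping: the unused model is removed the same way dead application code is deleted from a repository.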
So you have to assume that Snowflake's going horizontal across clouds, but you've got to use their fabric. Correct. Right. So you're locked into Snowflake, right? It is true. Yeah. So does that help or hurt the future of AI, or does it not matter, because they can talk to another model? So last year we had Snowflake Summit and then the Databricks Summit. I'm going to say something which may be a bit controversial, but when I look at what Databricks does, you put all your data in an open format, Apache Parquet files, on an object store with a common interface, maybe S3, and now everybody supports S3. So that data is not locked in. But when you put it in Snowflake, it goes into a proprietary format, plus Snowflake will do micro-partitioning, clustering, anything and everything to improve that speed. So there is a difference, but Snowflake has a massive ecosystem. Everybody's moving there. In fact, I would say cloud data warehouses today have become the clearinghouse of all compute activities. There's no doubt Snowflake is a great cloud data warehouse, and being a cloud data warehouse means you're managing data at scale, right? The question now comes back to the developer. You mentioned Salesforce in the last year. Yeah, I think one of the questions is: is the developer going to be programming against Snowflake all the time, or are they going to use things like Parquet? Again, that's part of the really exciting data tool set now: we're getting all these tools that are designed for sharing data between applications and easily writing applications against data. These formats are becoming really important for quickly building new applications that use data. It's hard to ignore those things, and you need to get stuff out of Snowflake, because you're going to need it all over the place in order to write applications that use data pervasively.
So people who are watching this are already up in arms, because they're like, no, wait, how can you say Snowflake is locking in my data when Snowflake can go to external files as tables using Apache Iceberg or Hudi? I'm already getting hate texts from Databricks as well. Since we've brought up Databricks, we'll end with a comment around tool chains, because WebAssembly, which we were just on, and which I wanted to bring you in for, we couldn't have enough chairs, ties it all together. I think WebAssembly points to the future, and it has its own North Star: we want compatibility around coding and code so that developers don't have to do stuff repetitively, which makes their jobs suckier, and we want to make a great experience for developers and be productive. That's unequivocally a great North Star. When you apply it to the tool chains out there that are proprietary, or tied in to, say, the Databricks tool chain, you're tied into their AI by default on the tool chain. So what's interesting to me is maybe that's a good thing, maybe that's a feature, not a bug, if there's a WebAssembly-like mindset in data. Yeah, if you can run WebAssembly in your Snowflake database and things like that, those are definitely options that make it easier to write applications there, portability. I want to think that those are the kind of use cases the security model of WebAssembly was designed for, and the extensions for performance and so on, that take your code to where your biggest data is. It makes a lot of sense from the data-gravity point of view. If I can write a highly performant application in C or C++, compile it to Wasm, and run it in my database, that is far better than what happens today. Today, you write it in whatever language you want, then it goes to a translator that translates it into SQL, because databases only speak SQL. If I can avoid that and have a binary format instead, go ahead.
I was just going to say there's an expense component to that too, because you're pulling the data out of the database, which requires pulling it over the network, often incurring costs, then you're consuming compute resources over here churning through all the data, and then you're putting it back in. Being able to run that code inside the database instead means we're saving money and we're much more efficient, and you can do more local optimizations around it. Dramatically simplifying the tooling and the experience. Guys, we are putting on a clinic here. We've identified that the AI answer is not yet clear, but we're starting to get signs of visibility into the areas. Patterns are bubbling up. Yeah, making things easier and faster, reducing the time it takes to do steps. And the database, I mean, maybe it's a database-less world we're moving to. Like serverless? So what is serverless? The way I define it is: the server is there, but you don't think about it. So database-less would be where the end user is not worried about whether it's Oracle or Snowflake or Teradata, but it's there. Well, we've got to end it there. We're getting the hook. They're pulling the plug on us. They are. I'm getting yelled at in my ear by Leonard. Yeah, thank you so much. Sanjeev, thanks for coming on and bringing the analyst's perspective, laying out the landscape. It's really the confluence of a really great time. If you're a developer, open source is booming. The opportunity is laid out today: entrepreneurial opportunities, white space to innovate, tons of action. Just great stuff. Thanks for sharing. All right, we'll be back with more live coverage here at KubeCon in Europe. We'll be right back.
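The "run the code inside the database" idea can be illustrated with a standard-library analogue: SQLite lets an application register a function as a UDF that the engine calls per row, so the computation happens next to the data rather than after pulling every row out over the network. This is the pattern Wasm UDFs in cloud warehouses aim at; the function and table here are hypothetical:

```python
import sqlite3

def shipping_cost(weight_kg):
    # Hypothetical business logic that would otherwise run app-side,
    # after fetching all rows out of the database.
    return 5.0 + 0.5 * weight_kg

conn = sqlite3.connect(":memory:")
# Register the application function inside the database engine as a UDF.
conn.create_function("shipping_cost", 1, shipping_cost)

conn.execute("CREATE TABLE parcels (id INTEGER, weight_kg REAL)")
conn.executemany("INSERT INTO parcels VALUES (?, ?)", [(1, 2.0), (2, 10.0)])

# The engine invokes our function per row; only the aggregate leaves the
# database, not the raw data.
total = conn.execute(
    "SELECT SUM(shipping_cost(weight_kg)) FROM parcels"
).fetchone()[0]
print(total)  # 16.0
```

Shipping only the final aggregate instead of the full table is exactly the network and compute saving described above, scaled down to two rows.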