We're going to go ahead and get started. Cool. Hi, everybody. Good morning. Afternoon? Yep, it's after 12. First question, a difficult question. Before I introduce myself and we introduce everybody: how many of you are attending this event for the first time? Good job, it's the first time this event's being done. Trick question. All right, so that's our little icebreaker. But I also get a nice shot for the B-roll for the summary video.

My name is Bart Farrell. I am a CNCF Ambassador, and I'm currently doing the podcast KubeFM for Learnk8s. If you're interested in sharing your experience about the difficulties and challenges around working with Kubernetes in all of its forms, please get in touch. Apart from that, I'm a content creator, originally from the US but living in Spain for the last 12 years. I ran the Data on Kubernetes Community from August 2020 to March of this year, and I'm still working in the data space. I'm also an ambassador in the SODA Foundation. I know the topic of storage was just mentioned in the last talk.

I would also like to give a shout out to the previous speaker. Eddie did a really good job. It's going to make our job a little bit harder. Shout out to him. Seriously, that was really good. Having seen a lot of KubeCon talks, it was great to see one that was laced together so well from a technical perspective, but was also very human and creative. So I enjoyed that.

So that being said, we're going to be talking about the future of database as a service with four well-seasoned folks who are very knowledgeable about the ecosystem. I told them in our previous call that I would do a quick intro for them, but I also want them to tell us in 20 seconds about their background and experience with database as a service. We'll start out with Oded. The part that I can tell you is that he founded multiple startups, is an ex-Googler, and was an excellent basketball player before a career-ending injury, not like the one that happened to any.
But take it away, Oded. What's your experience with database as a service?

Hi. So my experience with database as a service started in my previous company. At my previous company, it was very, very hard to scale with Redis. And that's actually why we created DragonflyDB.

All right, good. Next up, Jordan.

Should I jump in?

Yep.

OK. Jordan Tigani, co-founder of a company called MotherDuck. My start with databases as a service is that I helped start Google BigQuery about 13 years ago. I was an early engineer on the project, led engineering, and then led product for a while until I jumped over to doing my own thing about a year and a half ago.

All right, fantastic. Monica?

So I'm Monica, the founder of a company called Xata. We are building a data platform on top of Postgres. My journey started with seven or eight years of experience in the monitoring space, so I have experience with pushing databases to the limit by storing lots of data. And I started Xata because I started this nonprofit organization, and we couldn't find a database that could fulfill all our wishes.

Good. OK.

Yeah, and I'm Lisa. Can you hear me? Is this still working or do I need that?

I think you're good.

Oh, OK. I'll just take this one. Hi, I'm Lisa. I have worked at, I think, four database companies now. Most recently, for the last three years, I was at CockroachDB. I did not cue Eddie up to say those nice things about Cockroach. But I'm a CNCF ambassador, and that's who I'm representing here today. I run a really large user group out of the San Francisco Bay Area, and I've been running that user group, showcasing end user stories and getting to meet amazing people like Eddie, for the last 10 years. So if you're in the Bay Area and you want to give a talk, hit me up. I might be able to feature you on a big stage.

Perfect. Nice introduction.
So in terms of the conversation today, I'm going to ask the speakers to keep their answers relatively concise so that we can have more direct interaction with the audience when we do Q&A at the end. To get started, Oded, if you can answer this question. It's going to be of significant interest, I'm sure, at KubeCon, given what's come up recently in the infrastructure area around Terraform versus OpenTofu. In terms of open source database providers, how are things evolving in terms of the licensing models? And what's the impact more broadly for DBaaS customers?

So I think that for the last few years we've been seeing a shift from regular open source licenses like Apache and MIT towards newer licenses like BSL. I call this shift Open Source 3.0. With licenses like BSL, you can think of it as allowing you everything: you can see the code, install it, do whatever you like with it, take it to production. The only thing that you cannot do is run a competing service against the creators of the open source. And I think that this is an amazing model for the companies, because the companies themselves can feel safe about their business model. Every contribution that they make to the code goes to all their clients and users and not to the competitors. So developers win twice. First, more open source is being generated. And second, they can use it wherever they like. Maybe Lisa, I know that open source is very close to you. What do you think about BSL licenses? Should they be considered open source?

Wow. OK, we're at an open source conference. And the Linux Foundation and the CNCF have very, very strong opinions about this. And we just got into this huge debate at All Things Open.
I don't know if someone told you to ask me this question, because not literally, but pretty much figuratively, I got tarred and feathered when I suggested that companies like CockroachDB and so many others out there who have a BSL should still be considered to have an open source product, because the code is very transparent, it's all up there on GitHub, and you can actually take it and fork it. And they have an Apache license, but they literally don't have a license that says you can take it, fork it, and sell it for commercial use. And apparently, I was told, that is the actual definition of being open source. And people feel really, really strongly about this. Luckily, there are good reasons to have a BSL, and your company can still be very open. They had 25,000 stars on GitHub because of the transparency. And it protects you from, I don't know, the Amazons; it gives you a three-year head start to keep your proprietary technology safe. But you can still have a serverless version that you offer for free. You can still offer database as a service and support it through a free Slack channel and a free Discord channel. You can go a long way. My previous company had an incredibly generous free tier that tens of thousands of users were using. So I would argue they have more open source users with their community edition than they have people paying them. But I will leave it up to you all to decide if that makes you an open source company or not.

My answer will be a bit different. Just a bit of history. Before this, I worked for five or six years at Elastic, the company behind Elasticsearch, which you're probably already familiar with. And I know how many discussions we had about changing the license. I can tell you it was a difficult decision. And there was a trend: not only Elastic changed its license, but also MongoDB, Redis and others.
And my opinion is that you will see that more and more of the older database companies changed their license, but the new database companies didn't. They use Apache 2 or MIT. But I think it's just a matter of time until AWS or other big cloud providers start offering those projects as a service. They are just waiting for those database platforms to be successful enough to make the move. So in my opinion, it's just a matter of time until database companies in particular change their license to a more source-available one. And as personal advice, in order to be protected, I think it's more important to look for other ways. When you choose your core database, it's important to choose something that supports an open protocol. In my opinion, the de facto standard protocol for databases is the Postgres wire protocol. It has a more open license, and it has a foundation. And there are lots of database companies that have implemented that protocol, so it's easier to migrate your database to another provider. I think this is the biggest worry that you have to have: being locked in to a database provider.

All right, Jordan, MotherDuck?

Yeah, so I've got a slightly different take on it. I think one of the reasons that people have felt the need to have these other licenses is a side effect of the way that open source companies tend to get funded. You start, you build the open source product, you get a lot of usage and a lot of excitement around it. And then people will give you money based on that. And you're like, hey, I'm gonna build this company.
And the thing is, you're then incentivized to get more and more users, of course, but you're not really incentivized, at least in the earlier stages, to actually build the thing that you're going to make money from. People say, well, how are you gonna make money with your business model? And you say, oh, it's SaaS, and that's coming later. And one of the problems is that people don't spend enough energy on: what are the real differentiators in our SaaS service? What are the things that we can do, because of how we're deploying and how we're building, that are unique? When you do that, when you innovate on the service side, it's a lot harder for an AWS to come along and just spin up a competing service. And it's really hard as an open source company to spend a lot of money and time and energy and thought on that SaaS service when you're fighting tooth and nail to build a great open source project.

Which I think is one of the exciting things that we've done at MotherDuck, because we are working very, very closely with the open source DuckDB team. The open source DuckDB team is focused on building a great open source database. And we realized, hey, we have to build a really differentiated service, and we have to get people to pay us even if they can clone at least the open source components of it. But if you look at some of the other companies building on top of open source that are doing this well, look at NeonDB or PlanetScale: they don't drive the open source projects themselves, but they have found a way to build a really compelling service on top of more pure open source.

All right.
Another growing trend that we see from the Data on Kubernetes Research Report is that the fastest growing new workloads making their way onto Kubernetes are AI and ML. One of the previous talks, from Dominique, I don't know if he's still here, was really cool. He was speaking about machine learning workloads, and also AI. But if we're talking about database as a service, what's the overall effect that AI is having on databases and their use cases? Oded, what are your thoughts on that?

So a year ago, I said that in a year, all databases would have vector search. I kind of think that I was right. Even Dragonfly now has vector search. I have another prediction, and I wonder what your opinion is on it. I think that in a few years, there will not be any database that does only vector search. For me, vector search is now like an operator on the database. It's like an equals sign or a greater-than. There will be no database in the future that supports only an equals sign. So that's how I think AI and ML are changing the database world.

But I think that basically the role of a database didn't change because of AI or ML. It's still the predominant way to store data, to retrieve it, and to get statistics out of it. I don't think that anyone here would like an ML model to calculate the balance in their bank account or something like that, right? Having said that, there are many other small things that are changing with AI. SQL as a language: I'm not sure about it. I'm sure that it will be in the backend, but maybe it will be more natural language that is used to query databases. Also database optimization, like query optimization: this is something that AI and ML models would be able to do better. But I'd really like to hear your opinion about vector search databases in the future.
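
To make the "vector search is just another operator" idea concrete, here is a minimal sketch in Python. It is not how Dragonfly or any other database mentioned on the panel implements it; the `query` helper, the `ROWS` table, and the column names are all hypothetical, and a real engine would use an index rather than a full scan. It only illustrates an equality predicate and a similarity ranking being combined in one query.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def query(rows, where, near, k):
    """Hypothetical query helper: equality filter first, then vector ranking.

    where: dict of column -> required value (the "equals sign" operator)
    near:  query vector (the "vector search" operator)
    k:     number of rows to return
    """
    matches = [r for r in rows if all(r[col] == val for col, val in where.items())]
    matches.sort(key=lambda r: cosine_similarity(r["embedding"], near), reverse=True)
    return matches[:k]


# Illustrative in-memory "table"; the columns are made up for this sketch.
ROWS = [
    {"id": 1, "lang": "en", "embedding": [1.0, 0.0]},
    {"id": 2, "lang": "en", "embedding": [0.6, 0.8]},
    {"id": 3, "lang": "de", "embedding": [1.0, 0.1]},
    {"id": 4, "lang": "en", "embedding": [0.0, 1.0]},
]

top = query(ROWS, where={"lang": "en"}, near=[1.0, 0.0], k=2)
print([r["id"] for r in top])  # [1, 2]
```

Row 3 is close to the query vector but is excluded by the equality filter, which is the point being made: the similarity ranking composes with ordinary predicates instead of living in a separate system.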
Yeah, I was actually thinking, there are a lot of graph databases, like Neo4j for instance. I was just looking at some of the stuff they're doing, and I think they're optimizing for AI and for ML. As we said earlier, there's a bunch of different types of databases represented up here from the companies that we work at. And I think somebody said it in an earlier presentation: you need to figure out what your use case is to figure out what database and what type of database you want. What is it, like 300 database companies out there? But I actually looked up a couple of companies. I don't know if you've worked with Apache Flink or Debezium, or some of the stuff Redpanda and Decodable are doing, because right now the amount of data that AI and ML is producing, and the number of real-time queries that people want to start running against it, we're talking billions and billions of records and transactions per second, if you're talking about financial institutions and some of the other really, really big applications that are out there. What some of those projects are trying to do, and the first two, Apache Flink and Debezium, are open source projects, is help these platforms actually manage and run those large queries against this real-time data that's coming in. So I think this is a new and exciting part of the industry. I don't know if you have other thoughts on that.

So I really agree with this idea that in the future there won't be databases that have just vector search. At Xata, we target the builder as a persona, and we see that the majority of companies being created now are building AI applications or are in the AI space. And I think this is one of the reasons why all these database companies need to invest in having an AI component.
And currently most of the users are looking for vector search, but I think in the future, databases will have to add more functionality in order to make an easier, out-of-the-box experience for the users. For example, storing embeddings is the first step in order to have vector search. But currently, in order to have embeddings, you need extra code to split all the text into small chunks and then compute embeddings using an LLM. And I think that will be the next step that databases go after: taking on more functionality from frameworks like LlamaIndex or LangChain in order to improve the experience for customers that build applications on top of AI.

I also think that AI is an area where, if you're building databases as a service, you can really differentiate on top of pure open source. If you're building interesting AI features inside your database, those are things where you can say, hey, this is why it's better, this is why it's easier to use, this is why it's faster, more optimized, et cetera, than what you could do if you were trying to do this on your own. Maybe even the AI components are open sourced, but it's just a lot more work to tie things together. On the analytics side, we've already seen that people are changing how they're doing their analytics, how they're doing their job, similar to how GitHub Copilot is changing how developers are doing their jobs. People are already starting to use ChatGPT and other mechanisms to say, hey, how do I get the SQL for this? Can you fix the SQL for this? So I think that every database is going to be incorporating AI features.
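
The "extra code" mentioned earlier in this exchange, splitting text into small chunks and computing an embedding per chunk, might look roughly like the sketch below. Everything here is an assumption made for illustration: the function names are made up, the chunking is a naive fixed word count, and `toy_embedding` is a hashed bag-of-words stand-in for a real embedding model, not the API of LlamaIndex, LangChain, or any LLM provider.

```python
import hashlib


def split_into_chunks(text, max_words=5):
    """Naive chunking: fixed-size groups of whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def toy_embedding(chunk, dims=8):
    """Stand-in for a real embedding model (assumption for this sketch).

    Each word is hashed into one of `dims` buckets and counted, so the
    vector is deterministic but carries no real semantic meaning.
    """
    vector = [0.0] * dims
    for word in chunk.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        vector[digest[0] % dims] += 1.0
    return vector


def embed_document(text, max_words=5, dims=8):
    """Return (chunk, embedding) pairs ready to be stored next to the source row."""
    return [(chunk, toy_embedding(chunk, dims)) for chunk in split_into_chunks(text, max_words)]


records = embed_document("the quick brown fox jumps over the lazy dog again and again")
for chunk, vector in records:
    print(chunk, vector)
```

In a real pipeline the placeholder would be replaced by a call to an actual embedding model, and the chunking would usually respect sentence or token boundaries; the panelist's point is that this glue code currently lives outside the database.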
And then I think the big question, which I think Oded alluded to, is: okay, at the end of the day, what's the language that people end up using to interface? Do people save English text in their dbt scripts instead of SQL? Will it be reliable enough that you can just describe what you're doing in English? I'm not sure we'll get there, but it'll at least be an interesting journey. I think there are also lots of interesting things where you're using this AI to make people better at their jobs, rather than saying, oh, we're gonna do away with SQL. A lot of people have tried to kill SQL, and so far it's still reigning pretty strong.

Never kill SQL, just improve it. That's the issue.

Maybe I would like to add something else. Inference is expensive. It's expensive for all those models to run behind the scenes on GPUs. So I also think that a lot of databases will need to increase memory and caching in order to store all those inference results, so we will not need to burn calories or burn fuel generating the same answers again and again. So that's another thing that I think will change in databases in the future.

In terms of other trends that we're seeing in the Kubernetes ecosystem, obviously AI is one of them. We've seen this across KubeCons; we see the different trends that are coming in, whether it's software supply chain, AI, or things around ChatGPT. Multi-cloud is still a very strong trend in the Kubernetes ecosystem. In terms of building proper data infrastructure for multi-cloud environments, what are the things that are catching your attention? Maybe Jordan will talk about that.

Okay, go for it.

I think, actually, multi-cloud is super important. I think there are some interesting things where it's not just being in a certain cloud or another, but can you actually make it so that users don't even have to care where the data is? Is this in GCP? Is this in AWS?
Is this in US East 1 versus EU, whatever, 4? There are cases where people absolutely care: my data is not allowed to leave Germany or Australia or Singapore. For those cases you can apply constraints to force that to happen. And in cases where people don't care, the data and computation should be able to run anywhere. In particular, people want lower latency to the end user. Can you actually push the work closer to where the actual users are, whether that's at the edge, running in telcos, or in whatever data center actually has the lowest latency to the user? And I think once you start to do that, it opens up lots of interesting opportunities to build services around this stuff.

And I think, to your point, where you're running really does matter. When I polled a bunch of people earlier and asked what they wanted to hear about on this panel, data residency and data locality were probably the most common answers to the question. Scalability, of course. But yeah, maybe not US East, that's the one that seems to go down the most. For good reasons, by the way; I'm not just ripping on Amazon. I think they get all the hardware first and everything breaks, that's what happens, right? So having a multi-region solution is not just important for data locality requirements, like GDPR, but also, obviously, for resiliency. And there's going to be a trade-off, is I guess the point I'm getting to. If latency really, really, really does matter, I mean, we've had customers say, I don't care, I am running everything in US East 1 because I need the absolute lowest latency, and I'm willing to take the risk of it going down. It depends on what your business model is, right?
Others, not a chance. And you have a lot more use cases. Online gambling is a huge one, where the data has to be in a certain place and the workloads have to be in a certain place. And also in-game betting: you're watching the World Cup, I know you're watching the World Cup, and you start betting on penalty kicks, per kick. But you live in Australia and your data center's in Malta, because they all are in Malta, right, they have to be. I had to throw that in for my Maltese friend over there. But you cannot miss that kick, right? You can't have that transaction not happen. So you have to set up your environments in ways that make these use cases possible. So figure out what the right use case is for you, and then pick the technology that will support it. And SQL's not going away, by the way. I'm a big fan of SQL, and especially distributed SQL, which I just spent the last three years getting my head wrapped around. It's really cool stuff, and NoSQL's really cool, too. So again, pick your poison. Do you want a turn?

What I would like to add to this is that I think one of the challenges of multi-cloud is that you really need to synchronize the data between different cloud providers. You mentioned Debezium, which I was a bit surprised to hear earlier. I don't know if you're familiar with it. Basically, it allows you to copy, replicate, and synchronize the data. But I think the biggest problem with Debezium is that it only replicates the data; it doesn't replicate the schema or the views or function definitions. So this is something that we are working on at Xata. We are building a new kind of application; internally we call it pgstream. It will be released soon, and it also does that.
So if you are curious, or if you have any issues with Debezium, I would love to hear about that.

I want to touch on a few things that were said here. I think that as data companies, we have a responsibility to free the data of our users. Clouds are great machines for offering CPU, memory, and storage. Kubernetes is built on top of that as an orchestration layer. And now you can move from cloud to cloud, mainly with your computation, but storage still remains the main thing that you cannot move from cloud to cloud. So the data is not yours to move. And it's our responsibility as data companies to actually make your data portable, so you'll be able to move between the clouds and then do all those wonderful things like going to the lowest-latency or the lowest-cost cloud. The cloud providers themselves will never offer that, I think.

Okay, we're getting towards the end. Before we have time for questions, I would like each of our panelists to summarize, if you can, in 30 seconds: from all the things that we've been hearing today, if end users need to be focusing on one thing and one thing only when it comes to database as a service in the future, what should that be? So think about all the things that we've heard in today's co-located event. Moving forward, what are the things that you think people absolutely must have on their radars? We can start with you, Jordan.

Wow, put me on the spot. Everybody else gets to think for another couple of minutes.

I can ask another question before you answer, sorry. This is wildly inappropriate, but if someone works at MotherDuck, are they a Mother Ducker?

Absolutely.

Okay, sorry.

Yeah, no, no.

You heard it here straight from the source, folks.
We realized we either had to pretend that we had no idea what any connotations around the name would be, or we had to just embrace it. So I love it. One of our things is "embrace the duck." So yes.

Oh yeah, yeah, yeah. Yep, hoody penny. All right, so that being said, 30 seconds.

When you're building and running databases as a service, I think it's really key to have a great relationship with the open source team. We're very much fans of open source, and I think we've built a unique model in how we've done this with the DuckDB team. We're hoping that if things work out at MotherDuck, people will be like, hey, that's a good way to do it. If it doesn't, maybe we have taken one for the collective team. But basically, the core DuckDB team who built the open source actually own part of MotherDuck. So if things work out for us, it'll work out for them, and that helps keep them aligned with what we're doing. On the other hand, they keep the IP pure, and they have the DuckDB Foundation, which we don't have any control over. And I'm hoping that this is a good model for open source in the future.

All right, Monica, 20 seconds. No pressure.

So I think it's interesting, because we represent different types of databases here. What I would encourage people to keep in mind is that when you build a data platform, when you're building an application, you basically need multiple data stores, and then you have to replicate and synchronize the data between them. So I think it's really important to use the data store that is right for the use case that you have. For example, if you want search, you probably need something like Elasticsearch. If you want a transactional database, you probably want to go with Postgres or something that implements the Postgres wire protocol. So yeah, I think that's my takeaway on this.

Okay, thank you.
I think there's a statistic in the wine industry saying that if you put an animal on your label, you have a 25% greater chance of somebody actually picking your wine. I think that applies to databases too. If I'm looking around at this panel, there are a lot of cute flying insects and ducks and cockroaches and dragonflies. So choose your database based on the animal on the label.

But I'll also make a little data-driven point here. If anybody's afraid of running database as a service, first I would say don't be. This is the "future of database as a service" panel, but it's actually already here, if anyone is thinking it's not. Even the largest banks in the world, who swore two and a half years ago that they would never move everything, or even anything, to the cloud, are moving a lot of workloads to the cloud. So don't be afraid of it; there are lots of ways to solve the problem. And to use data to prove the point, I looked up some Gartner numbers. Database as a service was a $10.4 billion industry in 2018, which was a 23% share of the whole database management industry. And last year Gartner published that it's a $40 billion industry, which is half of the $80 billion database management market. So don't be afraid of it, it's here. There are lots of companies that are doing it really, really, really well. And again, just pick the workload that you're gonna run in the cloud and start there.

All right, cool, we've got one more minute over there.

I won't use the minute, I'll use 20 seconds. I encourage you to think about your users: what do they want from your service? And then your team: how are they going to support the scale that you need in order to serve those users? And I think that handling your own databases is now too complex, and you would want responsible, professional companies to do it for you.

Very, very good, excellent. Thank you very much.
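
As a quick sanity check on the Gartner figures quoted by the panelist above, the arithmetic can be worked through in a few lines. The numbers are taken as stated in the talk and are not independently verified here; the function names are just for this sketch.

```python
def implied_total(segment_billion, share):
    """Total market size implied by a segment size and its share of the total."""
    return segment_billion / share


def share_of(segment_billion, total_billion):
    """Segment size as a fraction of the total market."""
    return segment_billion / total_billion


# Figures as quoted on the panel (billions of USD).
dbaas_2018 = 10.4
share_2018 = 0.23
dbaas_latest = 40.0
total_latest = 80.0

total_2018 = implied_total(dbaas_2018, share_2018)
print(f"Implied 2018 database market: ${total_2018:.1f}B")           # $45.2B
print(f"Latest DBaaS share: {share_of(dbaas_latest, total_latest):.0%}")  # 50%
print(f"DBaaS growth since 2018: {dbaas_latest / dbaas_2018:.1f}x")   # 3.8x
```

Taken at face value, the quoted numbers mean DBaaS grew roughly 3.8 times while the overall market grew from about $45 billion to $80 billion, which is how its share went from 23% to half.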
We've got time for questions, correct? Any questions? We have a question.

So I'm not sure there's an answer to this question, but I'm curious to see what the panel will say. We've moved open source from some combination of commercializing it with professional services and support, and open core with enterprise add-ons. And over the last couple of years, we've moved to actually having SaaS as the business model, right? Where we're selling it as a service instead, with ease of use and all of those things. And AI and machine learning are probably gonna drive that more, because it's hard to run those things yourself and you're gonna want to consume them as a service. So do SaaS providers now drive open source strategy, or does open source strategy need to define SaaS providers? And where do the foundations fit in all this?

I can take at least the first part, because I think SaaS is a great model for open source. If you're trying to sell services, or some sort of on-prem software, there's always the tension between what you can release as part of the open source core versus what goes in the enterprise version. You want to give more to the community; on the other hand, you need to have an actual business. And I think putting it behind a SaaS boundary gives you the opportunity to charge people for it without them thinking, well, what are you actually providing as value? Because if they were gonna run it themselves on an EC2 instance, they'd have to pay for it anyway, and chances are you can do it less expensively by using multi-tenancy. It also lets you do a bunch of behind-the-scenes innovation. So I think that it can help accelerate open source by saying, hey, there's a really viable business model for doing this by building SaaS, and you don't have the same kinds of divided incentives.
Yeah, I was nodding all along to this, and I very much agree. What I'm currently seeing is that SaaS is driving open source, or BSL licenses or licenses like it. We are at the stage where we have found the business model that can sponsor those. I think that there's almost no difference between the community version of the product and the service, other than our ability as a service to optimize cost and our ability as a service to simplify what is provided. And I think that this is the main value that SaaS contributes on top of the open source. It's no longer open core in most cases, because almost all of the source and all of the features are out there.

So the traditional way of open source is that you create an open source project, become successful, and then build a SaaS offering. In my case, we did it the other way around, in the sense that we started with the SaaS, and we are open sourcing the parts of the product that make sense to use as a standalone project and that solve a specific pain point. What I want to emphasize here, and I think many people don't realize this, having been in the open source space for, I don't know, 10 years or so: running an open source company or an open source project is a lot of work, because besides dealing with the community, you also have to build packages for different distributions and so on. In my opinion, it adds up to 40% of your resources. So what we're trying to do with Xata is concentrate all our resources on building features for users instead of going down the route of building packages and so on, so that we build features that make the life of developers easier and that allow us to move faster with the product. You cannot imagine how complex our spreadsheets were at Elastic: this version works with this version, and that's without even taking into account all the distributions and so on.
So I definitely think there's a huge advantage to having a SaaS offering, and we are betting on that.

Okay, time for one more question? We gotta cut it. All right, well, we have the excuse to continue the conversation once we finish up. I want to say thanks again to our panelists. Can we give them a round of applause? Also, shout out to all the people who put the event together. It's the first time they're doing this, and I hope it's not the last. So thank you very much and let's keep going.