Welcome back everyone to theCUBE's live coverage here in Vancouver, British Columbia for Open Source Summit 2023. I'm John Furrier, your host with Rob Strechay, analyst of theCUBE. A great guest, Manfred Moser, who's the director of technical content and Trino developer relations at Starburst, a rapidly growing company doing extremely well in the data space, obviously open source. Manfred, thanks for joining us today on theCUBE. Appreciate it.
Thank you for having me. It's a pleasure to be here.
First of all, great.
This is Commander Bon Bon, our mascot that we came up with when we changed the name, and one of the kids of our engineering leaders came up with the name Commander Bon Bon. We don't have a background story, but it's very cute and we got all sorts of like interesting.
You've got to register the trademark on there. I'm only kidding. So that would be very proprietary.
Yeah. And at the conferences we run, we have a full on like bunny hat walking around and like you can shake Commander Bon Bon's hand and stuff like that. It's fun to have a good mascot.
Well, thanks for coming on. What do you guys do? What's Trino about? Give us a quick update.
So Starburst is the company that works with the Trino open source project. The Trino open source project was started at Facebook in 2012 to solve a specific problem, mainly being large data lakes powered by Hive and Hadoop, trying to get the data out there. And that was always like very painful. Queries would take forever and be very slow. And the business analysts would go, like, I want to understand this data, what's going on there. And then like three hours later, they'll get some result and they're like, oh, that was a syntax error or like they didn't get the result. And then they wait three hours again. So Trino solved that problem at Facebook for those large data lakes with what's called the Hive connector. But over the last 10 years or 11 years nearly now, it evolved to do much more.
So it is a distributed query engine. So you're not running one computer. You're running a cluster of like a hundred, a thousand servers that then process the data to run your queries really fast to do analytics.
And just a point of clarification for the folks watching, Hive and Hadoop was big data first generation.
Exactly.
Now we're in the second wave with data lakes, data lakehouses, Databricks, Snowflake, Starburst, proprietary data lakes. What's the current state of the market relative to that upgrade from the old Hadoop and Hive? Is this at Amazon? Is it cloud-based?
Everyone started to more and more feel the pain of managing Hive, right? Like Hive data stores became very powerful but also very painful to manage because like your data always changes how it's organized and stuff. And then Iceberg and Delta Lake and Hudi are like the new table formats that make that much more manageable. And you can go do things like jump back in time and say, well, what happened two weeks ago? What was the data then? What is it now? And a lot of advantages and also much more performance. So that's one big thing that changed about the data, like in what format it is stored. The other thing that changed is where is the data, right? Like no one runs their own data center anymore and has an OpenStack cluster or something with like Ceph or something. Like people run Amazon S3 or they run Google Cloud Storage or Azure Blob Storage. And Trino has many, many connectors to query all these different data sources at the same time. So you can get some stuff out of Iceberg or out of your Oracle database all at the same time.
Who is going and actually contributing these integrations with these different data sources?
So we have a very large community of users. We have like close to 10,000 users on our Slack channel. And we work with many companies that embed Trino in their commercial products. Starburst has two of them.
Starburst Enterprise, which allows you to run Trino on-prem with additional security, a user interface and stuff like that, and Starburst Galaxy in the cloud. But there's also many others like famous examples, probably Amazon with AWS Athena. That runs actually Trino inside. People don't know so much about it, but the power of Athena comes from Trino. And there's many other platforms and companies. Like at last Trino Fest, our summit, our conference, we had Apple present, for example, or we had SK Telecom from South Korea present. All these companies use Trino to try to get insights out of their data.
Do you see an uptick in this since, I mean, again, we're about six months into this new everybody pushing LLMs and really looking at how they get data into these LLMs and into AI in general. Have you seen an uptick in Trino since then?
It's not so much an uptick in Trino from that, but it's just yet another massive data point that points to the importance of data, right? Like it doesn't matter where it's stored. There's different data that's stored in different systems and there are always going to be different systems that are better, right? Like there's large language models, there's very complex data structure sometimes involved. And then you have the very traditional relational databases where the data is very structured, right? And ultimately you also have different teams that understand the data and need to work with these different databases. Trino is a system that allows you to integrate them and query them at the same time with high enough performance to understand what's going on. So you can get those insights out of your lakehouse without having to move the data around.
What's the use cases, main use cases that you're seeing right now? And what's the typical environment if they've been watching, would they want to get involved with Trino? When do they bring it in? What are some of the pain points? What are some of the things that would get people to do it?
And then what are the top use cases?
Top use cases are things like having data already in a lakehouse or in a data lake and it not being fast enough to query and get the analysis out of it. So that's definitely something where on the one hand you want Trino to query it. The other thing then is that you want to evolve that away from using just Hive to a more modern storage like Iceberg, maybe to move it into the cloud. Trino allows you to connect to all the different data sources at the same time so you can have some data migrated but still have some other data in your Oracle data warehouse and query them at the same time.
Yeah, do you have a big idea about how prevalent connections are to things like Iceberg? Because Iceberg's fairly new. I mean, relatively new.
Iceberg is fairly new, but for example we had Trino Summit last November and there were a whole bunch of presentations of large organizations that are moving to Iceberg. And then obviously there's like longstanding companies that have been instrumental in the Iceberg community, like Netflix and others where it came out of, where it is used in like really high use case production. So if you are adopting Iceberg or even also Delta Lake or whatever, these formats are mature and they're way more mature and better performant than what you have with Hive. So if you're already in the data lake sort of ecosystem, don't bother going to Hive, go straight to one of the modern ones. If you're stuck in the old ones, try to get to the newer systems.
No one's even going to Hive right away.
Yeah, no.
It's mostly legacy because people realize that was not working. The cost of ownership was high, talent, acquisition.
And I think it also is a cost, right?
I think the cost aspect of it because when you're talking about Iceberg and being on top of an S3 compliant type of storage, right, you're working with objects, is that just another additive for things like Trino where, hey, because reading out of S3 is not bad, but if you're trying to write back in or do other types of things, it may take a little bit longer.
Well, that's just like cost savings and sort of like, maybe I kind of like cost juggling is also definitely a use case. Like a lot, there's a lot of people that are like, oh, there's like proprietary database and massive warehouse, it's too expensive and we have so much data coming in, we can't handle it anymore, but we don't really need to query the old database like all the time. So let's throw it into a cheaper storage. And then there's lots of options, right? Like obviously you can go the cloud providers, but there's also alternative S3, the protocol kind of is like, not proprietary, there's various alternatives. So you can really shop around and find your solution and also move the data between because Trino allows you to write queries that pull data out of one and put it into the other system, right? So you can really like move it around dynamically.
Well, great to have you on theCUBE. Appreciate you coming on. Rob and I both enjoy the conversation. I guess my final question, Rob probably has one too, but is talk about your journey in the industry. We were talking before we came on camera that you and I both were before open source, we were stealing software, but copying on disks. Open source has gotten so big and so great, it's won. Everything's winning, keeps winning. But open source is under pressure with the tsunami of AI coming, more data is coming. So things like Trino and other tools need to be in place. What's your view on where we are with open source? Talk about your journey and then what you see in front of us now in the open source world.
Yeah, the software industry called open source.
Yeah, no, I started as a child hacking on computers and like hardware and then much more software and moved into the open source ecosystem because I like the collaborative way of interacting with other people, working together, innovating together. And AI is great, but ultimately it's just going to be another tool. Like it's like you used to have a text editor where you used to like write the code and then get out of the text editor to start compiling. Well, now it goes on the fly in the IDE. Well, the next thing is now the AI will help you write the code even better than just like code completing the signature of a method. Now it'll really like maybe write the whole block, but ultimately the innovation of you as a person, understanding the business problem and understanding the world that the whole thing evolves in is not going to go away.
The human aspect of it is you got to check the code you just auto-generated.
Yeah, and like we had fun with ChatGPT on the Trino website as well. We got ChatGPT to write a poem about Trino and we'll roll like three or four different ones.
You get some tweet storms too by the way.
Yeah, yeah. Write a blog post. And it's all fun to play with but ultimately whatever the output is, you still need to check what comes out.
Well, if it does our CUBE interview next time, that's going to be a problem because we'll be out of a job.
Yeah, I'll have more square eyes.
We're going to be out of work. It's going to replace my job. I'm really scared.
Oh, you shouldn't be scared. You're doing a great job. I don't think you need to worry.
I mean, irreplaceable. That's right. So tell us what's next for Trino and how people can get involved.
So we have a trino.io website where you can find out all about how the project is organized, links to our community chat system, our GitHub repository and also our upcoming conference. Like in June, we have a free virtual conference, two half days with more news.
What's the latest and greatest? Talking about real use cases from people in the industry and what's coming in the project. We're cutting new releases every week. We're getting lots of contributors and we're working with many large companies on improving the collaboration. And obviously here at the conference, it's great to meet many of them in person and strengthen that collaboration even more. So it's good.
What's your big takeaway from this event so far? I know it's day one, but what's the vibe? What's the main story?
I think it's a bit of a sigh of relief in a way. Like even in the keynote from Jim, it was like, it's great to be in person again. And I think that is really true. It is still a bit different because you see people running around in masks and stuff like that as well. And I think that's really good to be conscious of that. But ultimately that human interaction and that personal connection, refreshing those connections with people that you haven't met at in-person events for a while is crucial and that's really useful and great.
Thank you for coming on. Congratulations on your success.
Thank you.
And we'll be back with more coverage here at theCUBE, here in Vancouver. We've got another day, we've got two days, three days of live coverage. We're in day one, I'm John with Rob Strechay. Stay with us for more coverage after this short break.