Welcome back, everyone, to theCUBE's coverage here in Las Vegas for AWS re:Invent. Day three of four days of coverage is going to be brutal. Biggest event of the year for theCUBE, and probably our largest event; 11 years covering AWS re:Invent has been quite the journey. This year there's a major transition to generative AI, and the conversation is about data and how that's impacting the keynotes going on right now. Swami is finishing up; we'll be on tomorrow at four o'clock here on theCUBE. And I've got two great guests here going to unpack this. They've been on theCUBE before. Really it's about data, data flow and orchestration, data science, and how those platforms are going to emerge. We've got Andy and Steven here from Astronomer. Thanks for coming on. Good to see you again. Welcome to theCUBE. Thanks, John. So let's get into the data conversation first. Take a minute to explain Astronomer real quick for the folks who haven't seen the videos before and are watching now. Yeah, in brief, we're the commercial developer behind open source Airflow; probably most of your listeners will have heard of Airflow. It was created at Airbnb for managing complex data pipelines. We've taken that and made it enterprise grade. We make a lot of the contributions to the Airflow project and have a cloud service that deploys it. And Steve, you just joined as CEO. I did, John. Yeah, I did. So I joined about five months ago, and one of the things that got me really excited about this opportunity was that the amount of data being passed through workloads, application to application, is growing on a daily basis. And when you look at the modern data stack, there's only one standard right now to be able to make that happen, which is Airflow. And as Steven mentioned, that's an open source project that we're the primary contributors towards. And we also make our commercial product, which has a lot of higher-value-add feature sets on top of the Airflow product itself.
And those feature sets are front and center in helping companies deliver on AI and ML initiatives today. What's interesting about the open source conversation here at re:Invent: we just had Perplexity on stage there. They took the open source models they trained and they're making those available. It's the open versus closed conversation. And it's interesting, some of the big large language models are actually not open source. So open source has been a big driver, and it's been around for a while with Airflow; Airbnb was the pioneer behind that. But right now the data pipeline conversation is about scale and how it's going to be adaptable, right? So if generative AI comes in, what data is available? So we've been asking the question: what's the problem statement right now? Is it scale? Is it the pipelines? Are people struggling with even figuring out what to do with their pipelines? Are they going to change? Will they be static? What's the current state of the problem? I think there are lots of challenges, but in a way the one problem isn't the models. The models are great, right? It's how you bring the data to the models, because there are so many different data sources, whether you're an actual LLM provider or you're just trying to use LLMs in your conversational assistants or in your content creation. You've got to bring in a lot of data and process it. And so it reduces, in a way, to a data engineering problem, which is a reason, I think, why Airflow is still being widely used for these sorts of ML pipelines. It's interesting, Andy, the conversation with platform engineering: the broad SRE/DevOps person has kind of become the platform engineer now; at KubeCon that was the big conversation. But data is the same problem: scale. Data engineering is the term that's being used. It's not about databases anymore. It's about architecting data, data flows, and orchestrating them. Explainability is a big part of it.
Observability of the data. Now developers are going to have to have guardrails, the developers who are shifting left with data. I think we talked about that in our last interview. How's that shaping up from an industry standpoint? How do you see this evolving? Are customers on point on this, or is it still sitting with the platform groups? I think it's still in the platform groups, John. The companies that we do business with, which are hundreds around the world, and the thousands of other companies that we're talking to, they're trying to figure out how to structure and organize their infrastructure and their organizations to best take advantage of this massive amount of data that can build, you know, tons of value for their organization. So today, when we look at what Astronomer does and what Airflow does, it really does three things on top of giving the data engineer access to building and delivering pipelines. One is it allows companies to centralize more of their work and collaboration between software engineering, data engineering, and MLOps, which is a big challenge today. And that centralization in itself, we see more and more companies wanting to achieve. The second is the security and governance around that. So when we look at organizations today, just think about being the CIO of a large Fortune 500 company: there are data engineers, ML engineers, and software engineers all developing in silos. And it's scary, because the governance and security around that information is typically not high, either. Astronomer and Airflow allow companies to bring not only a centralized environment, but a highly governed and secure environment as well. And from there, you know, we're going to allow people to add more value to the business through their different development efforts, through that centralization and governance that we're bringing to the table.
Yeah, so you guys bring some reliability, some stability, a foundational playbook they can build on top of. We do, and we do it in a modern data stack as well. So, you know, one of the things that we've spent a lot of time on, Steven has with our AI team and our company, is how do we continue to innovate to allow data engineers access to information where they can build ML applications and language models and add value back into their organization. The integrations that we just released recently, yesterday actually, are an example of some of those high-value feature sets that we're delivering to the open source community, which we fully believe in as the standard that organizations should go towards, and also to our software product itself. Yeah, what's interesting is that when I see the Bedrock announcements yesterday, it's clear that choice is a great strategy. Choice and open always win, in my opinion. And that means developers are going to start jamming some apps out there with the LLMs. That means the data infrastructure has got to get cleaned up. What's the problem statement for your customers? Walk me through an example, because again, you're on the engineering side of the data, which sits on the infrastructure, and then there's that next layer up. So you're in that layer; you're feeding the AI apps, basically, in my opinion. Well, that's my take; what's yours? It's interesting. Before, you said, in terms of the problem statement, that it's not about databases anymore. I think it is still about databases, but it's also about a million other things as well, right? You've seen the resurgence and reappraisal of vector databases, which are very commonly used right now. But then you've also got traditional databases, and you've got streaming data sets, and you've got cloud machine learning platforms. How do you bring all of those things together? That's really the problem, I think.
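The "bring all of those things together" problem described here is what an orchestration tool like Airflow encodes as a dependency graph: each step consumes the output of the step before it. A minimal, purely hypothetical sketch follows; every function name and record is made up for illustration, and in Airflow each step would be an @task inside a DAG, but plain Python keeps the dependency chain visible:

```python
# Hypothetical sketch of the orchestration chain discussed above:
# pull records from a traditional database, embed them for a vector
# database, then hand off to a downstream ML/LLM step. All names and
# data are illustrative stand-ins, not Astronomer product code.

def extract_documents():
    # Stand-in for querying a relational source (e.g. Postgres).
    return [{"id": 1, "text": "quarterly supply chain report"},
            {"id": 2, "text": "retail demand forecast"}]

def embed(docs):
    # Stand-in for calling an embedding model ahead of a vector-DB upsert.
    return [{"id": d["id"], "vector": [float(len(d["text"]))]} for d in docs]

def load_vectors(vectors):
    # Stand-in for writing to the vector database; returns the row count.
    return len(vectors)

# The orchestration itself: each call depends on the previous output,
# which is exactly the dependency graph an Airflow DAG would declare.
loaded = load_vectors(embed(extract_documents()))
```

What Airflow adds on top of a chain like this is the part the guests keep returning to: scheduling, retries, SLAs, and the lineage and governance around each step.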
It's like, we joke sometimes that the most common phrase we hear is "the Wild West," right? When we look at the data science teams, even the data engineering teams, everybody's doing their own thing. So being able to orchestrate those together and bring them onto a common platform is an imperative, because then you can start being more inventive and more creative; until you've got control of those pipelines, you're on shaky ground. So you're saying the problem statement's not so much scale, it's more about the complexity of the... Of the environment. Of the environment, because you have multiple databases. Now we're hearing it's not one LLM that'll rule the world; I get that, I agree. So you've got all this stuff out there, but it's got to be pulled together. That's right. Okay, and from a data perspective, if you have the wrong data going to the wrong place, you're just scaling bad data. It's already true, right, with traditional machine learning models, that you need to be pretty damn sure that a prediction was backed up by data that you can reproduce and have transparency into. With large language models, which are often very much non-deterministic, that ability to see the lineage, how you got to that result, is even more important. That's a big point. The non-determinism is a huge factor. That changes everything. Take us through a customer example of how you guys engage: who you talk to, who's the buyer, who's the user, and how that evolves. I mean, are they reconciling their data? Is it a reset? Is it refactoring? Is it just adding more to the infrastructure? What does the environment look like? Take us through an example. Yeah, sure, and Steve, obviously, let's tag team on this. So, I think typically what happens is that a data engineer goes to the Airflow open source product and starts to use it, and it becomes viral within an organization. At that point, we call that a line of business.
A line of business usually starts to look at this and say, well, there's this Airflow set of pipelines out there. How do we start to bring this together where, you know, we can have more visibility, more security, and more fidelity around the data that's being delivered into the applications that are running our business, frankly, whether it's a supply chain application, a retail application, regulatory applications, or an ML or AI application? So that's when we start to engage, because these organizations want higher-level feature sets that can guarantee that the data that's being delivered and powering their business is going to be accurate. It's going to have SLAs. It's going to have more governance around it, and there's going to be more security around it. And it's going to be centralized. That's another big area. I know I mentioned it before, but that's centralization and collaboration: more teams using Airflow as their central nervous system, delivering this data not just to a line of business as it grows, but to their entire company. That's typically our engagement model. Where we see companies getting higher and higher value out of our product is when they start to deploy larger and larger data sets and start to take advantage of the features that we're delivering on those data sets. My favorite example, I think, is the Texas Rangers. I think that's a sports team. Yeah, they've been on theCUBE. They're CUBE alumni, too. Yeah, I think they've been doing pretty well lately, and actually a lot of their success, I think we can justifiably claim, is driven through Airflow and through the Astro platform that's running it for them. They're getting data off the field. They're getting medical data. They're definitely playing the analytics game. That was obvious during the World Series. That was fun to watch.
My other favorite is the firm Laurel, which uses generative AI models to summarize the work that lawyers and accountants and other professional services do. Again, using data that they gather pretty much in real time from what you're doing on your laptop to summarize the work that you're doing. It just completely streamlines the work that people have to do. It's exactly the right application of these models. I appreciate that description, Andy, because that's what I was trying to get to: you're talking about engineering. You mentioned data engineering, and I think that's the key word. There's real engineering going on around the data. It's not like, I need to have a data strategy. It's like, someone wants to get the tool, they jump into open source, essentially tire kicking, and they go, this is going to work great. Then they start playing with it. Then they go, okay, this is legit, let's go to the next level. So it sounds like what you're describing is incremental, right? It is. That nice onboarding is for that data engineer, right? And that is what the Airflow product is really, really meant for. And as it starts to grow and grow, and as, frankly, these companies start to use Airflow to generate more data and deliver it into these big data sets and ML and AI applications, it starts to become a line of business. And in that line of business, we're empowering more and more engineers to get the job done and deliver more business value. And at that point, that's where we're starting to see more executives pay attention to what's going on with this Airflow thing. Can we get higher-value features out of Airflow? I mean, you start to see Airflow sprouting up in organizations. The world's largest retailer probably has like 500 separate teams running on like a thousand Airflow deployments.
So you've got to bring some control to that. You know, I just smiled a little because I love to see the success of end users contributing their stuff to open source. That's right. Uber did it, Airbnb did it, Intuit did it, a lot of them have. And that's awesome. I mean, that means open source has won. It's now the software industry. So it's a great example. But the other thing that I like, and I think this is what's coming out of this show, is that the data engineers, this persona, are legitimately building. They have the key to AI, because if AI doesn't have that pipelining built out, who knows where it goes. AI could build the pipelines. Like a 3D printer. I remember the first time I saw a 3D printer; I'm like, well, that's magic. I can imagine AI building pipelines on the fly in the future. So the engineering just has to get done, and that's, I think, going to be a big part of who's successful with generative AI, because the non-deterministic pieces are huge. We've actually started to introduce some features into the product to allow those data engineers to build those pipelines using large language models. You'll actually see that in the product. Really? Yeah, I mean, again, it's just magic, you know? But it can also build the wrong pipelines, too. You've got to be careful. It can. And look, we take a lot of spirit from that internally, too. Over half our company is data engineers, and we really want to eat our own dog food, if you will. So we've created our own AI and ML algorithms that we're using internally, and we stress test those quite a bit. The one thing, John, to your point, is the governance around that data, to make sure that it's being used the right way. And we learn a lot from that internally as we have our data engineers create applications.
I think the data engineer is going to be as important as, if not more important than, say, the security person, because we had the same conversation with security: build security in from day one. I think you're going to hear that conversation about governance, because you can't scale AI unless the governance is actually built in from day one, where it's automated into the process. How does the data run policy or make decisions on where to go in real time? Well, I'd say there are two big conversations that we have with our customers. One is, can you deliver data and empower data engineers to do their job? And then two is, what's the security and governance around that data? So to your point, John, I think that is front and center on everybody's mind as well. Guys, great insights. We're unpacking it in real time. The data engineer is going to set the table for AI; that's going to be the theme of this interview. As we wrap up, in the last minute we've got, give a quick update on the company: size, scope, target customers, the profile you're hiring for. Give a plug for the company. Yes, so first, we're doing phenomenally well. We're growing at an accelerated pace. We're about 200 employees. We're a global company. We have hundreds of customers, and those customers are all around the world. We don't have a specific segment of customer; as long as somebody is using Airflow or has interest in using Airflow, that's a great company for us to talk to. And we are hiring, so anybody that wants to come work for a great company should reach out. Yeah, for sure. I've got to ask the question, because you mentioned data engineering, and a lot of data engineers work there. What's the culture of Astronomer? If you had to pin it down, every company has their culture; how would you describe it? You mentioned that half the company is data engineers. That's a big part of the culture, that we do eat our own dog food.
We have our data science team that is, of course, built on Airflow and on the Astro platform. So it's very much: try it before we sell it, and make sure that it works. It's a real culture built around that open source foundation. I mean, you have to earn the trust. If your persona is building and engineering their future with data, they're going to want to use the product. They've got to have full confidence that it's going to work, and they've got to have a company supporting it. So congratulations, and continued success. Great to see you. Congratulations on the CEO role, taking the helm. Thanks, John. I appreciate it. Thank you. All right, more CUBE coverage coming. Day three. We'll be right back here shortly.