Hey everyone, welcome to this CUBE Conversation as part of the AWS Startup Showcase, season three, episode one, featuring Astronomer. I'm your host, Lisa Martin. I'm in the CUBE's Palo Alto studios, and today I'm excited to be joined by a couple of guests, a couple of co-founders from Astronomer. Viraj Parikh is with us, as is Paola Peraza Calderon. Thanks, guys, so much for joining us today. Excited to dig into Astronomer. Thank you so much. Yeah, we're going to be talking about the role of data orchestration. Paola, let's go ahead and start with you. Give the audience that understanding, that context, about Astronomer and what it is that you guys do. Yeah, absolutely. So Astronomer is, you know, a technology and software company for modern data orchestration, as you said, and we're the driving force behind Apache Airflow, the open source workflow management tool that's since been adopted by thousands and thousands of users, and we'll dig into this a little bit more. But by data orchestration, right, we mean data pipelines. So generally speaking, getting data from one place to another, transforming it, running it on a schedule, and overall just building a central system that tangibly connects your entire ecosystem of data services, right? So whether that's Redshift, Snowflake, dbt, et cetera. And so tangibly, we here at Astronomer build products powered by Apache Airflow for data teams and for data practitioners, so that they don't have to. So we sell to data engineers, data scientists, data admins, and we really spend our time doing three things. The first is that we build Astro, our flagship cloud service that we'll talk more on. Here we're really building experiences that make it easier for data practitioners to author, run, and scale their data pipeline footprint on the cloud. And then we also contribute to Apache Airflow as an open source project and community, right?
So we cultivate the community of humans, and we also put out open source developer tools that actually make it easier for individual data practitioners to be productive in their day-to-day jobs, whether they actually use our product and pay us money or not. And then of course, we also have professional services and education and all of these things around our commercial products that enable folks to use our products and use Airflow as effectively as possible. So yeah, we're super, super happy with everything we've done, and that gives you an idea of where we're starting. Awesome. So when you're talking with those data engineers, those data scientists, Paola, how do you define data orchestration, and what does it mean to them? Yeah, yeah, it's a good question. So, you know, if you Google data orchestration, you're going to get something about an automated process for organizing siloed data and making it accessible for processing and analysis. But to your question, what does that actually mean? If you look at it from a customer's perspective, we can share a little bit about how we at Astronomer actually do data orchestration ourselves and the problems that it solves for us. As many other companies out in the world do, we at Astronomer need to monitor how our own customers use our products, right? And so we have a weekly meeting, for example, that goes through a dashboard in a dashboarding tool called Sigma, where we see the number of monthly customers and how they're engaging with our product. But to actually do that, you know, we have to use data from our application database, for example, that has behavioral data on what they're actually doing in our product. We also have data from third-party API tools like Salesforce and HubSpot and other ways in which we actually engage with our customers and their behavior. And so our data team, internally at Astronomer, uses a bunch of tools to transform and use that data, right?
So we use Fivetran, for example, to ingest. We use Snowflake as our data warehouse. We use other tools for data transformations. And even if we at Astronomer don't do this, you can imagine a data team also using tools like Monte Carlo for data quality, or Hightouch for reverse ETL, or things like that. And I think the point here is that data teams that are building data-driven organizations have a plethora of tooling to both ingest the right data and come up with the right interfaces to transform and actually interact with that data. And so that movement and synchronization of data across your ecosystem is exactly what data orchestration is responsible for. Historically, and Viraj will talk more about this, schedulers like Cron, Oozie, or Control-M have taken a role here. But we think that Apache Airflow has risen over the past few years as the de facto industry standard for writing data pipelines that do tasks, that do data jobs, that interact with that ecosystem of tools in your organization. And so beyond that data pipeline unit, I think where we see it is that data orchestration is not only writing those data pipelines that move your data, but it's also all the things around it, right? So CI/CD tooling and secrets management, et cetera. Long-winded answer here, but I think that that's how we talk about it here at Astronomer and how we're building our products. Excellent. Great context, Paola. Thank you. Viraj, let's bring you into the conversation. Every company these days has to be a data company, right? They've got to be a software company, whether it's my bank or my grocery store. So how are companies actually doing data orchestration today, Viraj? Yeah, it's a great question. So I think one thing to think about is, on one hand, you know, data orchestration is kind of a new category that we're helping to define. But on the other hand, it's something that companies have been doing forever, right?
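That pattern Paola describes, ingest with a tool like Fivetran, land it in a warehouse like Snowflake, then transform and refresh a dashboard, is exactly what an orchestrator sequences: a DAG of tasks run in dependency order, on a schedule. As a rough, self-contained illustration of that core idea (plain Python using the standard library rather than real Airflow code; the task names here are hypothetical, not Astronomer's actual pipeline):

```python
# Toy sketch of dependency-ordered pipeline execution -- the core idea
# behind an orchestrator like Airflow, NOT real Airflow API code.
# Task names and stub bodies are hypothetical, for illustration only.
from graphlib import TopologicalSorter

def run_pipeline(tasks, deps):
    """Run each task exactly once, after all of its upstream tasks."""
    order = list(TopologicalSorter(deps).static_order())
    results = {}
    for name in order:
        results[name] = tasks[name]()
    return order, results

tasks = {
    "ingest_salesforce": lambda: "raw rows loaded",      # e.g. a Fivetran sync
    "ingest_app_db":     lambda: "raw rows loaded",
    "transform":         lambda: "models built",         # e.g. a dbt run
    "refresh_dashboard": lambda: "dashboard refreshed",  # e.g. a Sigma refresh
}
# Each key lists the tasks that must finish before it can start.
deps = {
    "transform": {"ingest_salesforce", "ingest_app_db"},
    "refresh_dashboard": {"transform"},
}

order, results = run_pipeline(tasks, deps)
print(order)  # both ingests first, then transform, then the dashboard refresh
```

In real Airflow the same shape is declared as operators in a DAG file and the scheduler handles ordering, retries, and the schedule itself; this sketch only shows the dependency-resolution piece.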
You need to get data moving to use it. You've got to get it all in one place, aggregate it, clean it, et cetera. So when you look at what companies out there are doing, right? Sometimes, if you're a more born-in-the-cloud company, as we say, you'll adopt all the cloud-native ways of doing things your cloud provider gives you. If you're a bank or another sort of institution like that, you know, you're probably juggling an even wider variety of tools. You're thinking about a cloud migration; you might have things like Cron running in one place, Oozie running somewhere else, Informatica running somewhere else, while you're also trying to move all your workloads to the cloud. So there's quite a large spectrum of what the current state is for companies. And then, like Paola was saying, Apache Airflow started in 2014, and it was actually started by Airbnb. And they put out this blog post that was like, hey, here's how we use Apache Airflow to orchestrate our data across all our sources. And really since then, right, it's almost been a decade, and Airflow has emerged as the open source standard. And there's companies of all sorts using it. And it's really used to tie all these tools together, especially as that number of tools increases and companies move to hybrid, multi-cloud strategies and so on and so forth. But you know, what we found is that if you go to any company, especially a larger one, and you say, hey, how are you doing data orchestration, they'll probably say something like, well, I have five data teams, so I have eight different ways I do data orchestration, right? This idea of data orchestration has been there, but the right way to do it, all the abstractions you need, the way your teams need to work together, and so on and so forth hasn't really emerged just yet, right? It's such a quick-moving space that companies have to combine what they were doing before with what their new business initiatives are today.
So, you know, what we really believe here at Astronomer is that Airflow is the core of how you solve data orchestration for any sort of use case. But it's not everything; you know, it needs a little more. And that's really where our commercial product, Astro, comes in, where we've built not only the most tried and tested Airflow experience out there (we do employ a majority of the Airflow core committers, right, so we're really deep in the project), we've also built the right things around developer tooling, observability, and reliability for customers to really rely on Astro as the heart of the way they do data orchestration, and to think of it as the foundational layer that helps tie together all the different tools, practices, and teams large companies have today. That foundational layer is absolutely critical. You've both mentioned open source software. Paola, I want to go back to you and just give the audience an understanding of how open source really plays into Astronomer's mission as a company and into technologies like Astro. Yeah, absolutely. I mean, we at Astronomer started using Airflow and actually building our products because Airflow is open source, and we were our own customers at the beginning of our company journey. And I think the open source community is at the core of everything we do. Without that open source community and culture, I think we'd have less of a business, and so we're super invested in continuing to cultivate and grow that. And I think there's a couple of concrete ways in which we do this that personally make me really excited to do my own job. For one, we do things like organize meetups and sponsor the Airflow Summit, and there's these baseline community efforts that I think are really important and that remind you, hey, these are just humans trying to do their jobs and learn and use both our technology and things that are out there and contribute to it.
So making it easier to contribute to Airflow, for example, is another one of our efforts. As Viraj mentioned, we also employ, you know, engineers internally who are on our team whose full-time job is to make the open source project better. Again, regardless of whether or not you're a customer of ours, we want to make sure that we continue to cultivate the Airflow project in and of itself, and we're also building developer tooling that might not be a part of the Apache open source project but is still open source. So we have repositories in our own GitHub organization, for example, with tools that individual data practitioners, again, customers or not, can use to make them more productive in their day-to-day jobs with Airflow, writing DAGs for the most common use cases out there. The last thing I'll say is how important I think we've found it to build educational resources and documentation and best practices. Airflow can be complex. It's been around for a long time. There's a lot of really, really rich feature sets, and so how do we enable folks to actually use those? That comes in, you know, things like webinars and best practices and courses and curriculum that are free and accessible and open to the community, and those are just some of the ways in which I think we're continuing to invest in that open source community over the next year and beyond. That's awesome. It sounds like open source is really core not only to the mission but really to the heart of the organization. Viraj, I want to go back to you and really try to understand how Astronomer fits into the wider modern data stack and ecosystem. What does that look like for customers? Yeah. So both in the open source and with our commercial customers, folks everywhere are trying to tie together a huge variety of tools in order to start making sense of their data. I think of it almost as a pyramid.
At the base level, you need things like data reliability, data freshness, data availability, and so on and so forth, right? You just need your data to be there, and you need to make it predictable when it's going to be there. You need to make sure it's correct at the highest level, some quality checks and so on and so forth. And oftentimes that takes the shape of ELT or ETL use cases, right? Taking data from somewhere and moving it somewhere else, usually into some sort of analytics destination. And that's really what businesses can do to just power the core parts of getting insights into how their business is going, right? How much ARR do we have, what's in my pipeline in Salesforce, and so on and so forth. Once that base foundation is there and people can get the data they need, how they need it, it really opens up a lot for what customers can do. I think one of the trendier things out there right now is MLOps and how to actually put machine learning into production. Well, when you think about it, you kind of have to squint at it, right? Machine learning pipelines are really just like any other data pipeline. They just have a certain set of needs that might not be applicable to ELT pipelines. And when you have a common layer to tie together all the ways data can move through your organization, that's really what we're trying to enable companies to do. And that happens in financial services, where we have some customers who take app data coming from their mobile apps and actually run it through their fraud detection services to make sure that all the activity is not fraudulent.
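Those base-of-the-pyramid concerns Viraj mentions, "is the data there, is it fresh, is it roughly correct", are often implemented as small assertion-style checks run as a step in the pipeline. A minimal sketch of what such a check might look like (the thresholds, field names, and helper are hypothetical illustrations, not from any Astronomer product):

```python
# Minimal freshness/quality check of the "base of the pyramid" kind.
# The record schema, thresholds, and helper name are hypothetical.
from datetime import datetime, timedelta, timezone

def check_batch(rows, max_age=timedelta(hours=24), min_rows=1):
    """Return a list of failed checks; an empty list means the batch is healthy."""
    failures = []
    if len(rows) < min_rows:
        failures.append("too few rows")
    now = datetime.now(timezone.utc)
    # Freshness: the newest record must have landed within the allowed window.
    if rows and max(r["loaded_at"] for r in rows) < now - max_age:
        failures.append("data is stale")
    # Basic correctness: no null values in a required field.
    if any(r.get("amount") is None for r in rows):
        failures.append("null amounts")
    return failures

fresh = [{"loaded_at": datetime.now(timezone.utc), "amount": 42.0}]
stale = [{"loaded_at": datetime.now(timezone.utc) - timedelta(days=3),
          "amount": None}]

print(check_batch(fresh))  # []
print(check_batch(stale))  # ['data is stale', 'null amounts']
```

In practice a check like this would run as its own task downstream of the load step, failing the pipeline run (and alerting) before bad data reaches a dashboard; dedicated tools like the Monte Carlo product mentioned above handle this at much larger scale.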
We have customers that run sports betting models on our platform, where they'll take data from a bunch of public APIs around different sporting events that are happening, transform all of that in a way their data scientists can build models with it, and then actually bet on sports based on that output. One of my favorite use cases I like to talk about that we saw in the open source was one company whose business was to deliver blood transfusions via drone into remote parts of the world. And it was really cool because they took all this data from all sorts of places, right? Orchestrated all the aggregation and cleaning and analysis that had to happen via Airflow. The end product would be a drone being shot out into a remote part of the world to actually give somebody blood who needed it there, because it turns out, for certain parts of the world, the easiest way to deliver blood is via drone and not via some other thing. So all the things people do with the modern data stack are absolutely incredible, right? Like you were saying, every company is trying to be a data-driven company. What really energizes me is knowing that for all those best-of-breed tools out there that power a business, we get to be the connective tissue, or almost like the electricity, that ropes them all together and makes it so people can actually do what they need to do. Right. Phenomenal use cases that you just described. The variety alone of what you guys are able to do and impact is so cool. So, Paola, when you're with those data engineers, those data scientists in customer conversations, what's your pitch? Why use Astro? Yeah, it's a good question. And honestly, to piggyback off of Viraj, I think what keeps me so energized is how mission-critical both our product and data orchestration are. And those use cases really are incredible. And we work with customers of all shapes and sizes. But to answer your question, right? So, why use Astro?
Why use our commercial products? There are so many people using open source; why pay for something more than that? So, the baseline for our business really is that Airflow has grown exponentially over the last five years and, like we said, has become an industry standard, so we're confident there's a huge opportunity for us as a company and as a team. But we also strongly believe that being great at running Airflow doesn't make you successful at what you do. What makes you successful at what you do is building great products and solving the problems and pain points of your own customers, right? And that differentiating value isn't being amazing at running Airflow. That should be our job. And so, we want to abstract those customers from needing to do things like manage the Kubernetes infrastructure that you need to run Airflow and then hire someone full-time to go do that, which can be hard, but again, doesn't add differentiating value to your team or to your product or to your customers. So, helping folks get away from managing that infrastructure is sort of the base layer. It's for folks who are looking for differentiating features that make their team more productive and allow them to spend less time tweaking Airflow configurations and more time working with the data that they're getting from their business. It's for help keeping up with Airflow releases; we've actually been pretty quick to come out with new Airflow features and releases, and just keeping up with that feature set, and working strategically with a partner to help you make the most out of those feature sets, is a key part of it. And really, especially if you're an organization that's currently committed to using Airflow, you likely have a lot of Airflow environments across your organization.
And being able to see those Airflow environments in a single place, and being able to enable your data practitioners to create Airflow environments with a click of a button, and then use, for example, our command line interface to develop your Airflow DAGs locally and push them up to our product, and use all of the testing and monitoring and observability that we have on top of our product, is such a key part of it. It sounds so simple, especially if you use Airflow, but really those things are the baseline value props that we have for the customers that continue to be excited to work with us. And of course, I think we can go beyond that, and we have ambitions to add a whole bunch of features and expand into different types of personas, but really our main value prop is for companies who are committed to Airflow and want to abstract themselves from the infrastructure and make use of some of the differentiating features that we now have at Astronomer. Got it. Awesome. One thing I'll add to that, Paola, and I think you did a good job of saying it, is that because every company is trying to be a data company, companies are at different parts of their journey along that. And we want to meet customers where they are and take them to where they want to go. So on one end, you have folks who are like, hey, we're just building a data team here. We have a new initiative. We heard about Airflow. How do you help us out? On the far other end, we have some customers that have been using Airflow for five-plus years, and they're like, hey, this is awesome. We have 10 more teams we want to bring on. How can you help with this? How can we do more stuff in the open source with you? How can we tell our story together? And it's all about taking this vast community of data users everywhere, seeing where they're at, and saying, hey, Astro and Airflow can take you to the next place that you want to go. Which is incredible. And you bring up a great point, Viraj, that every company is somewhere in a different place on that journey.
And it's complex. But it sounds to me like a lot of what you're doing is really stripping away a lot of the complexity, really enabling folks to use their data as quickly as possible so that it's relevant and they can serve up the right products and services to whoever wants them. Really incredibly important. We're almost out of time, but I'd love to get both of your perspectives on what's next for Astronomer. You've given us a great overview of what the company's doing and the value in it for customers. Paola, from your lens as one of the co-founders, what's next? Yeah, I mean, I think we'll continue to cultivate that open source community. I think we'll continue to build products that are open source as part of our ecosystem. I also think that we'll continue to build products that actually make Airflow, and getting started with Airflow, more accessible. So lowering that barrier to entry to our products, whether that's price-wise or infrastructure-requirement-wise, I think making it easier for folks to get started and get their hands on our product is super important for us this year. And really, I think for us, it's about focused execution this year on all of the core principles that we've been talking about, and continuing to invest in all of the things around our product that, again, enable teams to use Airflow more effectively. And efficiently. And that efficiency piece, everybody needs that. Last question, Viraj, for you. What do you see in terms of the next year for Astronomer and for your role? Yeah, I think Paola did a really good job of laying it out, so it's really hard to disagree with her on anything. I think executing is definitely the most important thing. My own personal bias on that is I think, more than ever, it's important to really galvanize the community around Airflow. So we're going to be focusing on that a lot.
We want to make it easier for our users to get our product into their hands, be they open source users or commercial users. And last but certainly not least, we're also really excited about data lineage and this other open source project under our umbrella called OpenLineage, which makes it so that there's a standard way for users to get lineage out of the different systems that they use. When we think about what's in store for data lineage and needing to audit the way automated decisions are being made, you know, I think that's just such an important thing that companies are really just starting to grapple with. And I don't think there's a solution that's emerged that ties it all together. So we think that as we grow the role of Airflow, right, we can also make it so that we're helping customers solve their lineage problems all in Astro, which is kind of the best of both worlds for us. Awesome. I can definitely feel and hear the enthusiasm and the passion that you both bring to Astronomer, to your customers, to your team. I love it. We could keep talking more and more, so you're going to have to come back. Viraj, Paola, thank you so much for joining me today on this showcase conversation. We really appreciate your insights and all the context that you provided about Astronomer. Thank you so much for having us. My pleasure. For my guests, I'm Lisa Martin. You're watching this CUBE Conversation.