All right, thank you so much. I'm Todd Persen, CEO and co-founder of Era Software, and I wanted to talk to you all about observability data pipelines today. It's a relatively new concept in observability, at least from a nomenclature perspective; we're also seeing some folks call them telemetry pipelines, but it's still an emerging space. And I wanted to share our perspective on what it means to build one yourself versus buying one of the emerging technologies on the market.

A little bit of background on me: I previously co-founded a company called InfluxData, where we built the InfluxDB time series database as well as a bunch of other tools like Telegraf to manage time series data at scale. Interestingly, that product started off as a SaaS product and we shifted to an open source model, so I've been in the observability space in one form or another for about a decade. After that I went to a company called Pivotal and worked on observability for the Cloud Foundry platform as a service. I started Era Software in mid-2019, and we've been focusing on time series and observability use cases again. This is something I love talking about, so feel free to reach out to me; I'm happy to chat with anybody here at any point.

As we talk to customers, go to market, and think about how we position our own products, our primary focus in the observability space has been log management. When we were starting Era Software two and a half, three years ago, we saw a lot of opportunity there for innovation and disruption, primarily because the data volume of logs is so huge, the desire for long retention windows is high, and logs carry a lot of complexity. They have difficult properties that make them hard to manage at scale, whereas metrics and traces have their own challenges but are, I think, somewhat less complex if you only had to focus on one. So we've been looking at logs and log management, and we obviously see lots of customers using Splunk and lots using Elasticsearch.

As we started looking at what it means to manage these growing data volumes, we took the opportunity to survey a lot of practitioners in the IT world and ask questions about observability: what those folks think about tooling, data volumes, and data growth. One question asked how they expect log volumes in their organization to grow, as a way to get into people's mindset about managing growth over the next few years. This chart breaks down, by bucket, how much respondents think their log volumes will grow year over year from where they stand right now. You can roughly split the graph in half. On one side are the folks expecting less than 50% growth; the 10-to-50% range comprises about 58% of respondents. But it's interesting to note, on the right-hand side of the graph, two pretty big bars: people who expect 50 to 100% annual growth, which is effectively doubling year over year, and almost 20% of respondents who expect two-to-five-X growth.
So they're looking at significant growth in the not-too-distant future. We're already hearing from customers that are seeing tens of terabytes a day, even hundreds of terabytes a day, and the larger the organization, the more log volume it generates, especially with the adoption of tools like Kubernetes and containers, where there are just so many more sources for logs to come from. One of the things we're seeing, and this gets into observability pipelines a bit, is that just storing this data is hard. If you actually want to do things with it, you want to pre-process it, route it to multiple destinations, or have rules that decide what goes where, you need to embrace a relatively sophisticated set of technology to manage that incoming data volume, and then you still have to figure out what the storage piece looks like.

So the volume of observability data in general, and log data specifically, is on people's minds and is becoming a significant problem for a lot of existing tools. Pulling a quote out of our paper: 79% of the folks surveyed are concerned about the rising costs of observability and observability data management if we don't see some innovation in the tools or technologies we currently have. And if you go back to the folks expecting ingestion to go up two to five X over the next year and think about it just in terms of hardware cost: if you're spending a million dollars a year on hardware to manage logs right now, you're looking at that going to two to five million by next year. That's a significant increase, especially in the current climate, where a lot of folks are looking at cost reduction and ways to use fewer tools and fewer resources going forward. So it's pretty interesting to me that while lots of folks are embracing observability, there's a significant desire to see some innovation.

Our viewpoint is that the biggest differentiation will come from the adoption of observability pipelines, or telemetry pipelines. One of the other survey questions asked: have you thought about observability pipelines? Are you using them yet, or going to be? In this chart, the blue bars are folks who have either deployed something or are deploying something, and the bars on the right-hand side are folks who are evaluating a new technology. It's a pretty even split — slightly less than half have made the decision to implement something, and slightly more than half are evaluating now. So looking at the data, this is something that's familiar to folks in the space, but there's still a lot of room for adoption, standardization, and finding use cases at scale in the market that can drive the innovation we're seeing the need for in managing log data, and observability data in general.
Our viewpoint is that the observability pipeline space will continue to be important to organizations of all sizes and will become a bigger part of the stack people think about choosing when they roll out observability infrastructure. Breaking it down a little, you've got a lot of choices for what an observability pipeline can be. It can be a product you buy off the shelf — to name a potential competitor, there's Cribl, which does a lot of the things you might want. Their focus is primarily on Splunk users, and they do a lot of work to help reduce the cost of Splunk. But Splunk is an expensive product, Cribl is also an expensive product, and it isn't necessarily the right fit for all sizes of data, all shapes of data, and all organizations. That's one end of the spectrum. At the other end, you can grab some open source tools and build a processing pipeline yourself, either managing it yourself or using cloud resources you could deploy. But there are a handful of problems you need to think through, so the first question is really: what are the problems you need to solve when deciding whether to build or buy?

We highlighted a couple here. The first is the rate of ingestion, and this applies not just to observability pipelines but to all observability infrastructure. Whether you're using Splunk, Elasticsearch, or any of these log management tools, the rate of ingest can be a real problem, especially during outages, when log volume spikes. It's not uncommon for some error state to cause a lot of systems to start emitting error messages simultaneously, and your log volume can go up 10x. If the rate of ingestion of your storage system can't scale up to handle that, you start either backing up writes or getting failures — and when you're trying to find the root cause of an outage, that's exactly when you most need ingestion to keep working. So it's a problem on the storage side, but also on the observability pipeline side: if you have a system doing data processing and enrichment, you need to make sure it can handle that burst in scale too. There's a trade-off there. Is it simply about storage and queuing inside the observability pipeline, or about the rate of processing? If you've got workers dedicated to processing this data that expect on the order of 10 gigabytes per day, and it suddenly goes to a terabyte per day, will you have enough resources to process that data in real time? Or will your pipeline back up, so you potentially lose visibility into that data? Rate of ingest is an important factor, and it needs to be flexible without forcing you to deploy a bunch of permanent resources sized for max capacity, because those will sit idle most of the time.
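To make that trade-off concrete, here's a minimal Python sketch — not any particular product's implementation, just the shape of the problem — of a bounded buffer between the ingestion edge and the processing workers. When the buffer fills during a burst, the producer gets explicit back pressure instead of unbounded memory growth; all names and sizes here are hypothetical.

```python
import queue

# Bounded buffer between the ingestion edge and the processing workers.
# 10,000 is an arbitrary illustrative capacity.
BUFFER = queue.Queue(maxsize=10_000)

def ingest(event: dict) -> bool:
    """Accept an event if there is capacity; signal back pressure if not."""
    try:
        BUFFER.put_nowait(event)
        return True
    except queue.Full:
        # Upstream now has to decide: retry, buffer durably, or drop.
        return False

def process(event: dict) -> None:
    event["processed"] = True  # stand-in for transformation/enrichment

def worker() -> None:
    """Drain the buffer. In a real pipeline this runs in a pool sized for
    expected throughput, with headroom for the 10x outage spike."""
    while True:
        process(BUFFER.get())
        BUFFER.task_done()
```

The uncomfortable part is sizing: a worker pool provisioned for 10 gigabytes a day drowns at a terabyte a day, while a pool provisioned for the terabyte sits idle most of the year — which is exactly why elastic scaling matters here.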
So that's a problem that's simple on the surface but relatively complex underneath. The second is data transformation. Log data in particular, as I mentioned earlier, comes in a lot of different formats and structures; you may not even have the same fields from one system to another. So you need to think about initial data transformation: do you standardize on a schema? Do you rewrite or modify the data in flight so you have a consistent set? You need to think about this so that when data lands in your destination systems, it's in the format you actually want to query it in. There are lots of different ways and rules for how data can be transformed, and they may not be consistent from organization to organization. Having flexible rules and flexible tooling, so that each individual transformation isn't a new piece of code somebody has to write, test, and manage, is a significant part of what an observability pipeline implementation looks like.

Data transport is the third problem. It's not necessarily just a single input-output stream: there may be lots of inputs and lots of outputs, and places in the middle, like dead letter queues, where error-state data goes. So think about the global picture of all these input and output streams, how they tie together, what the rules are for what gets routed where, and how those rules get managed.

The last one is data provenance, which we see a bit more of in the data warehousing and ETL space. It's about making sure you have some way to track the flow: if data lands in a system, can you figure out where it came from, either by looking at configurations and configuration changes over time, or by tagging it based on source? Do you apply metadata so the destination system knows the origin? Maybe you don't even care about this problem and just want all the data to land wherever it lands — that's fine — but it needs to be considered, because once data has passed through the pipeline, you've missed your opportunity, and you'll face a lot of work retagging data in those destination systems down the road. Each of these problems has its own concerns and considerations, and they add to the complexity of what it means to build an observability pipeline.
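To show what the transformation and provenance pieces can look like in miniature — the field names and aliases here are invented for illustration, not a schema any tool prescribes — a single normalize-and-tag step might be:

```python
import time

# Map source-specific field names onto one common schema.
FIELD_ALIASES = {"msg": "message", "lvl": "level", "ts": "timestamp"}

def transform(record: dict, source: str) -> dict:
    """Normalize a record and stamp it with provenance metadata."""
    out = {FIELD_ALIASES.get(key, key): value for key, value in record.items()}
    # Tag origin now: once the record lands downstream, retagging it
    # cheaply is no longer possible.
    out["_source"] = source
    out["_ingested_at"] = time.time()
    return out

print(transform({"msg": "disk full", "lvl": "error"}, source="host-42"))
# {'message': 'disk full', 'level': 'error', '_source': 'host-42', ...}
```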
So when you start to think about the build option, think through all those problems we talked about, and then look at the tools that are relatively standard in the space — open source or cloud-managed — that you can use to compose an observability pipeline. The foundation you'll see most commonly is some sort of queuing system. Kafka is probably the most common; there's a newer one called Apache Pulsar, and RabbitMQ is another open source option that's been around for a while. Most of these have a managed offering on AWS or GCP as well, so if you're building on a particular cloud you may not have to manage those pieces yourself. But keep in mind that each of these comes with its own design trade-offs about what it means to be a queue. It's not necessarily just a single pass-through data processor: there may be topics to think about — how you define topics, which data sources go into which topics, what gets pulled from which, and how you resource individual topics so that a particularly noisy one still gets handled well. Each of these systems is designed to be a very generic message bus that may or may not fit the logging workloads you're trying to support. Those are the open source options — I could probably fill an entire talk just going through queuing options — but there are also vendor-specific services like SQS and Kinesis on AWS. SQS is really designed to be a single, very large queue with hugely scalable processing volume, but it has limitations on payload sizes; on the other hand, it has nice features like dead letter queues, which Kinesis does not. So depending on how you weigh those trade-offs, there are lots of very fine points in the implementation of each of these as it relates to how you want to build your pipeline. For the queuing piece alone there's a whole bunch of decisions, and each one comes with potential issues if you don't factor them into your decision upfront.

Then, on the processing side, there are the telemetry and metrics agents: Logstash from Elastic; Telegraf, which we built at Influx; Fluentd, which has been around for a long time from Treasure Data; and, more recently, Fluent Bit as more of an edge collector — there's now a company called Calyptia formed around Fluent Bit. Vector was an open-source Rust tool that's now part of Datadog. Each of these has its own set of inputs and outputs, its own internal state management, and architectural decisions that affect performance: how it works and how it fails. Potentially the simplest thing is to pick a queue and a set of processors: data goes into the queue, and the agents pull off of Kafka, for instance, process the data, and send it on to another destination (there's a minimal sketch of that shape just below). That's fairly simple, but there are lots of ways it can fail, and a lot of agent management and agent configuration management you have to do. As you actually deploy these, you'll start to see the individual characteristics of each tool you've chosen — how it performs, how it scales, what your input and output options are — and you end up with a system that's the composite of the trade-offs you had to choose around. That works well for some folks, but it puts a lot of the burden on you, the implementer, to manage all these tools together and make them feel like a cohesive system. And think about pushing out configuration changes: say you have a hundred Vector agents running and you want to make a change — how do you manage rolling that out to a hundred Vectors? How do you make sure you don't end up with 99 updated and one that failed and is still putting data in the wrong place? What starts as an ostensibly simple configuration of these tools really comes with a lot of complexity.
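Here's roughly what that "simplest thing" looks like as code: a worker pulling from Kafka and forwarding downstream. This sketch assumes the kafka-python client, a broker at localhost:9092, and a "logs" topic; the destination function is a stand-in for whatever output you actually use.

```python
import json
from kafka import KafkaConsumer  # pip install kafka-python

# One worker in the processing pool. Scaling out means starting more
# processes in the same consumer group; Kafka rebalances partitions.
consumer = KafkaConsumer(
    "logs",
    bootstrap_servers="localhost:9092",
    group_id="pipeline-workers",
    value_deserializer=lambda raw: json.loads(raw),
)

def send_downstream(event: dict) -> None:
    print("forwarding:", event)  # stand-in for Splunk/Elasticsearch/etc.

for message in consumer:
    send_downstream(message.value)
```

Simple as it is, this already embeds the decisions the talk is pointing at: partition counts, consumer group sizing, what happens when send_downstream fails, and how you'd roll a configuration change across every copy of this worker.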
Again, there are many, many other tools you could use for this — this is not nearly an exhaustive list — but the goal here is to show that there are lots of options, and each option comes with some baggage you're going to have to take on. So here are some things to think about a little more broadly as you look at the tools, or at the option to buy something off the shelf.

The first is simply: how do you manage this thing? What does it look like to spin it up or scale it down? How hard is it to deploy in multiple places? Is there a centralized configuration management tool, or do you have to build your own CI/CD pipeline to do configuration management? In short, what does the operator experience look like?

Next is back pressure and performance, which comes back to the rate-of-ingest problem. If the pipeline itself reaches its maximum processing capacity, what happens? Does it return errors to clients? Does it have some way of buffering to disk the data it can't process? These are really important considerations that affect your entire observability infrastructure, because the pipeline becomes the leading edge of all the data being fed into the system. If you start applying back pressure and returning errors to clients, then those upstream clients need their own durability and state management, because you never know how long an outage — and the back pressure — will last. So you have to think about what it means to apply back pressure versus providing some sort of buffering or durability.
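As a sketch of the buffering option — purely illustrative, with an invented spill path — a pipeline can absorb short bursts in memory and spill the overflow to a local append-only file rather than immediately erroring back to clients:

```python
import json
import queue

memory_buffer = queue.Queue(maxsize=1_000)

def accept(event: dict) -> None:
    """Prefer the in-memory buffer; spill to disk under back pressure."""
    try:
        memory_buffer.put_nowait(event)
    except queue.Full:
        with open("spill.ndjson", "a") as spill:
            spill.write(json.dumps(event) + "\n")

def replay_spill(handle_event) -> None:
    """Drain the disk buffer once downstream recovers. A real version
    would also rotate/truncate the file and bound its size."""
    try:
        with open("spill.ndjson") as spill:
            for line in spill:
                handle_event(json.loads(line))
    except FileNotFoundError:
        pass  # nothing was spilled
```

The trade-off is the same either way: errors push durability upstream to every client, while buffering pulls it into the pipeline, where disk space and replay ordering become your problem.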
Versatility is, in general, what are all the things the tool can do. If you look at the Fluentd versus Fluent Bit trade-off, Fluentd supports a lot more integrations — more inputs and outputs — than Fluent Bit, but Fluent Bit is much more lightweight. It's targeted more at edge processing, and if you need to run with a lower footprint it's a much more efficient tool, with the trade-off that it supports fewer native inputs and outputs. So if there's something you need that a tool you've chosen doesn't have, you either build it, find someone else to add it, or contribute it back to open source — think about the set of inputs and outputs you need versus what each of these individual tools offers. At this point a lot of them support many of the same things — Logstash versus Vector, say, have been around long enough that you're probably well covered — but you never know: something like how a tool handles a particular type of authentication for a certain input or output may not work with the way your system is designed. Lots of little nuance in each of these things, but it's an important point.

The next thing is data manipulation: what tools are available within those agents to make changes to that data? Vector, for instance, supports Lua as a scripting language that allows you to rewrite data, and I think Logstash has its own equivalent. Is that sufficient for you to make all the changes you want? Is it efficient enough, from a compute perspective, to do the scale of processing you want without spending a ton on hardware? That's really hard to know until you test it at scale, so there's a lot of work in figuring out what your goals are with the tool and how much it gives you.

Another piece is configuration changes. As you deploy these agents in a processing mesh, or a pool of workers, or whatever architecture you choose, how do you deploy configuration changes? Do you have to worry about data loss when agents restart? If you need to make lots of small changes, what happens to data sitting in an intermediate state in the pipeline — does it go stale and get dropped, or is there some way of tagging version changes between configurations so that data keeps getting processed before the new configuration kicks in? Subtle nuance again, but you need to think about how you're going to manage this from a configuration perspective. And once you start putting a significant amount of processing logic in these agent configurations, you essentially have code. Do you need to test it? You need to test the Lua inside your agent to make sure it's doing what you want, hasn't broken any previous behavior, and is ready for production — do you test it in staging? Lots of things to keep in mind there.

The last thing to think about is error management. Say you have an output destination that goes offline, or the credentials are wrong — say one of your target destinations is Datadog and somehow the API token or credentials you're using to write are no longer correct. Now you're going to back up a ton of logs that sit in some intermediate state, potentially a dead letter queue designed for these errors. But how do you get alerted when that starts to fill up? What happens when it becomes so big that it starts dropping logs? Think about what these error states look like: is it something that has to be manually remediated, or something that needs a code or configuration change to resolve? It's an important concept, because errors can happen fast and can generate massive amounts of data. A single configuration change could cause an entire pipeline to stop working, and then your entire real-time data volume is potentially backing up within the pipeline; if that goes unnoticed, or the pipeline isn't designed to handle enough scale to store that data, you're looking at data loss. There are lots of these choices and implementation decisions that can lead to undesirable side effects in how your pipeline operates — things you really need to think hard about if you're building it yourself, and things that, theoretically, the products you might buy have thought through and solved for you, hopefully taking away a significant amount of that burden.
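As a miniature version of that failure mode — thresholds and names are invented for illustration — a delivery step with a dead letter queue needs a watermark that alerts someone well before the queue itself overflows:

```python
DLQ: list[dict] = []
DLQ_ALERT_AT = 50_000   # page a human here...
DLQ_MAX = 100_000       # ...because past here you are dropping data

def alert(message: str) -> None:
    print("PAGE:", message)  # stand-in for a real alerting integration

def deliver(event: dict, send) -> None:
    """Try the destination; park failures (bad token, outage) in the DLQ."""
    try:
        send(event)
    except Exception as err:
        if len(DLQ) >= DLQ_MAX:
            # Data loss begins here; a real pipeline should surface this
            # as a metric rather than alerting on every dropped event.
            alert("dead letter queue full; dropping events")
            return
        DLQ.append({"event": event, "error": str(err)})
        if len(DLQ) == DLQ_ALERT_AT:
            alert(f"dead letter queue reached {DLQ_ALERT_AT} events")
```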
So, getting to us a little bit and how we think about observability pipelines. We started out building a product called EraSearch, and the easiest way to think about EraSearch is as a competitor to Elasticsearch in the log management space — not a rewrite, but a totally different architecture. We wrote the entire thing in Rust, and it's designed to be cloud native and managed with Kubernetes. The idea was that the efficiency of Elasticsearch for log management isn't great, and there are a lot of properties of time series data, and observability data in general, that need a relatively specialized storage technology to get optimal efficiency. Going back to my background, I spent a lot of time thinking about this stuff at Influx, and a lot of time thinking about it at Pivotal, looking at ways to optimize the storage problem for observability data and specifically log management. We've been working on EraSearch for about two years now, and we've tested it up to about a petabyte per day of ingest on a single cluster. So we're looking for ways to deliver a foundational technology for storing log data that makes it cost effective and scalable for organizations, so that the storage piece is solved.

As we spent more time talking to customers and looking for the next set of pain points leading into storage, the observability pipeline problem kept coming up in conversations. So we started working on a product called EraStreams — again written in Rust, and again thinking a lot about efficiency and optimizing for log volumes that keep growing and growing. EraStreams is our answer to all the problems with observability pipelines I outlined: you're going to need lots of input formats, a high degree of durability, the ability to take in lots of different formats and send to lots of different places, and dead letter queuing with visibility into what's required for remediation as soon as you start seeing those error states. EraStreams is a fully functional observability pipeline, presently for log data. I see a lot of use cases coming where metrics will also be a factor in observability pipelines. There isn't much that folks are doing there yet, but I think it's going to become a lot more common, especially as we see more desire for correlation of logs and metrics, more pre-processing, alerting, and things like that, where logs and metrics have synergistic value — if they can go through a single processing pipeline, there's an opportunity to deliver more value. We just launched a beta program for EraStreams last month, and we're starting to work with customers now on figuring out how EraStreams can fit these workloads in larger organizations. Broadly, what we're seeing is that it's not just about logs, metrics, and pre-processing. We're seeing an evolving ecosystem that we're calling observability data management, and we're thinking about EraStreams as the central hub for all of your observability data.
If you as an organization are thinking about sources of metrics and sources of logs, how that data flows through a processing pipeline, and how it's durably delivered to these other destinations, it needs to be efficient, and it also needs to be fast. If you're looking for real-time observability and your processing pipeline adds five or ten minutes of latency, that's not acceptable. It needs sub-second processing times — effectively processing data in real time as it comes through. What we anticipate is that as these data volumes continue to grow over the next few years, the observability pipeline, or telemetry pipeline, piece will become a critical part not only of cost management but of operational management for observability platforms. That comes down to questions like: How do you manage configuration changes? How do you roll back? How do you see who made what change when, and how do you efficiently roll out changes within your organization, whether through GitOps or a CI/CD kind of workflow? Because there's going to be a breaking point with traditional agents, where deploying and managing all these configurations is no longer easy enough for organizations to operationalize, and it will start leading to more and more problems, and more outages. The most important property of observability, from our perspective, is that it needs to be the most reliable part of the entire system — because if it's not, you can't make sure your business is functioning, that your other tools are functioning, that your other systems are functioning. Getting this piece right — something that can be easily operationalized, scaled up effortlessly, with as close to 100% uptime as is realistic in whatever environment you're running in — is actually really critical.

Our view is also that the stream processing piece, the observability pipeline piece, is great, but what we really want to bring is the storage technology as well, and that's where the interaction between EraStreams and EraSearch is really important for us as we talk to customers. The storage piece solves a fundamental cost and scale issue: where do I put this data, and how do I keep it around as long as I need to without worrying about exponential cost increases? That's something we're seeing with folks who've been running Elasticsearch for a while — it's getting harder and harder to manage these growing data volumes, because at a certain point there are fundamental limits to what that architecture is capable of handling without introducing a lot of operational complexity and a lot of potential failures. The storage piece coupled with the observability pipeline piece starts to create a holistic picture where all of your data can flow in, and we don't really care where you want it to go. If you want it to go into EraSearch, that's great — we're happy to support that. But if you're a big Splunk shop, you're not going to stop using Splunk just because something's cheaper; you built your business and your dashboards around Splunk, and we still want to let you easily send data to Splunk.
But a lot of the customers we talk to don't need to put all of their data into Splunk, or don't need to put as much data into Splunk as they currently do. Having a cost-effective, scalable storage technology makes it possible to stratify these logs in terms of business value and think about the right place for that data to live long-term — and having a tool that can manage all of the filtering, routing, transformation, aggregation, and deduplication in a single place makes that view of the world very possible.

Looking a little at the EraSearch piece and our very high-level architectural overview: we've done a lot of work to scale our ability to ingest data efficiently on the storage tier, and that relates to the same way we think about ingest and efficiency in EraStreams and the observability pipeline piece. But the critical thing we've embraced from the very beginning is separation of storage and compute, built in. We can store data — whether for archives, compliance, or security — in a cold storage tier like S3, GCS, or MinIO, leveraging high-durability, low-cost storage for data that's accessed relatively infrequently, while tying that into a dynamically rebalanced caching tier for hot storage (sketched below). In EraSearch, everything that comes in gets automatically persisted to object storage and is immediately available in our caching tier. You, the user, don't need to think about what it means for data to go from hot to cold, because in reality all the data is already in cold. The only thing managed for you is what data stays hot: how much caching capacity is available and what rules you've set around time-based eviction; data simply gets flushed out of hot storage when the time is appropriate. If you query data that's in cold storage, it automatically gets pulled back into hot storage through a rehydration process and is immediately available for querying. So that's a fundamental architectural separation, but it's essentially transparent to you as the user, with a query engine built on top that makes the entire process transparent. We've also done a lot of work to build API endpoints that mirror what you'd see in Elasticsearch, for both ingest and query, so we can work with a lot of tools that work directly with Elasticsearch: you can use our backend as a Grafana data source, and you can ingest data with Logstash, Telegraf, or Vector — any of these tools work automatically. A lot of these properties are things we've been carrying back into the observability pipeline piece, realizing that supporting those inputs and outputs, and doing that processing in the pipeline tier, is really important. So the design goals we built into the storage piece have been a big part of our design goals for the pipeline piece as we build this entire system.
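As a toy of the hot/cold shape described above — a few-line caricature for intuition, not EraSearch's actual implementation — every write lands durably in "cold" storage immediately, and the hot tier is just a bounded cache that rehydrates on demand:

```python
from collections import OrderedDict

class TieredStore:
    """Writes are durable in cold storage from the start; hot is a cache."""

    def __init__(self, hot_capacity: int = 1024):
        self.cold: dict[str, bytes] = {}            # stand-in for S3/GCS/MinIO
        self.hot: OrderedDict[str, bytes] = OrderedDict()
        self.hot_capacity = hot_capacity

    def write(self, key: str, data: bytes) -> None:
        self.cold[key] = data                       # persisted immediately
        self._cache(key, data)                      # and queryable in hot

    def read(self, key: str) -> bytes:
        if key in self.hot:
            self.hot.move_to_end(key)
            return self.hot[key]
        data = self.cold[key]                       # transparent rehydration
        self._cache(key, data)
        return data

    def _cache(self, key: str, data: bytes) -> None:
        self.hot[key] = data
        self.hot.move_to_end(key)
        while len(self.hot) > self.hot_capacity:
            self.hot.popitem(last=False)            # LRU eviction here;
                                                    # time-based in the talk
```

The point of the shape is that "hot to cold" is not a migration the user schedules: everything is cold from the first write, and hot is purely an access-speed optimization.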
Looking at EraStreams versus building your own thing, we've tried to find the hardest parts of that and solve for them as we built EraStreams. The first is being able to scale easily. This really comes down to not having to think about managing your own fleet of agents or the capacity of your own queue — that's built into the product and is relatively straightforward from a management perspective, which I think should be essentially table stakes for observability tooling. The scale should be there: the ability to handle bursts in ingest, and to provide reasonable and efficient back pressure when necessary, is all built into the product.

Dynamic reconfiguration is probably the biggest. When you apply a configuration change to EraStreams, the pipeline itself never goes offline. Changes get applied automatically across every worker in the fleet, and you don't have to think about downtime or about what it means to make a change, because the system handles it all automatically. That includes intermediate processing that needs to finish — anything that would normally come along with restarting a process for an agent to pick up a new configuration change (a minimal sketch of that versioned-snapshot idea follows at the end of this section). Going back to the earlier point: some tools will let you do a heads-up restart and trigger a configuration reload, but if you have to do that across a hundred agents running simultaneously, how do you manage the timing and the synchronization? That's something we've put a lot of work into with EraStreams.

And then there's ease of use. We've got a UI we've been working on — I'll show a screenshot in a minute — but think about what it's like for you as a user to define inputs and outputs, to figure out what data flows where, and to get insights into whether your pipeline is running. All of those things factor into how you as an operator are going to use this tool and how you'll surface it to other folks inside your organization. The idea is to be able to build these pipelines without having to write code — or YAML, which may or may not count as code for most people — and instead step one level above that, with a tool that lets you build these pipeline configurations visually. Really, the idea is to make this system a data processing fabric targeted at observability data: take the other potential use cases off the table, focus on observability, and make something that scales like you'd expect, has change management like you'd expect, and becomes a critical part of your observability infrastructure, knowing that everything flowing in and flowing out has strong durability guarantees and basically does what you'd expect. And I mentioned this earlier, but there's the integration with EraSearch — not a mandated integration, but we've obviously done a lot of work to make it as seamless as possible, because having a low-cost storage option is something we think is really important as you roll out observability pipelines and manage your entire organization's observability data workloads.
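Going back to dynamic reconfiguration: the sketch below shows the core idea in miniature (again illustrative, not EraStreams internals). Configs are immutable, versioned snapshots, so workers pick up changes between batches without a restart, and in-flight batches finish under the version they started with.

```python
import threading

class ConfigStore:
    """Versioned, atomically swapped pipeline configuration."""

    def __init__(self, initial: dict):
        self._lock = threading.Lock()
        self._version = 1
        self._config = dict(initial)

    def snapshot(self) -> tuple[int, dict]:
        with self._lock:
            return self._version, self._config

    def apply(self, new_config: dict) -> int:
        """Publish a new config; never mutate the old one in place."""
        with self._lock:
            self._version += 1
            self._config = dict(new_config)
            return self._version

store = ConfigStore({"drop_fields": ["debug_blob"]})  # hypothetical rule

def process_batch(batch: list[dict]) -> None:
    version, config = store.snapshot()   # pinned for this whole batch
    for event in batch:
        for field in config["drop_fields"]:
            event.pop(field, None)
```

Because the snapshot is immutable, a deploy is just a pointer swap: there's no restart window, and nothing in flight sees a half-applied configuration.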
Then there's the ability to leverage that EraSearch piece to get petabyte-scale search across all of your logs — a place that, unlike just dumping data into S3, gives you the ability to run queries, use it in a real-time manner if you choose, or use it for archival purposes. Whatever you need to do, EraSearch opens that up for you, makes it easy to get data from EraStreams into EraSearch, and gives you a tool that can manage all those workloads in a cost-effective way.

A couple of things on the benefits of EraStreams — probably all relatively obvious given what we've said. A huge part of this is being able to reduce costs. You can do some work to remove data in flight from your log streams — some folks like the idea of removing data, some people want to store everything, and you can do either with EraStreams — and you can use EraStreams to route data to the platforms that are most efficient for you. Coming back to Splunk: if you could reduce the data you're sending to Splunk by half, that could be a significant cost savings, and by bringing in something else — whether it's Elasticsearch, EraSearch, or maybe Datadog for some of it — you can route the data to the right place and efficiently build the rules that decide what data goes where. The goal isn't just saving money but efficiently optimizing how you spend it.

Next, taking action on data in flight. As I mentioned, an observability pipeline should process data effectively in real time, so you're not adding a lot of latency to your observability data as it comes into the system — doing transformations, enrichment, deduplication, whatever it is, on the data as it passes through, before it lands in your storage destination.

And then effortlessly managing data at scale. This needs to work at high scale, and to scale up for organizations envisioning two-to-five-X log volume growth in the next year. It needs to work now, and at 10x or 50x its current size, because that's the world we're heading into: by the end of this decade, I think the log volumes we're doing now will seem silly in comparison, and everybody will probably be doing ten times as much data as they are now. So we need to build and deploy a set of tools that will scale with those data challenges as organizations grow.

And I promised you a screenshot. I was going to try to squeeze a demo in here, but I don't think I left myself enough time. This is an overview of what the EraStreams configuration UI looks like. We think about it as basically a flow chart. You have a set of input sources — in this one we've just got a Splunk HTTP Event Collector input — and steps that can take data and drop some fields. You can forward the rest of that data on to EraSearch or any other destination, but you can also fork the data off into a second stream that goes through an additional filtering step, so maybe the ultimate destination here is that you've dropped some fields and stored all the raw data in EraSearch. Cool — that's efficient.
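That flow chart, reduced to a few lines of routing logic — the field names and destination functions are stand-ins for illustration, not what the UI actually generates:

```python
DROP_FIELDS = {"beat", "offset"}   # hypothetical noisy fields to strip

def to_erasearch(event: dict) -> None:
    print("erasearch <-", event)   # full stream into low-cost storage

def to_premium_tool(event: dict) -> None:
    print("splunk/datadog <-", event)  # only the high-value slice

def route(event: dict) -> None:
    slim = {k: v for k, v in event.items() if k not in DROP_FIELDS}
    to_erasearch(slim)                       # everything, fields dropped
    if slim.get("level") == "error":         # the forked second stream
        to_premium_tool(slim)
```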
But then you also want to peel off some of that data — as in the sketch above, strip out just the logs at log-level error and store only those in a different system. Maybe those are the ones you put into your Splunk system, or maybe those are the ones you send to Datadog. EraStreams lets you efficiently decide and route all that data into whatever destinations you want, quickly visualize what's going where, and make changes on the fly. Once you've made your changes, you just hit save, it deploys automatically to your EraStreams infrastructure, and the configuration change gets picked up automatically. There's another view here that I don't have a screenshot for: a relatively simple change history. You can go back and see who made the most recent change — maybe something stopped working, or you stopped seeing logs you were hoping to see in Datadog — and roll back to a previous configuration and deploy it automatically as well. The idea is that this view gives you the ability not only to configure easily but also to make and visualize changes, visualize pipeline health, and rapidly configure and deploy so you can get data processing quickly.

One last thing we didn't touch on: an important part of this is being able to apply rules for PII redaction and the like. If you know you've got systems with more general access within your organization, you can have one data source with PII-redacted data that a broader part of your organization has access to, and one that's unredacted that only a set of administrators, or a set of higher-authorization users, can access. That's actually a great use for a tool like EraStreams. It's not super complicated, but you want to be able to visualize it: this flow is going here and has my redaction steps, and I want to see that those are working; this flow over here has my raw data — make sure it's going to the right place and doing what you expect. And down below there's a log viewer: you can hop into really any stage in the pipeline and see a sample of the logs flowing through that part of the system.

I'm pretty close to time, so I'm going to wrap up here. As I mentioned, we've opened up a beta program for EraStreams. There's a link here, and I think you should get that link provided to you as well. If you're interested in checking out EraStreams, definitely reach out to us. Again, no shame if you want to build your own, but keep in mind that there are a lot of things you'll have to do and manage. We would love to help you have an easier life and help you manage those with EraStreams. Definitely sign up if you're interested and we'll find some time to chat. And I actually don't see any questions in the Q&A box, so maybe I've just done a fantastic job talking for 50 minutes. I'll give people one more second to see if anything pops up. If there's nothing else, I'll hand it back over to the Linux Foundation.

Thank you so much, Todd, for your time today, and thank you everyone for joining us. As a reminder, this recording will be on the Linux Foundation YouTube page later today. We hope you join us for future webinars, and have a wonderful day.