Just a show of hands, how many of you are SREs or developers who have built stuff in Rust? That's a curveball, and wow, you caught that one well. Data pipelines, anyone built one, seen one recently? Anyone had one at all? Of course, thank you. So that's how we'll start. We'll talk about observability of data pipelines. My name is Hong, I'm a sales engineer from Datadog. This is how we can keep in touch on social media and email.

So we'll start with best practices: how do you monitor data systems today? One of the key things to note is that if you're looking out for the SRE golden signals, if you like, it starts off with the latency of your system: how much lag is your system encountering as a result of the voluminous amount of requests sent to it? Then the error rate: is it healthy, tolerable, within a range that you can manage? Then traffic, the volume of requests, and finally saturation: how saturated is your system, utilization-wise? All of these indicate the health of your application.

One of the key things about the data from the systems you observe is that it's huge; that's why we call it big data. You want to get more out of your observability data, and that's where the concept of a pipeline can actually help. The evolution of the pipeline has got to a stage where you now get full empowerment across the departments in an organization. Multiple personas can be effecting changes to the data pipeline. That's right: your sets of observability data could have different data players, different personas, interacting with it, perhaps transforming it and enriching it. Nothing new, because you may have heard of Kafka, you may have heard of Databricks, you may have heard of data lakes, for instance. And that's what we'll talk about: it takes something to put together an observability platform that is capable of observing those key signals we talked about a slide earlier, while at the same time empowering the different departments that will interact with their data sets. All in all, the whole idea is to improve quality control on the data. You can see the example shown here: Kubernetes clusters being observed, their data streamed across different departments, with different personas in those departments having a say in how the data is manipulated and used. We'll see a demo of that shortly.

And here's an example of why that sort of philosophy actually helps in observability today for SRE teams. It's called democratization, democratizing the data platform itself, where you have a means for, say, the security team to interact with the SREs of other backend teams, as this example portrays, making changes and manipulating the data. It could be as simple as tagging certain data sets and saying that this came from a very insecure, vulnerable app; that's what the security team does. And then there is the governance, risk and compliance team, which could categorize this, put it into the data warehouse, and make it easy for other teams to reference and to visualize in their own dashboards.

So, introducing an open source technology courtesy of Datadog: it's called Vector. We'll provide the link to the source code on GitHub in a short while. The whole idea here is that regardless of the sources and the sinks, your endpoints, Vector is an easily configurable, installable, deployable pipeline, and, very importantly, also highly scalable, able to take huge amounts of load.
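To give a feel for how easily configurable that is, here's a minimal sketch of a Vector config: one source, one sink, nothing else. The log path and the API key environment variable are assumptions for illustration, not something from the talk.

```toml
# vector.toml, minimal sketch: tail a local log file and forward to Datadog.
# The path below is hypothetical; swap in your own source and sink.

[sources.app_logs]
type = "file"
include = ["/var/log/app/*.log"]   # hypothetical path

[sinks.dd_logs]
type = "datadog_logs"
inputs = ["app_logs"]
default_api_key = "${DD_API_KEY}"  # assumes the key is set in the environment
```

You'd run it with vector --config vector.toml; the aggregator topologies we'll look at next are built from these same source, transform, and sink building blocks.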
The kinds of loads that the likes of Comcast, T-Mobile and Zendesk have actually experienced, Vector was able to deliver. So Vector is the data pipeline that a lot of personas might be looking forward to interacting with. One key thing about this, because it's from Datadog: we eat our own dog food. Datadog, being a leader in observability, the go-to observability platform for the SRE today, we like our log management. We like it a lot, and so do our customers and the SRE community. So it's highly encouraged to use Datadog log management hand in hand with an open source observability pipeline like Vector.

Vector comes with the concept of transformation and enrichment of the data sets. Very importantly, it also comes with the concept of an aggregator. We'll see in the next few slides an example of how the aggregator works. The aggregator basically allows multiple clients and agents, some of which are from Datadog, some of which are from other observability tools, to stream all that rich observability data into the transformation pipelines, so to speak, at stage two, and then keep a record in different kinds of sinks. It could be Splunk, it could be a data lake, it could be any sort of data schema that Vector understands. So it's a long list of sources and a long list of sinks that the Vector aggregator is able to connect.

Here's an example of more possibilities, the topologies. It could be literally storing into AWS S3 storage, or into some of the community favorites I mentioned: Kafka, Loki, Elasticsearch, all of which could be potential sinks after transformation of the data. Data that originated from the usual push sources like Logstash, Prometheus, StatsD and Syslog gets converted, enriched, transformed, and it could even be condensed prior to storage in all these sinks over here on the right. All in all: highly load balanced, highly available, and scalable as well. So the concept here is that the pipeline is for everybody, for every kind of use case.

Let's take a look at the demo. Here are examples of some of the pipelines that have been built. Whenever we visualize a pipeline, the different stages of the data transformation and the data enrichment are really important to the SRE, and to the different personas; remember the security team and the GRC team. So it makes sense to look at error rates. Are we getting errors? Because this is a huge amount of data being streamed. At every single stage, are we having a transformation issue? Is some of the data not being enriched appropriately? And beyond errors, we can also see the throughput. I'm just going to rewind that again. There we go. So you can see throughput; you can see the number of events, which could also indicate the health of the different stages of transformation. We can look at the utilization of that particular stage of the pipeline, and, very importantly, also the diagnostics. In real time, whenever there is an issue with the transformation of data sets, those diagnostic logs are pretty helpful. Very importantly, you also get a feel, using a few gauges, of how much event flow is actually occurring over here. And very importantly, as you can see way at the top, there's the capability of transforming the data based on the different sources. I can see at the get-go that some of the sources pertain to one of our favorite agents, the Datadog Agent's logs, and we're transforming them based on individual key-value pairs. So transformation, that's another use case.
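To make that transformation use case concrete, here's a hedged sketch of an aggregator-style config: the Datadog Agent streams in, a remap (VRL) transform tags events and parses key-value pairs, and the result fans out to both Datadog log management and an S3 "warehouse" for the GRC team. The service name, the security tag, and the bucket are hypothetical, not from the demo.

```toml
# Sketch only: agents stream into an aggregator, one transform enriches,
# and two sinks keep a record of the result.

[sources.agents]
type = "datadog_agent"
address = "0.0.0.0:8080"

[transforms.tag_and_enrich]
type = "remap"
inputs = ["agents"]
source = '''
# Hypothetical security-team rule: flag events from a known-vulnerable app
if .service == "legacy-billing" {
  .security_flag = "vulnerable-app"
}
# Break the raw message into individual key-value pairs where possible;
# fall back to an empty object if the message doesn't parse
parsed = parse_key_value(.message) ?? {}
. = merge(., parsed)
'''

[sinks.dd_logs]
type = "datadog_logs"
inputs = ["tag_and_enrich"]
default_api_key = "${DD_API_KEY}"

[sinks.warehouse]
type = "aws_s3"
inputs = ["tag_and_enrich"]
bucket = "observability-archive"   # hypothetical bucket
region = "us-east-1"
encoding.codec = "json"
```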
Very importantly, this eventually leads down to log management. And you can see that from the perspective of logs, like I mentioned, we really like log management, just like the rest of our customers. We could use it to parse through any amount of enriched and transformed observability pipeline logs from Vector. You can take that, analyze it, and even go into the metadata to know where it actually came from, which particular components were involved, and, very importantly, which level of detail you're actually running at. One key thing would be context. From a contextual standpoint, you could locate exactly where in a log stream a particular log entry that was picked up from the observability pipeline sits, and dive in deeper using Datadog log management.

So with that, we're back to the presentation slides. Now, Vector doesn't live alone. It's part of the observability platform that is the leader in the industry, a platform that provides you with insights into usability and observability, driven by AI as well. Very importantly, there's the freshness, the accuracy, the durability, and the coverage that this platform delivers that make it a whole lot easier for SREs to do their work. Very importantly, because it's a unified platform, you get economies of scale: 16 technology pillars all on one single platform, ranging from mobile user analytics to browser-based app analytics to security analysis and security posture management, very importantly log management as well, and also application tracing. All in all, smart tooling, all in one platform.

Here's an example of what you could potentially do with 16 pillars all in one, with the power of Vector helping to observe huge data sets: root cause analysis within a few mouse clicks. We could dive in really, really quickly using Datadog with Vector to analyze data for each of the deployed applications. This is deployment tracking, where we could dive into the various signals again, average latency, saturation, picking up on issues every time any of those signals, like latency, gets too high. Here's an example, again, of what I could be doing with the huge amounts of data streamed over observability pipelines: some of that data gets correlated to application traces. So you have that data in log format, correlated with application traces and visualized across the span of your modern application, which could be a meshed network of microservices. All in all, it's all integrated, making it a whole lot easier to use one platform to investigate any outages: the moment you see within the application flame graph that there is indeed a module, a method call, that has failed, you dive into those logs that Vector has streamed to, say, Datadog, and use Datadog to analyze a whole lot more of what this modern application is actually facing.

So Datadog: it is the unified observability platform. Did you know it's also a very, very big data pipeline? Going back to the theme of this presentation, let's look at what's underneath. From the agent to the ingestion buffer to the individual processing capabilities of Datadog, it's a pipeline, a huge pipeline, with a time series database storing all the metrics, traces and logs, what we call the three pillars of observability. From that standpoint, it's easy to build a dashboard on top of that time series database.
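One common way to get that log-to-trace correlation, as a hedged sketch rather than the exact demo setup: make sure each log event carries the dd.trace_id attribute that Datadog uses to link logs to APM traces. The incoming field name traceId here is hypothetical.

```toml
# Sketch: move an application-specific trace id (hypothetical field
# "traceId") to dd.trace_id so Datadog can link the log to its APM trace.

[sources.agents]
type = "datadog_agent"
address = "0.0.0.0:8080"

[transforms.correlate]
type = "remap"
inputs = ["agents"]
source = '''
if exists(.traceId) {
  .dd.trace_id = del(.traceId)   # del() returns the removed value
}
'''

[sinks.dd_logs]
type = "datadog_logs"
inputs = ["correlate"]
default_api_key = "${DD_API_KEY}"
```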
We support 22,000 customers running millions of hosts, collecting 10-second snapshots (yes, sampling is supported, and 10 seconds seems to be a popular sample time window) of trillions of events per day. And very importantly, this is what the pipeline could look like for the cloud administrator, for SREs who use the three-pillars approach of metrics, traces and logs to analyze outages of their applications, or system issues, or to optimize their performance. Very importantly, the CISO's organization will have a role to play using such platform technology: being able to analyze the security posture of the cloud-native environment, monitor the health of the workloads and applications from a security standpoint, and visualize it in such a manner that you have what we call a huge map of all your assets, your microservices, your databases. And we could pivot from one particular service to another really, really quickly, thanks to the concept known as the unified observability platform.

Now, in terms of its AI-driven nature, that big data pipeline called Datadog actually helps to analyze critical failures and perform root cause analysis across such huge amounts of data and such a high frequency of data changes. Very importantly, this is DevOps-centric: the concept where we could be profiling the health of your application and diving into, perhaps, the CPU time per minute at each level. So imagine you have been doing DevOps all this time. DevOps talks about continuous integration and innovation, rolling out incremental changes in the code. The challenge is, every time you have those new features rolled out, can you accurately determine how well that particular module is performing? Again, with the concept of a really big data pipeline that Datadog is built on top of, well, it's possible.

So in short, download Vector today, kick the tires, try out different deployment topologies. This is what you get when you visit the GitHub repository, where the Vector source code is officially made available. There are components listed, and there are guides listed as well for how you would deploy it for use case A, use case B, use case C, and so on and so forth. Some advanced use cases are worth mentioning very quickly before we end. Kinesis Firehose log forwarding: you have CloudWatch logs ingested into AWS Kinesis Firehose; can you make sense of those logs really quickly? Vector can help. Could you merge multi-line logs with Lua, one of the favorite programming languages when it comes to web infrastructure (Nginx, for instance, can be scripted with Lua)? Could you do that? Yes, and there's a sketch of that at the very end below. So this is really where you would want to start getting more acquainted with Vector: its repository. And here we go: github.com/vectordotdev/vector, as well as the blog and a quick-start presentation, all here on the slide. If you have any other questions, feel free to ask me right now, or you can drop me a mail.

All right, thanks for that. I didn't really mention that, but great point. Vector came originally from a company called Timber.io, so Timber is now part of Datadog. In the spirit of this conference, I guess, I'll note that Datadog continues to support the open source community. Datadog was an open source technology company at the very beginning, so Datadog was open source, and we have made multiple open source projects. Vector continues to be run that way.
I can't really comment on the business direction of the whole company, but we continue to serve our customers. It's something we do very well, and I think our customers appreciate it. And Vector plays a huge, huge part in our observability platform strategy.
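As promised above, here's a minimal sketch of that advanced use case: a Vector config that accepts logs delivered by AWS Kinesis Firehose and merges multi-line records with the Lua transform. The listen address, the timestamp pattern that marks the start of a record, and the sink are assumptions for illustration, not from the talk.

```toml
# Sketch: accept logs from AWS Kinesis Firehose and merge multi-line
# records (e.g. stack traces) before forwarding.

[sources.firehose_in]
type = "aws_kinesis_firehose"
address = "0.0.0.0:9000"          # hypothetical listen address

[transforms.merge_multiline]
type = "lua"
version = "2"
inputs = ["firehose_in"]
# Assumption: a new record starts with a YYYY-MM-DD timestamp;
# anything else is treated as a continuation of the previous record.
hooks.process = '''
function (event, emit)
  local line = event.log.message
  if line:match("^%d%d%d%d%-%d%d%-%d%d") then
    if buffered ~= nil then
      emit(buffered)              -- flush the completed record
    end
    buffered = event              -- start buffering a new record
  elseif buffered ~= nil then
    buffered.log.message = buffered.log.message .. "\n" .. line
  else
    emit(event)                   -- nothing buffered; pass through
  end
end
'''
hooks.shutdown = '''
function (emit)
  if buffered ~= nil then
    emit(buffered)                -- flush whatever is left on shutdown
  end
end
'''

[sinks.dd_logs]
type = "datadog_logs"
inputs = ["merge_multiline"]
default_api_key = "${DD_API_KEY}"
```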