So, about Alolita: she's a member of the OpenTelemetry Governance Committee, so we'll be hearing a lot about observability today. Personally, I'm really excited about this talk. So let's get started. Over to you, Alolita.

Hi everyone. Thanks, Bhavin, for the nice introduction. Super happy to be joining in from California for this edition of PyCon India 2021. Very excited to be part of the crowd as well as presenting at the conference, and talking to all of you today about what's happening in observability in the cloud. Many of you may already be involved in the cloud-native space, so I'll start with a little bit about who I am. I know a lot of folks here have spoken at and participated in PyCon India before, and again, I'm super excited to be participating this year. I'm a principal technologist at AWS, where I lead open source observability strategy and engineering. I'm also a co-chair of the CNCF Technical Advisory Group for Observability and, of course, a member of the OpenTelemetry Governance Committee. Among the many other hats I wear, I'm a board director at the Unicode Consortium, having done many years of open source language technology work, and I've also served on the boards of the Open Source Initiative (OSI) as well as SFLC India. A little bit about my work: I've led engineering teams at Wikipedia, Twitter, PayPal, and IBM, doing a lot of open source work over a long time. Please follow me on Twitter, or reach out and chat with me on Slack anytime. With that said, today I'm super excited to talk with you about an area I've been deeply involved in for quite a while: open source observability in the cloud. Interestingly, I started my career as an engineer building monitoring software for real-time networks, and early on I recognized the power that utility computing brings to the table.
As you know, utility computing today is also called the cloud, and of course I have been very passionate about the benefits open source brings to the table. So today I'm going to dive deep into what the cloud is, what observability is, what cool things are happening in open source observability, and how we can leverage them. So what is the cloud? What does the cloud mean? We all hear about it in different contexts. To me, the general definition really is a set of distributed compute services. The cloud today is multiple clouds, public clouds as well as private clouds, and cloud computing is everywhere. It's the future of computing, and it's happening right now. The cloud is rapidly evolving toward open standards and interoperability, and today I would say the cloud is just like electricity, water, and the internet: a utility that everybody can use to spin up compute and to build and run services and applications. As you will see through the course of this talk, the cloud depends on open source. It uses open source very heavily. The cloud also leverages open standards to interoperate across multiple clouds, supporting not only on-prem installations of components but also public and private clouds. Open source also ensures there is no vendor lock-in, and of course it helps accelerate innovation: when you're building something in open source, you really get best-of-breed engineering, collaboration, and a great solution. So every part of the cloud, today's cloud especially, should be fully observable. And what does fully observable mean? It means that every aspect of the cloud needs to be accessible.
You should be able to observe every layer of the cloud architecture, every service, every component, and every feature. It is that detailed, and it needs to be observable everywhere. So what does that mean? If you look at a typical cloud service architecture, you have multiple layers. At the lower layers you have hardware, bare metal if you will, where you're collecting metrics like CPU utilization, memory utilization, disk utilization, and other data to monitor your hardware. Then you have your networking layer, where you're looking at throughput, latency, and bandwidth. You have the infrastructure layer, with components like VMs and storage; we use tons and tons of storage for every one of our applications. Then you have language frameworks: if you've written your application in Java, or in JavaScript, or in Go, those different language frameworks have built-in observability components. Then you have databases, where you're collecting data to monitor and understand how those databases are performing. And last but not least, on top of this whole stack of layers, you have the applications: web apps or mobile apps that you're building and running, e-commerce apps, and lots of other kinds of apps. So at the end of the day, as an end user you're interacting with the application, but as a cloud-native developer or a cloud-native provider, you want to understand what is happening in each of these layers.
And there are two very fundamental verticals that you know about today, security and observability, that need to support all of these layers in a typical architecture. So, keeping that in mind, let's look at some of the observability projects that exist in the ecosystem for looking at the different services, applications, and components we build. There are lots of observability projects: lots of choices for monitoring, for tracing, for logging, for metrics collection, for analysis of that data, and for visualizing that data in the cloud. As you can see in this diagram, some of those choices are open source, some are proprietary products, and some are SaaS services. The good news is that the industry, in general and at large, is increasingly getting behind open source solutions and open standards. So you see projects such as Prometheus, Fluentd, Cortex, Jaeger, OpenTelemetry, and Thanos becoming very popular with large-scale companies who use them to monitor their applications and services, and with users of every cloud provider today. Within this whole wide set of projects, some very specific ones are gaining steam because of the compute environment you use: for example, if you're building applications in a Kubernetes-based environment, you're probably using Prometheus and Grafana. That combination comes in natively to collect data and then process, analyze, and visualize it. So what exactly is observability? You've now seen so many projects looking at different aspects, making features available to collect all this data and process it.
So fundamentally, observability today means that you have an active understanding, an active pulse, of the behavior of a system, and that you can track the dynamic states of that system. You have a constant read on the pulse of the system through the telemetry data you're collecting, and you're tracking its dynamic states. Through observability you also take into account the uncertainty and variation in that behavior, so that you can see what kinds of anomalies are occurring in the regular functioning of a system. The best solutions in observability today are actually live and open source. What's new compared to old-school monitoring is scale: the capacity of the computers and networks we use so prolifically has exploded exponentially. We have the cloud, which runs on literally millions of computers across an entire huge network, the internet included. That scale and diversity of compute power, commodity infrastructure, and fast network speeds vastly increase the volume and complexity of telemetry data we can collect from the different layers of cloud services, as well as from the applications that run on top, empowering advanced data analysis, using machine learning, for example, to understand patterns at scale and to visualize them. A very interesting quote that's quite popular in the observability space is: monitoring tells you whether the system works; observability tells you why it's not working. It's about understanding the behavior of systems as they work at scale in the cloud. So today's definition of observability really has three major signals.
Many of you who have worked in the monitoring space for a long time may already know this. There are three signals. First, tracing: traces track a discrete transaction through its lifecycle in the system, and a trace's spans are scoped to the request that is made. Second, metrics: a metric measures an attribute of a component at a particular point in time, so it's a data point for the system at that moment. And third, alongside metrics, logs, which most of us are very familiar with from writing and debugging applications: logging records information about discrete events. Across those three types of data, petabytes and petabytes are collected from millions of devices, whether they're services and applications sitting in the cloud, using the cloud, or talking to the cloud through edge devices. Events are emitted from an enormous number of devices and logged; metrics, which are point-in-time data, are aggregated, and aggregatable, if you will; and tracing, as I said earlier, follows the lifecycle of a transaction as it runs from start to end. So why does observability matter? Why is it essential to the cloud? Because we want to achieve the following capabilities: we want to diagnose problems in the cloud, and we need prognosis capabilities to predict behavior and to understand the health of systems at any given point in time and over time.
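To make the three signals concrete, here is a toy sketch in plain Python (dictionaries only, no real telemetry library) of what one data point of each signal might look like. The field names are illustrative assumptions, not any specific wire format:

```python
import time
import uuid

def make_span(name, trace_id, parent_id=None):
    """A trace is a tree of spans; each span covers one scoped operation."""
    return {
        "trace_id": trace_id,            # shared by every span in the transaction
        "span_id": uuid.uuid4().hex[:16],
        "parent_id": parent_id,          # links this span into the tree
        "name": name,
        "start": time.time(),
    }

def make_metric(name, value, labels):
    """A metric measures one attribute at a particular point in time."""
    return {"name": name, "value": value, "labels": labels, "ts": time.time()}

def make_log(message, severity="INFO"):
    """A log records a discrete event."""
    return {"message": message, "severity": severity, "ts": time.time()}

# One transaction: a root span for the request and a child span for the query.
trace_id = uuid.uuid4().hex
root = make_span("GET /checkout", trace_id)
child = make_span("SELECT orders", trace_id, parent_id=root["span_id"])

cpu = make_metric("cpu_utilization", 0.42, {"node": "node-1"})
event = make_log("checkout completed")

# Both spans belong to the same transaction, linked parent-to-child.
assert child["trace_id"] == root["trace_id"]
assert child["parent_id"] == root["span_id"]
```

The key structural difference shows up here: spans carry correlation IDs so they can be stitched back into one transaction, metrics are standalone point-in-time samples with labels, and logs are standalone event records.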
We also want to establish a basis for self-healing and self-provisioning of systems, which is super important in order to form a snapshot of exactly what is happening in these complex layers and in the complex interactions across multiple layers and multiple services. And last but not least, future-proofing, using techniques such as machine learning for the smart adaptation of systems. We are constantly trying to learn how systems can self-heal and self-provision, or at least surface patterns so that you can auto-correct and provision systems as needed. That's a big deal, because it makes sure you're optimizing the resources for your services and applications. So what do we get out of that? We want to achieve all these capabilities using observability, and for the end user that provides the ability to maximize uptime: your application doesn't go down, it's always available. That's the idea of utility computing: you can spin up a cluster whenever you need it and spin it down when you're done, but you can also keep it running all the time, as we do for production systems. Similarly, strengthening security, which is very key for large-scale systems, especially in large enterprises, and ensuring reliability and scalability of systems are super important. That said, I'd like to talk a little bit about observability patterns and why this matters. As many of you who have been in the industry for a while know, over the last decade we have moved from building monoliths to microservices. The cloud today consists not only of multiple clouds but also of thousands of microservices, which means there are services for each kind of function.
Monoliths just don't work because they are fragile and expensive, and very expensive to maintain for an end user who wants scalability, reliability, and worldwide availability for the services and applications they're building. Microservices, which are really the modern architecture now, enable commodity scaling as well as diverse interoperability, which is super important for providing the benefits we talked about before: uptime, security everywhere, reliability, and scalability. That's something to keep in mind as you design your microservices and applications and figure out how to make the best use of those services. Now I'll talk a little bit about what observability pipelines mean. You'll hear about pipelines if you're working with any cloud provider or any kind of cloud architecture. An observability architecture is typically composed of at least three stages: you discover data sources and collect data, you process and analyze that data, and then you visualize the patterns so that you understand the behavior of your system. OpenTelemetry is a very interesting project here: it's a very large open source project, second only to Kubernetes in the number of contributors, and its implementation is extremely popular in the industry.
OpenTelemetry provides an open source collection agent called the Collector, as you can see in the middle of the diagram, as well as libraries with APIs and SDKs to instrument your applications to send telemetry data, plus a data protocol based on open standards, the OpenTelemetry Protocol (OTLP), to unify the monitoring, managing, and debugging of applications and services. It's a very popular way to instrument your applications and collect data from them, whether that's traces, logs, or metrics, and also to collect data from your services to understand the behavior of the full stack. Interestingly, OpenTelemetry supports all three data signals in 11 languages, which means that whether you're using Java, JavaScript, Go, Python, or .NET, it's all available: you can just pick up one of the SDKs and APIs and run with it. So take a look; it's at opentelemetry.io. But this is the first part of the pipeline: collection. The next part is how you manage and control, which is the core part of monitoring. Many of you working in the Kubernetes space especially will have heard of Prometheus, an open source systems monitoring and alerting toolkit that is very heavily used in the Kubernetes world and has grown in popularity over the last few years. It's a very powerful way of instrumenting your Kubernetes applications and orchestration, and of discovering services to collect telemetry data from; service discovery is a very important part of that process.
The Prometheus server then acts as a data store: it receives the metrics and stores them as time-series data. Along with each time-series sample, you have key-value pairs, labels, describing the data: which node it's coming from, which cluster it's coming from, and that kind of meta-information. Prometheus also comes with an Alertmanager, which gives you the ability to trigger alerts based on the type of data and the thresholds you set in different rules for what you're observing. If you're using Alertmanager with Prometheus, you can route a set of alerts and send notifications to different destinations: if you're managing a production network, you can send them through email, to your pager, or anywhere else. Prometheus is a great example of an open source system that is very high quality and robust, just as OpenTelemetry is, for managing and controlling your observability data. So let's move on to the third part, visualization. Once you have the data collected over time, you want to know more about it. Prometheus helps you analyze that data, trigger alerts on it, process it, and store it. So what do we do with that data? Fundamentally, you visualize it.
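As a hedged illustration of Prometheus's pull model, here is a sketch using only the Python standard library (no Prometheus client library) of a tiny HTTP endpoint exposing one counter in the Prometheus text exposition format; a real application would normally use the official `prometheus_client` package instead, and the metric name here is made up:

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

requests_total = 0  # the counter we expose; incremented by application code


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        # Prometheus text exposition format: HELP/TYPE comments, then samples.
        body = (
            "# HELP app_requests_total Total requests handled.\n"
            "# TYPE app_requests_total counter\n"
            f"app_requests_total {requests_total}\n"
        ).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass


server = HTTPServer(("127.0.0.1", 0), MetricsHandler)  # port 0 = any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

requests_total += 3  # simulate some application work

# This GET is what a Prometheus server would do on each scrape interval.
url = f"http://127.0.0.1:{server.server_port}/metrics"
scraped = urllib.request.urlopen(url).read().decode()
server.shutdown()

assert "app_requests_total 3" in scraped
```

The pull model is the point: the application only exposes its current state, and the Prometheus server decides when to scrape it, attach labels like node and cluster, and store the sample as time-series data.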
You want to look at your dashboards, but you also want them to be completely configurable, even auto-configurable, and to show the behavior of all your systems at any given point in time. Grafana is another open source project that is extremely popular for visualization. It's a visualization framework that lets you query the data stored in Prometheus as a data source, visualize that data, and work with Prometheus's Alertmanager to observe your metrics, alert on them, and adjust. One of the great ways of looking at this data is through a single pane of glass. In the old days of monitoring, you used to have ten monitors all around you, showing different kinds of dashboards. Now, with open source observability components, something like Grafana can pull all your dashboards together and clearly show only what you need: you don't have to scan five or ten screens to find the information you're looking for; you can configure it and see everything in a single pane of glass. There's also tremendous flexibility to ingest data from almost any backend or database. And, as I said earlier, alerting is done through Prometheus and Alertmanager. So here you're looking at all three parts of the basic pipeline: discovery and collection; processing, analysis, and storage; and visualization. And all of this is open source: these projects are used by a lot of enterprise companies to observe their systems. One of the things I'd like to do here is call out how Python is used.
I'm taking the example of Python in OpenTelemetry because Python is a very popular language; in fact, as many of you know, it was measured as the second most popular language in the language analysis and research that RedMonk does every year. And Python is used in many ways today: not only in DevOps and observability but also in machine learning and other applications, as you know. So, going back to Python in OpenTelemetry: as I mentioned earlier, OpenTelemetry supports 11 languages, and Python is one of the very popular ones. All the code is on GitHub. If you go to the GitHub repo for OpenTelemetry Python, you'll see a set of stable APIs and SDKs that enable you to collect traces from your application. If you have a Python application, you can instrument it using the OpenTelemetry SDK and API and then send the data via the Collector, which has an exporter component that lets you export to whatever backend observability platform you want to write to. You could be using Prometheus, some other proprietary system, or any other monitoring platform that ingests a particular format. But remember: on the wire within OpenTelemetry, the OpenTelemetry protocol, OTLP, is used.
So, that said, I'd love for all of you to go check out the Python OpenTelemetry API package as well as the Python OpenTelemetry SDK package, which are both downloadable from PyPI, and go instrument your applications and try it out. You can ingest traces, and there is also development going on for metrics, and of course for logs as well. Right now tracing is stable, so the project is a great place to contribute: it's a very welcoming project, and you can help build out some of the functionality for metrics as well as for logging. Go check it out. I shared a snapshot here; as you can see, there are the exporters and the API and SDK. Another thing I'd like to call out is that the OpenTelemetry project is very interesting because, as I mentioned, it also has an open standard, a specification, which means its observability collection mechanism is specified just as a technical standard would be. You can take that specification as a reference and implement it in any language you want. So, beyond the 11 languages that exist right now, if there's another language you'd like to build out the API and SDK in, all you need to do is file an issue on the project saying you're interested in a language that isn't supported yet, create a community of your own, and build it out. It is an open specification, and the great advantage is that you can take that and just run with it. So, with that said, I welcome you to come and participate.
It's a great way of leveraging open source to build some innovative observability components, and also to become a maintainer, which is always a great thing in open source. Last but not least, I'd like to call out how to get involved in these large projects, because many open source observability projects are very large, with literally hundreds of contributors. Kubernetes is a great example, with multiple SIGs that discuss different areas and topics, and OpenTelemetry is very similar. There are SIG meetings where different contributors come with discussions and questions, talk to each other and to the maintainers; these are all virtual and online, so anybody can join from anywhere on the planet. They are held both on US time, which is also friendly to Europe, and on APAC time, which is friendly to India. They're super easy to participate in, and I welcome you to come join some of these SIG meetings. They are recorded and held very regularly; the meeting notes are all public and heavily documented. These are very transparent projects. There are also mailing lists you can follow if you cannot make it to the SIG meetings. And of course, all the code is on GitHub, so you can go file an issue, or follow up on a particular problem you're having and just look it up. Also, as I was saying earlier, things are very flexible nowadays: every open source project can have its own YouTube channel.
Most open source projects today record their SIG meetings or discussion meetings and make them available, so that any contributor interested in a particular section or question can come back and watch it later. And last but not least, there are chat channels, just as PyCon has chat channels today. Every project, whether at the CNCF, the ASF, or elsewhere, has some form of chat channel, whether that's Slack or something else. In this particular case, both OpenTelemetry and Kubernetes have Slack channels; you can just join them and participate. Again, I would really welcome all of you to join in and participate in these projects. Observability is a huge and very compelling area. There are lots of solutions to be built, and there's a lot of work happening on instrumenting applications, compute services, and other types of applications as well. So please take a look, and if you have any questions, I'm always available to help; ping me on Slack. You should be able to get hold of me on the CNCF Slack. So, that's it. I really appreciate you joining in today and letting me share some of my experience, and I really hope all of you enjoy PyCon over the next couple of days, with the 35-plus talks ongoing at the conference. I know we live in a virtual world today, and it's so cool that I can join from California, present, and talk with you. I'd be happy to answer any questions you have at this point. Thank you so much for the opportunity, and I really look forward to seeing you in person, hopefully next year. Thanks.

Awesome. That was great.
So, people, if you have any questions, go to the Q&A tab and add your question there. We have a couple, so I will start with them. The first question: I use Prometheus and Grafana, but my data collection mechanism is via Node Exporter. Is OpenTelemetry something similar to Node Exporter, in that it collects and passes the data to Prometheus, or is it different?

It's very similar. You have the APIs and SDKs you can use to instrument, and the Collector does the service discovery and the collection for you, and can then pass the data to a Prometheus server on the backend through the Prometheus exporters that are in OpenTelemetry. And then you can of course visualize the data from Prometheus in Grafana, so you can totally use them interchangeably.

Awesome. Let me get to the next one while we are waiting. The question is: which is better, Prometheus as a data source for Grafana or InfluxDB as a data source for Grafana?

Actually, it depends. The fundamental difference is that InfluxDB, just like Cortex or Thanos, uses remote write, a push model for sending data to the endpoint, while Prometheus is pull-based by default. So it depends on your use case: InfluxDB, Thanos, or Cortex are used for scalable solutions where you're pushing data from a customer application into a server for monitoring and observing. The feature set is similar, but it really comes down to your use case: are you pushing data or pulling data?

Awesome. There are a couple of questions flowing through now, so let's get to them one by one. Totally ask questions; it's a great way to share information and learn from each other.
So, what do you use for searching through and storing logs?

Typically, searching logs with an open solution has been done with Elasticsearch; it's a very popular way of searching through log information. But there are several other tools today that you should track and use; Loki is one of them. There's also log collection being built in OpenTelemetry. For storing, you would probably end up using something like Elasticsearch or Loki. If you were using a proprietary solution, you could even use Logz.io or one of the other products. But here we're looking at open source very specifically, and I recommend using an open source solution.

Okay. Next one: can you please give a brief high-level overview of Grafana?

I think I went over a high-level overview, so you can probably catch up on my slides. Again, Grafana is an open source visualization framework. It enables you to create dashboards and manage them in a single window, a single pane of glass as it's called, and also to generate alerts, which push data to those dashboards in real time so you can visualize them. You can also query data in Grafana and set rules and thresholds for what you visualize. Fundamentally, it's a very sophisticated visualization framework, and it's open source. It's used a lot in lockstep with Prometheus, but it's also used with OpenTelemetry and other frameworks. You can go to github.com/grafana to learn more. It's all open source, and there's some excellent documentation; please take a look at it. Next one.
How do you determine the cost-benefit analysis of having to store tons of metrics long-term versus storage costs? That's a complex question, because it really depends on a lot of factors. A cost-benefit analysis comes from factoring in a lot of different components: the number of services you're using, the pricing mechanisms those services have, whether you're operating on a single cloud versus a hybrid or multi-cloud environment, and how your environments and your compute are configured. There are a lot of different factors to consider, and storage cost is not the only one that influences the outcome, especially when you're pumping in terabytes and petabytes of data from the real-time systems, services, or applications you're monitoring. There are a lot of cost-benefit models available from different providers, along with matrices of considerations and factors you can use. So I would say there's no one answer to this; it's a complex matrix. But if you ping me on Slack, I'm happy to share some links with you. Definitely. There are a lot of questions, so I'll be filtering out a few. Alulita will be available and we still have time, so I'll pick the ones which are most voted and we'll go from there. Awesome. There is one related to OpenTelemetry, which sounds interesting. Can OpenTelemetry be used to intercept requests to a database and write to a file? If not, any suggestions in such scenarios?
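Before the next answer, one way to frame the storage piece of that cost question is a back-of-envelope estimate; all numbers below are illustrative assumptions, not real pricing or real compression ratios:

```python
# Rough long-term metric storage estimate. Every constant here is an
# illustrative assumption; plug in your own numbers.

active_series = 100_000      # distinct time series being collected
scrape_interval_s = 15       # one sample per series every 15 seconds
bytes_per_sample = 2         # assumed average after TSDB compression
retention_days = 365

samples_per_day = active_series * (86_400 // scrape_interval_s)
total_bytes = samples_per_day * retention_days * bytes_per_sample
total_gib = total_bytes / 2**30

cost_per_gib_month = 0.05    # hypothetical storage price, USD/GiB-month
monthly_cost = total_gib * cost_per_gib_month

print(f"{total_gib:.1f} GiB retained over {retention_days} days")
print(f"~${monthly_cost:.2f}/month at ${cost_per_gib_month}/GiB-month")
```

Even this sketch shows why storage alone is not the whole picture: ingest, query, and egress costs scale with the same inputs but on different pricing curves.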
Yeah, OpenTelemetry can be used to discover data sources and then send that data to a file or any other backend monitoring service you're using. There is a file exporter in the OpenTelemetry Collector, so you can take a look at that. You can also use a new component that is landing in OpenTelemetry for database queries and database metrics: Sqlcommenter, donated by Google to OpenTelemetry. They are actively working on the project, and a lot of new contributors are getting involved. It's a great component to use for database metrics being ingested for observability. So yes, absolutely, you can do that. And if you're interested in working on the project, you're welcome to join in. This is going to be slightly interesting. Are there any new innovations in the observability space, or are all of the projects mostly identical? Yeah, this is a good question. By no means are all projects identical. As I was highlighting earlier in the list of projects I was sharing with you, there are lots of different types of projects, because open source really enables everyone to contribute their ideas. If you have a cool idea about implementing a particular type of query, optimizing query languages for collecting data, innovating on service discovery algorithms, adding machine learning, or adding sampling, there are so many applications in the observability space. It's not just collection, processing, export, and visualization; there's a lot you can do in that pipeline.
Obviously, among the hundreds of projects that exist today in open source observability, there are some which are very popular because they provide end-to-end solutions. But there are also very specific projects that support particular functionality you can plug into a larger project: Jaeger has a whole bunch of plugins, Zipkin has a whole bunch of plugins, OpenTelemetry has plugins. So you can plug in what you need and build that out. Just answering the second part of your question: all projects are obviously not identical. On the first part, in terms of new innovations in the observability space, as I was mentioning earlier, there's a lot of innovation because our entire method of computing at scale has changed. Today we live in a generation of computing where there are literally trillions of computers sitting on the network, sharing data, operating as clusters, and building out huge applications that are available across the world as services. The innovation happening in this space is really related to these areas. How do you scale large-scale systems? We're at a point we've never been at before, where scalability is at an absolutely new level in terms of the number of machines working with each other. We used to talk about high-performance computing; today, high-performance computing is utility computing. Similarly, how do you make systems reliable? There is a whole area of work happening on making systems self-aware, self-provisioning, and self-healing. How do systems understand that they need to adjust their behavior if they're going out of threshold? Do they need to add another node?
Can they go and add a cluster themselves without you having to manually provision it, right? On these large-scale networks and systems, the cloud needs to be very self-aware, and that's the goal we are driving toward. So reliability, security, scalability, performance, and the feature sets that support these are all areas where observability is key. If you don't understand the behavior of a system, how do you enable it to be self-aware? Observability is all about that. So needless to say, there's an enormous amount of innovation going on in the observability space, all in open source. Slightly related question: which features are you excited about in the OpenTelemetry roadmap? I think we are just on the cusp of going 1.0 with all of tracing right now, and that's super exciting, because at this point in OpenTelemetry, tracing is stable. Now we are all working on metrics, and that's really exciting because metrics is a very diverse space. One of the areas I have been working on personally, which I'm super excited about, is interoperability: ensuring that the OpenTelemetry data protocol is compatible with the Prometheus protocol, and being able to transmit data back and forth interchangeably, collecting with OpenTelemetry and then pushing it to Prometheus for analysis. I'm super excited about that because I do think interoperability is always key. When you have different open source projects or components, you still need a common protocol, because at the end of the day, the end user benefits and you avoid lock-in if you have a common, open protocol.
So maybe that's something I'm super excited about. And then, of course, logging is coming up. Logging is in beta right now, and it should be stable, along with metrics, by Q1 next year, in 2022. Metrics will go stable first, in Q4, and then we are targeting to have logs stable. So again, lots of functionality coming up. Is there any community-agreed standard for observability, like CNI, CRI, and OCI in Kubernetes? It's a good question, because that's one of the huge benefits the OpenTelemetry specification has had. It is a common standard agreed on by the industry, the OpenTelemetry protocol (OTLP), and it is used interoperably. So yes, just like in Kubernetes, there are common standards forming in observability as well, and you'll see tremendous adoption and, if you will, a tremendous amount of consolidation toward open standards as observability projects become larger and evolve. One of the things happening, as I said, is that OTLP is definitely a common standard. Another area being worked on right now is eBPF-based low-level collection: at the kernel level, how you collect metrics, how you collect data, and what the common protocol is there. Similarly, the Prometheus protocol has existed for a while, and that's a common protocol. And then you also have W3C specifications, such as the distributed tracing specification, which is another observability standard for tracing. So yes, there are definitely standards that are evolving, and they are interoperable. Okay, there are a few questions which have got a lot of upvotes, so I'm selecting those. How is Grafana different from or similar to Kibana? Or is that even the case? At what point would a user want to switch to Grafana when there are already ELK capabilities available?
Well, I think, again, it depends on your use case. Historically, Grafana actually started as a fork of Kibana, and it evolved from that space. But needless to say, Grafana has a completely different architecture today; it's been natively built out as a React-based framework, whereas Kibana was Angular-based earlier and they've been migrating it. The point being: yes, it came from the same origins, but it's completely different today in terms of feature sets, the ability to consume more data sources, and the ability to process different types of data. You can apply a lot of complex visualization patterns and rules in Grafana. And the evolution trees are always different; no project is the same as another. Grafana has been very heavily used in the Kubernetes space, while Kibana has been more traditionally tied, especially with the license it has now, very much to Elasticsearch. It's not really used in other use cases, whereas Grafana is far more ubiquitous, used with a lot more data sources and a lot more types of pipelines. Your thoughts on pulling metrics versus pushing them, and when does pull work better than push? Well, push is very popular for services. Push capabilities are usually used when you have customers or users, especially large enterprises, who have their own security protocols. When they don't want an external server, like a Prometheus server, to come and pull their metrics, they want to be able to push their data to a monitoring network or monitoring service.
So pushing metrics is used a lot, especially in OpenTelemetry. The Prometheus remote write exporter, which my team contributed to the project, was built for and is very popular with groups and users running in secure environments, whether those are banks or other institutions with compliance requirements they need to follow. In those cases you will typically use push protocols, not pull. Pull is more for when you're running self-managed, for example, and this points back to the origins of pull being typically very easy to do for a self-managed setup. If you have a small setup that you're running for your startup, or for ten engineers in your company building an application self-managed, using pull servers is probably very easy to set up. But push is really used for heavy-grade services, when you're sending in petabytes of data and you don't want to think about it; you're just completely pushing data as a stream. That's probably the last question now. What's your take on AWS CloudWatch logs and metrics services compared to the other observability tools we discussed? I think it depends on the generation of applications you have. For example, if you're building Kubernetes-based applications or using Kubernetes-based services, the chances of you using an open source solution or pipeline for observability are very high. You can also use something like the managed service for Prometheus, which AWS also rolled out. But again, it really depends on your use case.
And if you've already instrumented with CloudWatch, you can use CloudWatch. If you're using some other pipeline, like Kubernetes-based applications or other applications, you have a lot of different choices. Use the best solution that works for your use case; there's no one perfect answer here.
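Coming back to the pull-versus-push discussion from a couple of questions ago, the two collection models can be sketched in a few lines of stdlib-only Python. The class and metric names are hypothetical; real systems like Prometheus scraping or remote write add service discovery, wire formats, authentication, and batching on top of this shape:

```python
import time

class PullTarget:
    """A service that exposes its current metric values for a scraper (pull)."""
    def __init__(self):
        self.requests_total = 0

    def handle_request(self):
        self.requests_total += 1

    def scrape(self):
        # The monitoring server calls this on its own schedule and
        # reads whatever the latest value is.
        return {"requests_total": self.requests_total}

class PushClient:
    """A service that sends samples to a remote sink itself (push)."""
    def __init__(self, sink):
        self.sink = sink
        self.requests_total = 0

    def handle_request(self):
        self.requests_total += 1
        # The application decides when to emit; the receiver just accepts.
        self.sink.append(("requests_total", self.requests_total, time.time()))

# Pull: the collector controls timing and sees only the current value.
target = PullTarget()
for _ in range(3):
    target.handle_request()
print(target.scrape())   # {'requests_total': 3}

# Push: every emission crosses the network; nothing pulls from outside.
sink = []
client = PushClient(sink)
for _ in range(3):
    client.handle_request()
print(len(sink))         # 3
```

This is also why push suits locked-down environments: the `PushClient` side initiates every connection outward, whereas `PullTarget` requires the monitoring server to reach into the network being monitored.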