Hi everyone, I'm Alolita Sharma from AWS, and I'm also a member of the OpenTelemetry Governance Committee. Today I'll be presenting on building Prometheus support in OpenTelemetry, which has been a great collaborative effort, and I'll give you an update on what's in progress. A little bit about myself: I'm a principal technologist at AWS, where I lead open source strategy and engineering. I'm also a co-chair of the CNCF Technical Advisory Group for Observability, and I have a great appreciation for open standards and interoperability. I see Prometheus support in OpenTelemetry as a great opportunity to multiply the usefulness and reach of observability data, and I see the effort we are making in the OpenTelemetry project to fully support Prometheus as one that will bring real value to both projects, as well as to their users. As many of you already know, observability gives you a deep understanding of the behavior of a system. The best modern observability solutions today live in open source, and OpenTelemetry and Prometheus are great examples of such powerful open source projects. Traces, metrics, and logs are the core data signals in observability: metrics let users aggregate measurements, traces follow each individual request or transaction, and logs and events capture discrete records collected as telemetry. Observability in Kubernetes is an area near and dear to all of our hearts, because we have been working in that space. It's important for Kubernetes developers to be able to use telemetry data to understand how their code behaves at runtime, and having more observability data, whether metrics, traces, or logs, makes Kubernetes monitoring more effective.
A little bit about the architecture of OpenTelemetry, to set the context for the components being enhanced. OpenTelemetry provides not only an open standard for collecting telemetry and a data format, the OpenTelemetry Protocol (OTLP), but also open source components: the Collector, which can run as an agent, and API and SDK libraries in 11 languages. These let you instrument and customize applications written in different languages so that they emit telemetry, which can then be processed, analyzed, and visualized further down the pipeline. That helps you understand how your applications and infrastructure are functioning: are they healthy, do they need more provisioning, or is there something that can be done automatically? OpenTelemetry collects that data, processes it, and sends it on for analysis, and for applications running in Kubernetes, a very important destination is Prometheus. As many of you know, Prometheus has been around for a while. It's an open source monitoring and alerting framework that also serves as a data store for time-series metrics data. Prometheus collects and stores metrics as time series, evaluates alerting rules over them, and hands firing alerts to Alertmanager, which sends notifications to users who need to understand the current behavior of their applications and services. When we started building Prometheus support in OpenTelemetry earlier this year, the objective followed from a fundamental tenet of OpenTelemetry as a popular and well-rounded collection agent.
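To make "stored as time series" concrete: in Prometheus, a series is identified by its metric name together with its full set of labels, and each series holds a sequence of (timestamp, value) samples. The sketch below is a toy, in-memory illustration of that identity rule only, and assumes nothing about how the real Prometheus TSDB lays out data; `series_key` and `TinyTSDB` are hypothetical names.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple

# A series key: the sorted (label, value) pairs, including the metric name,
# which Prometheus carries as the reserved label "__name__".
SeriesKey = Tuple[Tuple[str, str], ...]


def series_key(name: str, labels: Dict[str, str]) -> SeriesKey:
    """Build an order-independent identity for one time series."""
    all_labels = dict(labels)
    all_labels["__name__"] = name
    return tuple(sorted(all_labels.items()))


class TinyTSDB:
    """Toy store mapping each series key to its (timestamp_ms, value) samples."""

    def __init__(self) -> None:
        self.series: DefaultDict[SeriesKey, List[Tuple[int, float]]] = defaultdict(list)

    def append(self, name: str, labels: Dict[str, str], ts_ms: int, value: float) -> None:
        # Same name + same labels (in any order) -> same series;
        # changing any label value starts a new series.
        self.series[series_key(name, labels)].append((ts_ms, value))
```

Sorting the pairs makes the key independent of the order in which labels arrive, so two scrapes of `http_requests_total{method="GET",code="200"}` land in one series, while `method="POST"` starts another.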
OpenTelemetry is committed to full support for Prometheus. That meant supporting Prometheus protocol compatibility: given that OpenTelemetry has its own complete data protocol, it needs to recognize the Prometheus exposition format and interoperate with it. It also meant having components that support service discovery in a scalable way, in particular the Collector, which is used either together with the APIs and SDKs or on its own. The Collector should support scalable service discovery, scraping, receiving, and exporting of Prometheus data, using both push and pull mechanisms, with exporters that can send that data to any backend service or to a Prometheus server itself. The OpenTelemetry APIs and SDKs, in turn, should be able to ingest and export Prometheus data interoperably: they use OTLP internally, but must fully support ingesting and exporting Prometheus data formats. For example, applications running on Kubernetes, Kubernetes itself with its orchestration of containers, and hosted services such as Amazon EKS all emit metrics continuously. The Collector and the OpenTelemetry APIs and SDKs can collect those Prometheus metrics: the Collector's Prometheus receiver ingests them, and the Prometheus remote write exporter writes them to a managed service, in this example Amazon Managed Service for Prometheus. You can then visualize them with a service such as Amazon Managed Grafana; Grafana is a visualization platform commonly used in the Kubernetes space. So you can see that the whole pipeline needs end-to-end Prometheus support, especially when users rely on metrics for observability.
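For a sense of what "recognizing the Prometheus exposition format" involves, here is a deliberately simplified parser for the text format's sample lines and `# TYPE` comments. It is a sketch under stated assumptions, not the Collector's Prometheus receiver: it assumes label values contain no escaped quotes and no `}`, and it ignores `# HELP` lines. `parse_exposition` and `Sample` are hypothetical names.

```python
import re
from typing import Dict, List, NamedTuple, Optional, Tuple


class Sample(NamedTuple):
    name: str
    labels: Dict[str, str]
    value: float
    timestamp_ms: Optional[int]


# <metric_name>[{<labels>}] <value> [<timestamp_ms>]
_SAMPLE_RE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)"
    r"(?:\{(?P<labels>[^}]*)\})?"
    r"\s+(?P<value>\S+)"
    r"(?:\s+(?P<ts>-?\d+))?\s*$"
)
# Simplification (assumption): label values contain no escaped quotes or '}'.
_LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_exposition(text: str) -> Tuple[Dict[str, str], List[Sample]]:
    """Return ({metric_family: type}, [Sample, ...]) for a text-format payload."""
    types: Dict[str, str] = {}
    samples: List[Sample] = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith("#"):
            parts = line.split()
            # "# TYPE <family> <counter|gauge|histogram|summary|untyped>"
            if len(parts) == 4 and parts[1] == "TYPE":
                types[parts[2]] = parts[3]
            continue  # "# HELP" lines and free-form comments are skipped
        match = _SAMPLE_RE.match(line)
        if match is None:
            raise ValueError(f"unparseable sample line: {line!r}")
        labels = dict(_LABEL_RE.findall(match.group("labels") or ""))
        ts = match.group("ts")
        samples.append(
            Sample(
                name=match.group("name"),
                labels=labels,
                # float() accepts the format's special values: NaN, +Inf, -Inf
                value=float(match.group("value")),
                timestamp_ms=int(ts) if ts is not None else None,
            )
        )
    return types, samples
```

A real receiver must also handle label-value escaping and the richer OpenMetrics variant of the format; the point here is only the shape of the data a receiver has to translate into OTLP.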
These were the requirements, and we needed to figure out how to make sure the data protocol and the components could not only consume this data but also process it and send it to the data sink of choice. First, this involved changes to the data model: all the Prometheus metric types, including counter, gauge, summary, and histogram, had to be supported by the OpenTelemetry metrics data model. Second, we took the existing Prometheus receiver in the OpenTelemetry Collector and added and enhanced service discovery and scrape configuration support in the receiver itself. The OpenTelemetry Operator also needed to be improved and modified to handle a range of deployment types, such as Deployment, DaemonSet, and StatefulSet, so that it can launch Collectors through its CRD and run multiple instances. Similarly, we modified and enhanced the existing Prometheus remote write exporter to handle the finer details of how Prometheus processes data, such as stale markers, NaNs, and other specifics the Prometheus server requires for ingestion. Then there are compliance tests that verify the Prometheus remote write exporter is fully compliant in the data it emits, passing all the tests defined by the Prometheus project. An area we are working on right now is extending the Prometheus receiver compliance tests, so that we can also ensure that what the Prometheus receiver in the OpenTelemetry Collector consumes is fully compatible with the Prometheus protocol.
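Two of the finer details mentioned above fit in a few lines. Prometheus histogram buckets are cumulative (each `le` bucket counts every observation less than or equal to its bound, ending with `le="+Inf"`), whereas an OpenTelemetry histogram data point carries explicit upper bounds plus per-bucket counts, so a receiver has to take differences. Separately, Prometheus marks a series as stale with one specific NaN bit pattern, `0x7ff0000000000002` (named `StaleNaN` in Prometheus), which is why an exporter cannot treat every NaN alike. This is a minimal sketch that assumes plain Python lists and floats in place of the real protobuf messages; the function names are hypothetical.

```python
import math
import struct
from typing import List, Tuple

# Prometheus' staleness marker: a NaN with this exact bit pattern.
# Any other NaN is an ordinary sample value.
STALE_NAN_BITS = 0x7FF0000000000002


def float_bits(x: float) -> int:
    """Return the IEEE 754 binary64 bit pattern of x as an unsigned integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def bits_to_float(bits: int) -> float:
    """Reinterpret a 64-bit unsigned integer as an IEEE 754 double."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


STALE_NAN = bits_to_float(STALE_NAN_BITS)


def is_stale_marker(x: float) -> bool:
    """True only for the staleness NaN, not for NaN values in general."""
    return float_bits(x) == STALE_NAN_BITS


def cumulative_to_explicit_buckets(
    buckets: List[Tuple[float, int]],
) -> Tuple[List[float], List[int]]:
    """Convert Prometheus (le, cumulative_count) pairs into explicit upper
    bounds plus per-bucket counts, as in an OpenTelemetry histogram point.

    The +Inf bucket becomes the final overflow bucket, so the result has
    len(counts) == len(bounds) + 1.
    """
    ordered = sorted(buckets)
    if not ordered or ordered[-1][0] != math.inf:
        raise ValueError('a Prometheus histogram must include le="+Inf"')
    bounds = [le for le, _ in ordered[:-1]]
    counts: List[int] = []
    previous = 0
    for _, cumulative in ordered:
        # Each bucket keeps only the observations not already counted below it.
        counts.append(cumulative - previous)
        previous = cumulative
    return bounds, counts
```

For example, buckets `le="0.1"` → 3, `le="0.5"` → 7, `le="+Inf"` → 10 become bounds `[0.1, 0.5]` with counts `[3, 4, 3]`. Comparing bit patterns rather than calling `math.isnan` is what keeps a legitimately NaN-valued sample from being mistaken for a staleness marker.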
Another part of this work that has been really rewarding is working closely with the Prometheus community and facilitating enhancements in Prometheus itself, such as adding HTTPS support, which rolled out in Prometheus version 2.29, defining the remote write compliance tests, and helping define some of the receiver compliance tests. So there has been a lot of collaboration here, and a lot of improvements on the OpenTelemetry side to reinforce full support for Prometheus.

Some of the most exciting wins have come from successful coordination with talented engineers across both projects, OpenTelemetry and Prometheus, to discuss and solve technical challenges. Often a single engineer does not know all the moving parts of these complex open source projects. One lesson we've learned working on this interoperability requirement over the last few months is that every organization has limited resources, and combining those resources creates a larger whole, which is essential to collaborating successfully and producing results. We have been very successful in building out this support clearly, both in the protocol and data specification and in the implementation on the OpenTelemetry side, to ensure Prometheus is fully supported.

There have also been challenges. A lot of the code in the OpenTelemetry Collector was inherited from OpenCensus, and as we worked through it, it required some rewriting. Iterating on it slowed us down a fair bit, because many assumptions made in OpenCensus no longer apply or have changed; the assumptions in the OpenTelemetry project are different, extended, and more scalable, so taking those requirements into consideration has been very important. Another challenge we've run into is that there are never enough resources, even on open source projects. The work is entirely contributor-driven, and no single contributor has enough resources, so we had to figure out how to get many different players to collaborate productively: how do you bring everybody together and make sure interoperability support gets built? That has required leadership, mentoring, and steady communication on the projects, which have been key in producing the collaboration results we've had. Huge thanks to the Prometheus and OpenTelemetry contributors who have participated in this process. The work is still in progress: we look at requirements, discuss the best design, and then implement and test, and that whole life cycle takes a lot of coordination, code reviews, design reviews, and discussion from all the contributors, so huge thanks to them.

As for the process we followed in adding this interoperability support, one thing that really works at the scale of an open source project as large as OpenTelemetry is having effective work groups with weekly discussions. OpenTelemetry has excellent meetings where maintainers and contributors come to discuss the areas they're working on, the questions they have, and the architectural design decisions being made. The work group we created for weekly Prometheus-specific discussions has been very useful and instrumental in making decisions, with engineering experts from different companies and projects discussing these issues together and working out the best solution for a particular implementation. Maintaining a transparent, up-to-date backlog has also really helped us track enhancements and changes to the protocol, to the code for the Collector and the APIs and SDKs, and to the technical documentation, including providing more clarity on use cases. Use cases are always a tough area for projects to collect, but OpenTelemetry has had the benefit of a huge number of users, both developers and end users, coming in to discuss their use cases. That has helped us identify which specific areas we need to enhance technically, implement first, and prioritize. Another very useful practice has been regular communication. As you know, on open source projects communication is key, and working on large components with industry-wide impact requires it. We have regularly updated the OpenTelemetry Collector and API/SDK groups, which meet in their own SIG meetings, on the Prometheus work group's progress, making sure everybody is on the same page as decisions are made from a technical, design, or data protocol standpoint, so that changes are not duplicated. The commitment from the OpenTelemetry Collector approvers and maintainers to do code reviews and design reviews regularly, and to keep Prometheus support a first-class interoperability priority on the project, has also been instrumental in the progress we've made so far.

That said, the work is still in progress, and we track the backlog in the Prometheus work group on OpenTelemetry. You can check it out anytime to see what the group is working on and to track the issues there. We triage the backlog regularly, so there's an element of project management here, which is essential to make sure the same functionality and feature set is supported across multiple components and the same requirements are clearly stated. Please go check it out. This leads back to what we are trying to achieve: building Prometheus interoperability support has been key to driving and building out metrics support for open source observability in OpenTelemetry. The stability status of tracing, metrics, and logs is a work in progress, and we update it regularly at opentelemetry.io/status. From a metrics standpoint, the OpenTelemetry protocol has been stable since April 2021, and the data model has been stable since July 2021. Prometheus support is still a work in progress, with more enhancements to be made in the Collector. In the APIs and SDKs, metrics support is being built out, along with full interoperability with Prometheus for both push and pull mechanisms. That is expected to land in November, or, if we can pick up some speed, maybe the end of October or the beginning of November. There's lots of work on the project and many areas where we need more contributors. If you're interested in open source observability, or you're an end user or developer who has worked on Prometheus or OpenTelemetry before, please come join us and help us build some of these components and review code. Sharing expertise across all of us is very valuable for building stable, steady support not only for Prometheus but for metrics overall. Thanks for listening, and I really appreciate your time for this update on the progress we've made on Prometheus support. The future is bright: we look forward to having full support and continuing to maintain it, with the two projects working closely together, and to building out end-to-end protocol support so that large-scale telemetry implementations, ingesting and exporting the petabytes of data coming in for observability, are fully interoperable and usable on the OpenTelemetry side. Thank you again, and have a great day.