Hello, everyone. Thank you for taking the time to attend my talk, which is titled "Metrics All the Way: Data-Driven DevOps." I'm Hema Virathi, a senior software engineer working with the recently reorganized Open Services Group, which is part of the Office of the CTO. In this talk, I would like to briefly go over what exactly DevOps is, what comprises a solid DevOps culture within an organization, and why measurement and metrics are critical throughout your DevOps journey, followed by a small demo where I will walk through some of the data-driven principles that we've adopted as a team, taking as an example a monitoring application that we have deployed.

So with that, let me get started by quickly introducing myself. I'm currently based out of Boston, but today I'm speaking from Bangalore, India, where I came to visit my family a month ago. I completed my undergrad in computer science here in Bangalore, in fact, and after that I worked for a year or so with an analytics company based out of Bangalore, after which I pursued my masters in computer science with a specialization in data analytics at Boston University. During my Boston University journey, I was fortunate enough to get a summer internship opportunity at Red Hat, and after completing my graduate degree program, I was able to join full-time as a software engineer. I started my journey at Red Hat full-time in February 2019, so I'm coming up on three years this month. It's very exciting to be part of this great company, and I've had a lot of fun over the journey. I'm also fortunate to say that I attended DevConf.CZ in person just before the pandemic, and of course I'm disappointed that I cannot see you all in person, but I'm still giving a big virtual hi to all of you who are attending, and hopefully we can meet again at these conferences in the near future.
So with that, let's start off this talk by actually thinking about what exactly DevOps is. DevOps grew out of the Agile Manifesto, and one of its principles is highly relevant here: "Our highest priority is to satisfy the customer through early and continuous delivery of valuable software." In the modern, data-driven world that we're living in, this corresponds to the continuous delivery of valuable insights from your data. I think this is a great way to interpret how a data-driven world enables us to form a better DevOps culture, and to adopt this kind of mindset as we embark on the DevOps journey. A solid DevOps culture is critical to any organization, and traditionally this culture is built upon three main pillars, which we identify as people, process, and technology. The people pillar focuses on building a healthy community, which helps you improve communication, collaboration, and shared responsibility within your team. The process pillar focuses on how efficiently you operate and manage the various workflows and development processes that you have in place. And finally, the technology pillar is the ever-changing, ever-growing aspect of your culture, where you're constantly looking into the latest and greatest tools and technologies, from improving your automation and software development processes to improving your security, quality assurance, and so on. These three pillars lay the foundation for your DevOps culture. Oftentimes, adopting this sort of culture and making the transition to a data-driven DevOps world can be a little tough and challenging, in the sense that not everyone is clear on what the vision or the goal looks like, and it may not be obvious to everyone.
So how do we actually make this transition to a DevOps culture in an organization more smoothly and easily? With the help of data, of course. Data is highly powerful in the sense that it can have a large impact by providing equity of knowledge across your teams. With the right data, you harness the right knowledge, which you propagate among your teams and ultimately across your organization. And having this knowledge available and accessible to everybody on the team will also help you take the appropriate actions needed to enable better communication, better collaboration, and ultimately to improve the organization as a whole. So data can do a lot of great things if you acquire and capture it in the right way. Now the question becomes: what kind of data should we be looking at, and where do we go to get it? Since we all work in a very open and interconnected fashion, especially at Red Hat, we have multiple teams which often collaborate and contribute in different ways. So you need to make sure that you're taking into account data from different personas. Across your entire project, you have product owners, project stakeholders, software developers, technical architects, operations specialists, analysts, and so on. Ensuring that you collaborate and gather each of these perspectives is what makes the entire DevOps culture more fulfilling. And combining that with metrics, your suitable choice of measurement, is a great way to understand where you currently are in your DevOps journey and where you would like to be progressing towards in the future. And this gives rise to data-driven development.
So, as we saw, you are trying to aggregate information from all of these different people working on your team, gain insights from it, and convey it in meaningful ways to other team members. In doing so, you're providing greater visibility into all the progress and work being done on the team, which further helps foster better decision-making processes, not just within your team but across the organization as a whole. And finally, you're able to answer questions quickly: you can combine data from different sources, look back in time at historical data, tweak and filter the granularity of your data, and eventually create insightful dashboards, visualizations, or other reports that you can share with the team. These three factors are what gave rise to data-driven development. Keep in mind that this approach involves a lot of measurement, and this measurement is highly important because you have just started this DevOps journey and you would like to track how much progress you're making along the way, how you're measuring success, and what success looks like to you as a team, jotting everything down as you go. And what better way to do this than to use metrics as your measure of the impact of the work that you're doing? That leads us to why metrics are essentially the secret ingredient for any DevOps transformation that you're trying to adopt in your organization. With the right metrics in hand, you have the ability to leverage the insights from the various experiments you run, and you can further capitalize on all the success that you're generating.
You can also change course rapidly if you see that there's no benefit in the product or services you're building or providing, because the metrics clearly show you that something is not working out. That gives you the advantage of making the right decisions, taking the right calls, and improving your processes in the future. It's very similar to how cooking is a continuous learning experience: exploring, identifying, and collecting the right metrics is also a continuous learning experience throughout your DevOps journey. Oftentimes we get the question: what is the one metric that is most important for DevOps? Unfortunately, there is no one-size-fits-all answer. Metrics will vary depending upon your use case and upon what the vision and goals look like for you as a team and as an organization. So it's very important that we categorize our metrics, just as we have a wide range of food categories, from vegetables, fruits, and dairy products to sweets and so on. Similarly, we need to carefully figure out how we would like to categorize our metrics. When choosing your metrics, there are two basic things to keep in mind: your inputs and your outcomes. For example, let's say it's a Friday evening and you decide you want to have a nice Italian meal at a restaurant for dinner. That is your ultimate goal, your outcome. The inputs are the things in your control which can help you reach, or affect, that outcome. In this scenario, your input is the choice of restaurant: based on the selection of restaurants available, you are the one in control of choosing which restaurant to go to for that perfect Italian meal.
Similarly, in the context of DevOps, if your outcome as a team is to improve customer satisfaction, then the inputs you control are things like: how do we improve the response time to tickets? How do we ensure that we meet our SLAs (service level agreements) on time? And then maybe: how do we improve the customer experience, and so on. Having this classification of what your inputs are and what your outcome is will help you better identify how to categorize your metrics. So now that we know how to categorize our metrics, let's look at some tips and tricks you can incorporate when choosing them. Firstly, say you're trying to cook some Italian dish (talking so much about Italian, I think I'm also craving Italian food), let's say pasta. The first step is always identifying the right set of ingredients, the right produce, that you need to start preparing this dish. Similarly, you need to identify the right data sources that you can tap into to start collecting reliable data and reliable metrics. As a team, we primarily look at Prometheus. Prometheus is an open source monitoring tool. I'm not sure how many of you are familiar with it, but even if you're not, Prometheus is a widely adopted monitoring tool which collects time-series-based metrics. And since most of our applications and services are built on top of OpenShift, Prometheus is also easily available to be hosted on top of OpenShift, where it can scrape metrics from these applications and services.
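As a rough illustration of what scraping looks like, here is a minimal sketch of a Prometheus scrape configuration. The job name and target address are hypothetical, and on OpenShift you would more typically let the Prometheus Operator generate this from a ServiceMonitor resource rather than editing it by hand:

```yaml
# prometheus.yml (fragment): one hypothetical scrape job.
# Prometheus pulls metrics from each target's /metrics endpoint.
scrape_configs:
  - job_name: example-app          # hypothetical job name
    scrape_interval: 30s           # how often to pull metrics
    static_configs:
      - targets: ["example-app.example-ns.svc:8080"]   # hypothetical service address
```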
You can look at these metrics either directly in the Prometheus UI, or you can store them in a suitable storage backend like Ceph or a PostgreSQL database, whatever your choice of storage is, and start collecting all of these metrics there. Once you have these metrics and have identified the right data source for you, you are ready to move on to the next step, which is to actually start exploring and analyzing your metrics. This is a very important step: this is where you're playing around with your metrics, essentially cooking with them. To do this, we find it best to explore your metrics programmatically. On our team, we use Jupyter Notebooks. Jupyter Notebooks are a web-hosted interactive development environment, primarily used by data scientists and primarily supporting the Python language. You can interactively run your code inside these notebooks and use relevant Python libraries to develop your work. We use Jupyter Notebooks as our main tool to explore our metrics, and we have also developed a Python-based Prometheus API client, which helps you fetch these metrics, store them, and process them into suitable data frames, so that it's easier for you to further understand the behavior of these metrics and turn them into charts, visualizations, and so on. Once you've cooked with all of these metrics, the final step matters just as much: just as plating the dish is important, presenting your metrics well is equally important. To present these metrics, we use suitable visualization tools, one of which is Grafana. Grafana is again an open source visualization tool, which is highly compatible with Prometheus. You can create customized charts and dashboards, and just like any other visualization tool, you have bar charts, pie charts, line graphs, etc.
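To make the exploration step concrete, here is a minimal sketch of shaping Prometheus-style time-series data into a pandas DataFrame. The sample data below is hypothetical and merely mimics the shape of a Prometheus range-query response; in practice, a Python Prometheus API client (such as the prometheus-api-client package) can fetch and reshape this for you.

```python
import pandas as pd

# Hypothetical sample, shaped like a Prometheus range-query result:
# each series carries a label set and a list of (timestamp, value) pairs.
raw_series = [
    {"metric": {"pod": "app-pod-1"}, "values": [(1609459200, "0.2"), (1609459260, "0.4")]},
    {"metric": {"pod": "app-pod-2"}, "values": [(1609459200, "0.1"), (1609459260, "0.3")]},
]

# Flatten into a tidy DataFrame: one row per (pod, timestamp) observation.
rows = [
    {
        "pod": series["metric"]["pod"],
        "timestamp": pd.to_datetime(ts, unit="s"),
        "value": float(val),  # Prometheus returns sample values as strings
    }
    for series in raw_series
    for ts, val in series["values"]
]
df = pd.DataFrame(rows)

# Once the data is in a DataFrame, exploration is ordinary pandas work,
# e.g. the mean value per pod:
per_pod_mean = df.groupby("pod")["value"].mean()
print(per_pod_mean)
```

From here, the same DataFrame can feed charts, anomaly checks, or any other analysis you run in the notebook.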
You can run queries to identify the right way to represent your data, and then of course save the result as a dashboard, which can also be exported as JSON so that you can save your work for later and share it among your team members. These, I would say, are some of the most important things to keep in mind when you are thinking about monitoring a given application or service, and they sum up the data-driven mindset that you can start adopting, not just for monitoring but for any other aspect of your project. So with that, I want to go through a quick demo, which will share some resources for how you can start this monitoring journey and incorporate some of these data-driven guides and principles, taking as an example an application that we develop and use on our team. I have a GitHub repository; I will make sure to share all of these links in chat once I exit my presentation mode. This GitHub repository consolidates all the information, guides, and documentation about how you can foster this data-driven thinking. And to make it easier and more visually appealing for everybody, we have developed it as a Jupyter Book. A Jupyter Book is a rendering of your markdown files and notebooks, consolidated and served as a hosted web page. So instead of a notebook running somewhere locally, a Jupyter Book lets you add documentation alongside it and have it deployed somewhere permanently. For these Jupyter Books, we have actually created a pipeline, called the Meteor pipeline, that you can also try out for yourself if you're interested in how to set up a Jupyter Book of your own.
All you need to do to have your own Jupyter Book created is enter the URL of the GitHub repo that you want to build your book for, specify which branch you're interested in, and optionally select how long this Jupyter Book should stay active. Once you do that, your Meteor should start running. Over here we have a list of some of the successfully built Meteors which are running, and if you take a look at one of them, you can see which repo it was built for, and you can click through to both the rendered web-hosted page and the Jupyter Notebook itself. That's how this notebook is being rendered as well. Two things are needed to make the Jupyter Book work for you: a config YAML (_config.yml) and a table of contents YAML (_toc.yml). The table of contents is the structure of your Jupyter Book: all the various chapters and sections, however you want your Jupyter Book to be formatted. And in the config, you essentially just mention which repository and branch you want to build on top of, and you can add a title, author, and so on. That's all you need to do in your repository; the rest is however you add content, and then your Jupyter Book is built and created over here. As our example, we have taken an application called Kebechet, which is a bot that automates managing dependencies in your project. It currently supports Python-based projects, so it looks into Pipfile and requirements.txt files to help update your dependencies accordingly. We've laid out the architecture of the tool, the workflows that trigger this application, and how we can go about monitoring it. This section describes all the tooling that Kebechet requires, and then we move on to actually coming up with metrics.
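The two YAML files can be sketched roughly as follows; the title, author, file names, and repository URL here are hypothetical, and the exact fields depend on your Jupyter Book version, so check the Jupyter Book documentation. First, the config:

```yaml
# _config.yml (fragment): book metadata and the source repository.
title: Data-Driven Development          # hypothetical title
author: Example Team                    # hypothetical author
repository:
  url: https://github.com/example-org/example-repo   # hypothetical repo
  branch: main
```

And the table of contents, which defines the book's structure:

```yaml
# _toc.yml (fragment): chapters and sections of the book,
# following the newer jb-book TOC schema.
format: jb-book
root: index                             # landing page of the book
chapters:
  - file: docs/metrics                  # hypothetical chapter
    sections:
      - file: docs/collecting-metrics   # hypothetical section
```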
As we saw in the presentation, we've categorized our metrics to target different personas. From a product owner to an operations specialist to users to an analyst, you can come up with different metrics. As a product owner, you might be interested to know how many active users Kebechet has over time, while from an operations standpoint, you're more interested in the performance of the application, so you want to look at things like the average time it took for a Kebechet workflow to complete successfully, and so on. Once you have all these metrics defined, you go ahead and start collecting and storing them. Like I said, we use Prometheus, and we use S3 as our storage. You then access these metrics within a notebook, which is our preference, by connecting to APIs like the Prometheus API client: you can fetch all the required metrics and store them in a data frame format so that you can analyze them better. Now that we've explored these metrics, we've also created a Grafana dashboard here. As you can see, the dashboard starts off by giving us a status of whether all the components and services are up and running, followed by the product owner metrics, like the number of new active users over time on a monthly and weekly basis. Then, from an operations standpoint, it has things like how many workflows were created and how many workflows succeeded or failed, and you can filter those. And then it shows the average time taken by these workflows, categorized into different buckets based on their durations. This is a dashboard that you can share with your teammates, and from an operations standpoint you can also use it to debug and figure out if your application is down, or to further understand the application so that it serves your end users better. And that actually concludes my short demo. Here are some references.
I will make sure to add them now into the chat. But thank you so much for your time, and if you have any questions, I am more than happy to take them now. Thank you, Hema, for the presentation. I do not actually see any questions, but anyway, I know that you wanted to talk to people via WorkAdventure, so we are going to provide the link to the WorkAdventure, and anybody who has a question later can get in touch with you. That sounds good. All right. Thank you so much. Thank you, everyone. Hope you enjoy the conference.