Hello and welcome to Open Source Summit Europe 2020. My name is Avthar Sathurappan. I'm a developer advocate at Timescale, and today we're going to be talking about open source software and observability. Thanks so much for coming to this talk. I'm really excited to delve into our topic for today, which is purpose-built observability solutions using open source software. Now, this is an entry-level talk, so you're probably watching because you're interested in open source and monitoring but haven't deployed yet, or you're just early in your deployment journey. If this sounds like you, you've come to the right talk. But regardless of your experience level, I'm going to make this talk valuable for beginners and pros alike. So let's get started. Before we get into the content for today, I just want to tell you a little bit about myself. My name is Avthar. I am from South Africa, but I'm speaking today from New York City in the USA. I want to know a little bit about you and where you're from, so if you can put that in the live chat, that would be really great. This is an international conference, so it's always good to see where people are tuning in from today. The second thing is that I'm really interested in how to use technology to empower people. That's why I really enjoy being a developer advocate at Timescale. Part of my job is to learn new things all the time, and I document that journey on Twitter and on my website, avthar.com. So if you'd like to follow along, you're more than welcome to. Let's get into the roadmap for today. There are four main topics we're going to cover. The first one is why to use open source software for observability. Under this topic, I'm going to talk about why observability is important, give you an overview of the areas of observability, and compare open source tools with all-in-one proprietary tools.
We're also going to look at why you might want to use open source tools for observability. In the second section, Stories from the Field, I'm going to give you concrete, real-world examples, hopefully to inspire and motivate your own deployments and inform how you think about your monitoring system needs. Under this section, I'm going to go through how three companies have set up production observability systems using open source software, and give you the pros and cons of each of their approaches. In section three, we're going to get our hands dirty and build a sample monitoring system for metrics in five minutes. Here I'm just going to show you how to get up and running quickly with your own open source monitoring stack to monitor a Kubernetes cluster. It's going to be a sample system with tools like Grafana and Prometheus. And then lastly, we're going to recap what we covered today and have time for your questions. So if you have questions at any point during the talk, please put them in the chat or use the question interface, and we'll get to them at the end. Now, just to set expectations, this again is an introductory talk, so I'll try to assume as little as possible and leave you with a lot of actionable next steps to continue your open source monitoring journey. My goal is that by the end of this talk, you will have inspiration for how to build your own monitoring system from open source components based on the real-world experience of other people, as well as a foundation in the different tools, plus tips and advice to help you get started quickly. I'm really interested to hear what you're hoping to learn from this talk, so if you can put that in the chat, that would be really great; it helps me understand your motivations.
It's good to see what different people are looking to get out of things. Okay, so let's delve into the first section of today's talk, which is why to use open source software for observability. Firstly, I just want to give you a quick overview of why observability is important. At a high level, observability is using outputs from your system to better understand its internal state and its impact on your users. There are two main reasons why observability is important to developers. The first is that it helps you solve small problems before they become big problems. The second is that it helps you preempt issues before they badly affect users. There's a common saying among developers in the observability space: I want to see my problems before I hear about them from my users. So it's really about preemption, giving you information to catch problems early, so that your users get a better experience at the end of the day and don't have to deal with things like slow software, outages, or other problems with the services or products that you have. Now, what does this view into the state of your internal systems allow you to do? In other words, what does observability enable? There are three quick examples I'm going to take you through. The first is that it allows you to optimize things like resource usage on your infrastructure and applications. The second is classic performance debugging and optimization: you want your systems to run as quickly and as performantly as possible. And the third is that you get to respond to problems faster with things like an alerting system. Okay, now given the importance of observability, it's really important to choose the best tools for the job: tools that give you value out of the box but also scale with your needs as your software gains more usage and your deployments become more complex.
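To make that alerting idea concrete, here's a toy Python sketch of what an alert rule boils down to. This is purely illustrative, not any particular tool's implementation: real systems like Prometheus evaluate rules like this continuously over time series.

```python
# Toy illustration: at its core, an alerting rule is just
# "evaluate a condition over recent samples and fire if it holds".

def should_alert(samples, threshold, min_breaches):
    """Fire an alert if at least `min_breaches` of the recent
    samples exceed `threshold`."""
    breaches = sum(1 for value in samples if value > threshold)
    return breaches >= min_breaches

# Last five CPU-usage readings, as percentages.
cpu_samples = [42.0, 55.5, 91.2, 93.8, 97.1]

# Alert if CPU was above 90% in at least three of the last five samples.
print(should_alert(cpu_samples, threshold=90.0, min_breaches=3))  # → True
```

A production system would layer labels, durations and notification routing on top, but the principle of preempting problems from metrics is exactly this.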
Generally, you're choosing between all-in-one proprietary tools or a do-it-yourself approach using open source software. Many developers just turn to the well-known names in the observability field, and those are usually proprietary software tools like Datadog, Splunk and New Relic; there are a number of others I could mention. Now, the problem is that while these tools are good, they're not a one-size-fits-all solution. Especially in today's times, with COVID impacting the economy, money is scarce and budgets are cut, even in the tech sector, so these tools aren't a viable solution for many teams. The main reasons are that they're too expensive, too rigid, and not useful enough for the needs of smaller, fast-moving teams with specific requirements, and also teams on a tighter budget. So let's take a look at why in a bit more depth. There are three main themes along which we can compare proprietary tools and open source tools. I'm going to introduce these themes now, and you're going to see them come up later in section two, Stories from the Field. Just a note about these points: they come from my own experiences, but they're also a distillation of feedback from developers I've talked to in my job as a developer advocate, as well as from what I've read online. Okay, so the first factor is cost. Most proprietary tools have a very opaque, difficult-to-predict pricing model. For example, many solutions have per-metric or per-series pricing, and this can leave you with large, unexpected bills, because these are the kinds of things you're not really sure how to predict, especially when you're adding a new deployment or a new component to your system.
This also limits developer freedom, because with per-metric or per-series pricing you may not be able to afford as many custom metrics as you need, and that can limit the effectiveness of your observability efforts: you won't actually be able to see the things you want to see, because you're limited by price. Let's contrast this with open source tools, which are free, or cheaper in the case of hosted open source solutions. The big thing here is that cost is a major factor for a lot of smaller startups and smaller teams, and as I mentioned earlier, during times like this, where coronavirus has caused a lot of budgets to be cut, companies are really looking to save money wherever they can. That's what makes open source tools a really attractive solution, especially right now. The second point is flexibility. While many proprietary tools provide a good out-of-the-box experience, with things like pre-made dashboards, a lot of the problems I've heard about come one, three or six months down the line, and the main downside is the limited features and customizability of the platform. You're really limited to the confines of what the platform offers, for example the templates it has, and you might not get the most useful information about your system, because you can only measure what the tool allows you to measure. The other thing is that you might have a lot of unexpected maintenance work pushed onto you; that's another thing I've heard from some users. The point I put here in parentheses is scale, and this is because as you start to scale up, pricing becomes really prohibitive: with this kind of per-metric and per-series pricing, you might not be able to monitor everything you want, because in order to do that you would have to pay much more.
And so this is where you get the worst of both worlds: you don't get something flexible enough to let you monitor everything, and it's also very expensive. Contrast this with open source software, where traditionally one of the benefits is total control and extensibility, since you can fork and modify the code as well. Lastly, on community support: one of the things I heard during my user interviews with developers, which you're going to hear about in the next section, is that oftentimes these all-in-one tools and the companies behind them can be slow to release useful features that your team might want. One of the quotes was that the best you could hope for was an hour-long sales call and a vague discussion of how much a feature might cost and when they'd eventually deliver it; then you're left waiting six to 12 months, and you still can't use the thing you want. You might have experienced this before. So the main con there was being slow to release useful features and not really responsive to user feedback. Contrast this with open source tools, where, especially for projects from the CNCF, the Cloud Native Computing Foundation, there tends to be a breakneck pace of new features, fixes and progress in general. In the open source tool space generally, and especially in observability, there's a high velocity of new and useful features. You also have the support of a large community, which means the tools become more mature and better documented as the community grows, and that makes it easier to bootstrap from community best practices. So right now you're probably sold on the value of open source software for observability. You like the cost aspect, you like the flexibility aspect, and you like the community aspect.
So this is you going on Google to get started, and you're searching for open source observability tools. But then, bam, you're hit with a seemingly endless array of projects and configurations. This is one of the downsides of open source software: there are so many options to choose from that things can be a bit confusing. Do you use Fluentd or do you use Fluent Bit? What's the difference between the two? Do you use Prometheus? Which components do you even pick, and what do all of these things actually do? So this is one of the areas where I want to help you today. Let's get started with section two, which is the main event: Stories from the Field. Who better to learn about open source software and observability systems from than the teams who've actually built and deployed them to production? Today I want to share the stories of three companies who have built production observability systems using open source software. For each company, you're going to learn why and how they went about implementing their open source observability stack, how their configurations have fared, and the pros and cons of each approach. I'm sharing these stories to give you concrete, real-world examples to inspire and motivate your own deployments, and to help you think a little differently about your monitoring needs. The goal here is really to help you understand when, where and why to use open source technology. We're going to cover these four things for each company, which I've actually already mentioned. Okay, so let's get on to the first company, which is When I Work. When I Work is an employee scheduling platform based in the USA, and their use case is metrics monitoring. The reason observability is important to them is that if their platform fails, if they have a problem with their software, employees don't report to work.
So the employees of their customers don't actually report to work, and the customers are really mad about that. It's pretty high stakes, and it's really important for them to build something that lets them make sure their systems are performing properly and gives them visibility into what's going on. So let's look at why they decided to use open source software. The story is that When I Work was originally using Datadog, and they decided to move to an open source monitoring stack for two reasons. The first one is the pricing model. They didn't like the per-series and per-metric model that Datadog had, and they ran into the limits of this pricing model once they started to add custom metrics. As I mentioned, this stifled developer freedom, because they couldn't have as many custom metrics as they needed, and that meant less insight and less useful information about their system. The second reason they moved was dashboard lock-in. At When I Work, each product team maintains their own dashboards, and so moving and sharing required a really big effort. That's because the dashboards are saved in the proprietary system, Datadog's UI, in its file format, and they have to be rewritten entirely if you want to move. Basically, you'd have to do all the work again; you don't actually benefit from any of the work you've put into the specific tool. They really wanted something where they could switch providers overnight and not have to rebuild the whole system, and that's actually something they've achieved with their new setup. Okay, so let's take a look at their observability stack. I'm going to go through how folks like When I Work have set up their stack, and then I'm going to show you how to get up and running with your own open source monitoring stack using some of these components.
For example, Prometheus, Grafana and TimescaleDB in five minutes, during the demo in the next section. So just hold tight for that if you're interested in seeing how to build a system with some of these things yourself. There are a few things to mention in this architecture diagram, so let's take a look at where the metrics are coming from and what they're using each of these components for. The first is that they use AWS CloudWatch to collect metrics from their managed services. That's not pictured here, but it's also a source of metrics. They also use a tool called worldPing from Grafana, which provides external monitoring and pings to make sure their system is up and running. That's not something we mentioned before, but people do use it. Then they use Prometheus as the main metrics collection and monitoring system, to collect metrics from their Kubernetes clusters. You can see here, in their app cluster, they have Prometheus, and in their GitLab cluster they have two Prometheus instances: one to monitor the actual monitoring system, and one for the application itself. What happens is that these metrics get collected from the application and sent to TimescaleDB, which aggregates them and stores them for the long term. They also take metrics from the monitoring cluster; you can see there's an arrow here, and those metrics also make their way to TimescaleDB. So TimescaleDB aggregates and stores the metrics for the long term, and then they use Grafana as the visualization tool that provides them a single pane of glass for their system and their applications. They have both a staging and a production version here.
They also have Prometheus Alertmanager, which is what they use for alerting; it's very popular with Prometheus and lets you define rules on your metrics that trigger alerts. That's hooked up to VictorOps and their Slack so that engineers get notified. A bit about why this deployment is important to them: When I Work said that their customer-facing deployments are actually auto-scaled based on these usage metrics. Another thing this provides is that the status page on their website is actually driven by a Grafana query. And another example of where they've used these metrics: they've rolled back a broken release before it caused much larger problems, all thanks to the real-time metrics they were picking up from Prometheus. So you can see the setup has been very useful for them so far, and they've quite enjoyed working with it. Let's get into some of the pros and cons of this setup as reported by When I Work. The first big pro is the simplicity of service discovery and custom metrics within Prometheus; that's been a huge advantage for them. The second pro is the variety of data sources supported in Grafana. Grafana has really allowed them to read, compare and alert across multiple data sources. Previously they had different data sources but were never able to fully combine them, despite things like the Datadog integration with AWS. The third major benefit is being able to test new versions of their observability software in parallel with the production deployment, including new tools that are coming out. For example, there was a new connector between Prometheus and TimescaleDB, and they can now test it in parallel with the old connector.
So you can actually see what performance improvements you're getting. The fourth point is the fast progress of the CNCF projects like Prometheus. The advantage here is that there are always new fixes and features being released from these projects; there's always a bunch of useful stuff to use but limited time in the day, which is actually a very good problem for them to have. On to the cons. The first con is just the time it takes to develop the system. It took them a while to understand how the different pieces of the observability puzzle fit together, and they needed to decide which components worked best for them. The second con was the time to learn some of the advanced features, things like Prometheus rules and TimescaleDB continuous aggregates. The issue here was really teaching the rest of the team how to use these, because they sometimes have a steep learning curve. That meant they went through a phase with a lot of inefficient dashboards, where duplicate or inefficient queries powered their dashboards, whereas with some of these advanced features, which they now use, many of those queries could have been aggregated ahead of time; it was just difficult to teach people how to use them. So that was one of the cons: the time to ramp up on some of the advanced features. Okay, so that ends the first company. Let me know in the chat what you've learned from it. Jungle is the second company we're going to feature in today's talk. Jungle is an energy and industrial monitoring and forecasting company based in Portugal, so I thought it would be good to include a European company for this conference. Their use case involves metrics, logs and traces.
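Before we dig into Jungle's setup: the TimescaleDB continuous aggregates mentioned a moment ago look roughly like this in SQL. This is a sketch, with illustrative table and column names, and the exact syntax varies a bit between TimescaleDB versions:

```sql
-- Pre-aggregate raw metrics into hourly buckets so dashboards query
-- the materialized view instead of re-scanning raw data every refresh.
CREATE MATERIALIZED VIEW metrics_hourly
WITH (timescaledb.continuous) AS
SELECT
    time_bucket('1 hour', time) AS bucket,
    metric_name,
    avg(value) AS avg_value,
    max(value) AS max_value
FROM metrics
GROUP BY bucket, metric_name;
```

The point of the feature is exactly the con described above: without it, every dashboard panel re-runs the same expensive aggregation over raw data.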
The main thing is that Jungle provides visibility for its users to gain control of their assets, at a very high level. So let's see how that actually manifests. The motivation for Jungle to use open source software comes down to three reasons. The first was cost: they were budget-constrained, so they decided to pursue open source projects because of the cost factor. The second was previous experience with some of the technology: some of the team had previous experience with certain tools and with Kubernetes integration. The third, and the reason that really sold them, was community support. As they built a proof of concept, they found that major projects like Prometheus and Fluentd had the support of a big community, with constant updates and a steady stream of new features and bug fixes. They also found that many of the features they actually wanted either existed already or were being merged into upcoming releases. That made them confident about the maturity of these tools, as well as the quality of the documentation around them, enough to use them in a production deployment. So that's why they moved over to open source tools. Let's take a look at Jungle's observability stack, a slightly less complex diagram than the previous one. Jungle's stack is composed of a mix of metrics, logs and distributed tracing between the different components. The first point is that they use Prometheus to collect metrics in order to gauge the health of their services. The second tool is Jaeger, which allows them to trace requests from their product in order to see how they travel through the various microservices. The third tool is Fluent Bit, a tool for getting the logs from their containers in order to help them detect application errors.
They also use TimescaleDB once again to store and query their metrics data. What they're doing is sending all the metrics data to TimescaleDB, and they deploy a single TimescaleDB instance per Kubernetes cluster; this is all running inside Kubernetes, as the icon here is supposed to indicate. And then they use Grafana for displaying metrics and log information so that they can visualize things about their systems. You can see this stack covers more areas of observability than the metrics-only monitoring stack I showed you previously, but that's how you can fit together some of these different components. So let's take a look at the pros and cons they've found so far. The first pro has been the operational cost: they found it's just much more cost-effective compared to paid alternatives. The second has been the ease of maintenance. This is something you might not often hear about open source projects, but it's mostly down to the great community around the core projects like Prometheus, Fluentd and Jaeger; that's made it easy for them to get updates and new features, and they've really enjoyed that. The third benefit has been the easier ingestion and querying of large volumes of data in TimescaleDB. They're able to keep this large volume of data in a single database without the downside of maintaining the integration of several systems themselves. And the main con that they asked me to relay is that it really took more time to build than an out-of-the-box solution. This was fine for them because they had planned ahead and had plenty of time to execute, but just be wary of the time and effort you might need. You don't want a situation where you're crunched for time and have to throw together a solution that needs more time to actually become effective.
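As a rough sketch of the log-collection piece of a stack like Jungle's, a minimal Fluent Bit configuration that tails container logs and forwards them might look something like this. The paths, hostnames and output destination here are illustrative assumptions on my part, not Jungle's actual config:

```ini
# Minimal Fluent Bit pipeline: tail container logs and forward them
# to a log store (Elasticsearch here, purely as an example).

[SERVICE]
    Flush        5
    Log_Level    info

[INPUT]
    Name         tail
    Path         /var/log/containers/*.log
    Parser       docker
    Tag          kube.*

[OUTPUT]
    Name         es
    Match        kube.*
    Host         elasticsearch.logging.svc
    Port         9200
    Index        app-logs
```

The appeal of Fluent Bit in setups like this is that it's a lightweight agent: the same small config handles collection, parsing and shipping.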
Okay, so that's the case of Jungle. Next I'm going to move on to a company that I know very well, because I work there: Timescale, the makers of TimescaleDB. As I mentioned, Timescale are the makers of an open source time-series database called TimescaleDB. They're based in the USA, but they have a global team and a global user base; there are actually a lot of people in Europe on the engineering team. This is a bit of a meta example, because it's a case of an open source company, the makers of an open source database, using open source software to monitor its own systems. In this case, the system in question is the managed database-as-a-service product called Timescale Forge. There are really three reasons why they decided to use open source software to monitor this product. The first is cost: they found that Datadog was too expensive for their needs and the budget they had. The second was customizability: they found that even with an all-in-one solution like Datadog, it wasn't possible to get the exact views of their data that they wanted. They also wanted a more integrated solution where they could do things like linking with their internal identifiers, and they felt they'd be more comfortable doing that with open source software, building their own system. And thirdly, they found it was easier to bootstrap from the open source community. Again, having a big community around some of these projects gave them things like free charts and existing queries for monitoring common things, so you don't have to come up with your own; it made the whole setup more plug-and-play. So let's take a look at their observability stack. Again, it's a pretty complex diagram, like When I Work's, but we're going to make our way through it one by one.
The first thing to take note of is Prometheus. Timescale Forge runs in Kubernetes on AWS, and so they monitor things across the AWS stack. You have node exporters on the different EC2 instances that are running. They also run Postgres instances for their customers, because that's the nature of the product, so they need Postgres exporters. For all of these things in the Kubernetes node here, you can see the Prometheus exporters; this is all sending data to Prometheus. The other thing they have is the default metrics exporters for their clusters: you can see here in the Kubernetes master you have kube-state-metrics and the Kubernetes metrics server. Then they use the Prometheus Go client to instrument the other microservices. And they have alerting using the Prometheus Alertmanager; everything in yellow here is meant to be Prometheus-related. Alertmanager sits here with rules on their Prometheus data in order to trigger alerts, and this sometimes wakes up my team member Sam at 2 a.m. to go and check out what the problem is. That's how the flow goes there, and that's how they use Prometheus: a number of different exporters that export metrics in Prometheus format. The second point is the use of TimescaleDB. They use TimescaleDB here again to store metrics for the infrastructure and the product itself, and it allows them to have things like Kubernetes metrics powering the user-facing graphs in the application. Essentially, they store a lot of metrics in TimescaleDB which they want to join with user data. So let me show you an example of that. Here's an example from the hosted and managed product that Timescale has, Timescale Forge; this is just an overview of the metrics of a particular database that I have. In fact, let me just show you this in my Timescale Forge setup itself.
I'm just going to take a certain instance that I have here, and then I can view metrics about it. These are metrics that were actually collected by Prometheus about the service running in Kubernetes, and they're put in TimescaleDB so that you can power graphs like this, where, as a user, I can see what I want to know: my CPU, memory and storage at a certain point in time. They have to get that information from somewhere and put it in a form that the user can actually make use of, and that's why they use tools like Prometheus and TimescaleDB. Okay, another example is the ability to join the metrics data with user data in order to make dashboards that give information about specific users and how they're using the system. When you're dealing with a managed product, you need to be able to support it quite well, and they're able to leverage open source software for this. For example, in this case, for my teammate Rob, these are the different services that he has, and you can see their status, which helps them support the product a bit better. There are also a number of other dashboards, some of which I can't show you publicly, but this is just a taste of the fact that you can join your metrics data with your user data to get more informative dashboards and views of your information. So that's an example of how they use TimescaleDB to join metrics data with user data. Next, they use Grafana for visualizing the DevOps data, in this case the metrics data, but they also use it to visualize platform metrics and things about user behavior. And here's an example of a dashboard made in Grafana about a certain instance.
Again, this is important for things like supportability: you can see the information about the services running, so that if a customer says, hey, I've had this problem, you can actually verify that they've had this problem. This is the information about the database; in this case, you can see things like the network traffic every five minutes, disk usage, CPU utilization, the instance state, and whether the service is enabled or disabled. They have tons of dashboards like this for the customers, all powered by Grafana, Prometheus and TimescaleDB itself. All right, so that's the Grafana visualization use case; I think that's given you a good taste of what these dashboards actually look like. To round things out, Timescale also uses Jaeger for tracing pod requests. For logging, they use Elasticsearch, mainly because it's very battle-hardened and allows you to search and index your log data really well, and they use the Kibana front end to visualize their logs. And then they use Fluent Bit. Often you'll see the ELK stack, but in this case they've opted to use Fluent Bit instead of Logstash as the agent to pre-aggregate the logs; it pulls the logs from their Docker containers and ships them over to Elasticsearch. So that's an overview. It's a pretty complex architecture, especially if you're seeing it for the first time when you're just starting out, but hopefully that's shed some light on the different components and what they're used for. Let's get into the pros and cons of this setup. The first one is the pace of upgrades. The team have found that they've benefited a lot from the constant improvement: when they upgrade things, they actually get a lot better. Projects like Prometheus are constantly evolving, and they're really benefiting from the fast-moving upgrades they're getting.
The second point is the documentation and the community. The benefit they've felt here is that you can actually Google a lot of your problems and find documentation or GitHub issues and pull requests describing the problems you have, and they tend to be widely available. The third benefit is the full control and the custom features they've been able to build: they've had the freedom to extend and modify things to suit their needs. For example, they've done things like custom authentication using Google, so that people can access the internal Grafana pretty easily, and that's only really possible with open source software. There are a bunch of other things too, fixes to Elasticsearch and fixes to different pieces they needed to modify for their specific use case. The fourth point here is the tighter integration with the application itself. Building on open source software allowed them to do things like link their metrics to internal user IDs, so that you get a fuller overview of the system and a better picture of what's actually going on, which lets you support customers better and create a better user experience at the end of the day. Then going into the cons, these are some familiar cons we've heard in the previous two examples. The first one is the time and energy it takes to ramp up on best practices. There's a lot to learn, the docs can sometimes be lacking or convoluted, and there's a lot of upfront time needed to understand which tools to use and how to use them together. Hopefully some of that has been minimized a little by you coming to this talk. And secondly, there's the resource overhead and deployment complexity.
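As a taste of the kind of customization mentioned above: Grafana does support Google OAuth login through its ordinary configuration, so a minimal sketch of wiring that up via environment variables might look like the following. The values are placeholders, and this is my illustration of the general mechanism, not Timescale's actual setup.

```shell
# Sketch: enabling Google OAuth login on a self-hosted Grafana.
# Grafana maps env vars as GF_<SECTION>_<KEY>; these correspond to the
# [auth.google] section of grafana.ini. Values below are placeholders.
export GF_AUTH_GOOGLE_ENABLED=true
export GF_AUTH_GOOGLE_CLIENT_ID="your-client-id.apps.googleusercontent.com"
export GF_AUTH_GOOGLE_CLIENT_SECRET="your-client-secret"
export GF_AUTH_GOOGLE_ALLOWED_DOMAINS="example.com"   # restrict to your org's domain
```

In Kubernetes you'd typically set these via the deployment's env or a Secret rather than exporting them in a shell.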
In particular, they found that Elasticsearch is very resource intensive, and scaling out the logging stack is "pretty intense," in the words of the engineer I spoke to. So that's a con you might want to look out for, but they chose Elasticsearch because it's the battle-hardened solution, and it's serving them pretty well despite that resource-intensive downside. All right, so now that you've seen how three different teams have deployed open source software in their production observability systems, I want to show you how you can get started quickly building your own system using a tool called TOBS, so that you can apply what you've learned today. Before I get into that, I wanna know in the chat: what are you looking to monitor, or what are you monitoring right now? That helps me understand your situation, and it helps you think about how to apply what I'm about to show you to your own system. Okay, so let's go through the sample open source metrics monitoring system. I'm gonna spin up a metrics monitoring system to monitor a Kubernetes cluster as well as the nodes and pods running within it. I'm gonna use Prometheus to collect metrics about the Kubernetes cluster and the services running in it, using Prometheus exporters. Then I'm going to use TimescaleDB for long-term storage and to analyze the metrics, with a connector that connects TimescaleDB and Prometheus. And lastly, I'm gonna use Grafana as the visualization tool, so that I can query things from both Prometheus and TimescaleDB. Prometheus uses a query language called PromQL, and with TimescaleDB you just use good old SQL. So those are the components; you may have seen some of them in the previous examples. I'm gonna show you how to get up and running with them now. So let's go over to the terminal and set this up.
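To illustrate the PromQL-versus-SQL point, here's the same rough question, average CPU over five-minute windows, asked both ways. The SQL side assumes a hypothetical table layout; Promscale's real schema is different, so treat this strictly as a sketch of the two query styles.

```shell
# PromQL, against a Prometheus server's HTTP API (assumed on localhost:9090):
curl -s 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=avg(rate(node_cpu_seconds_total[5m]))'

# SQL, against TimescaleDB via psql, using its time_bucket() function.
# The table name here mirrors the metric name purely for illustration.
psql "$TIMESCALEDB_URI" -c \
  "SELECT time_bucket('5 minutes', time) AS bucket, avg(value)
   FROM   node_cpu_seconds_total
   GROUP  BY bucket
   ORDER  BY bucket DESC
   LIMIT  10;"
```

The practical upshot: PromQL is purpose-built for alerting-style queries over recent data, while SQL lets you join, aggregate, and reuse all the tooling your team already knows.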
So I'm just gonna clear my screen. The thing I'm gonna use today, let's go back to GitHub, is something called the observability stack for Kubernetes. What I'm gonna do is just install it. I have a Kubernetes cluster here, so let me show you its contents. In one of the namespaces, called demo, I've got a bunch of microservices. This is actually a shop, so I've got things like a payment service and a checkout service, and I wanna monitor them. I've also got another namespace called monitoring that I set up before, but there's actually nothing in it, so let's put some resources in monitoring. First of all, going back to the GitHub page, I'm just gonna follow the instructions here and install the tool. It comes as a command-line interface, which is the easy way, but you can also use the Helm charts by themselves. So I'm gonna install this tool, wait for the download to complete, and then install it into the namespace called monitoring. Okay, so while this is installing, let's take a look on this next slide at what's actually going to be installed, and let me tell you a little bit about this tool called TOBS, the observability stack for Kubernetes. One of the benefits of the open source community and open source tools is the volume of tools that let you get started really quickly, and this is one such tool. This is what it's composed of. The main difference between this and the previous slide I showed you is that you have this additional component called Promscale, which comprises both the connector between Prometheus and TimescaleDB and TimescaleDB itself, as one component.
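The install flow from the demo looks roughly like the following. The flag names here are from memory, so check `tobs help` and the TOBS GitHub README for the canonical installer one-liner and commands.

```shell
# Sketch of the demo's install flow (assumes the tobs CLI is already
# installed via the one-liner from the TOBS GitHub README).

# See what's running in the app namespace we want to monitor:
kubectl get pods --namespace demo

# Deploy the whole stack into the empty "monitoring" namespace:
tobs install --namespace monitoring

# Watch the monitoring components come up:
kubectl get pods --namespace monitoring
```

Under the hood this drives a Helm chart, so anything the CLI does can also be done with `helm` directly if you prefer.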
We also have an additional component called PromLens, which lets you build queries; it's basically a tool for interacting with Prometheus using PromQL, and it's very useful for that. Let's take a look at how our installation is going. Okay, so this has actually been installed, and what I'm gonna do is show you some of the benefits that come with it. I'm just gonna start up Grafana so that I can show you some of the dashboards we get for free here. Okay, so let's take a look at what's actually been installed and match it against the components in the diagram. You have the Prometheus server; you have these node exporters, which I mentioned collect the metrics from the Kubernetes pods as well as Kubernetes itself. We've got TimescaleDB for long-term storage and analytics, and then we've got Promscale, which comprises the connector and TimescaleDB. Then we have this tool PromLens that I just mentioned, for things like query building in PromQL. And then you have Grafana, which is running up here, for visualization. What I'm gonna do very quickly is get the passwords. You can find all this information on GitHub, and I'm gonna give you a link to it in a moment. I'm literally gonna do what I usually do and look at the docs for the commands, because I haven't memorized them. What I wanna do here is just get my password for Grafana, okay, one more time, making sure I specify the namespace. So this is just my password for Grafana. Obviously you can change it, and there are commands showing you how to change your password here, but since this is a demo, I'm just gonna keep it the way it is.
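The "get the password and open Grafana" step looks roughly like this. The exact subcommand names are from memory, so verify them against `tobs help` and the project docs.

```shell
# Sketch: retrieve the generated Grafana admin password for this install.
tobs grafana get-password --namespace monitoring

# Forward Grafana's service to a local port so it's reachable
# in a browser on the laptop; the plain-kubectl equivalent would be
# something like: kubectl port-forward svc/<grafana-svc> 8080:80
tobs grafana port-forward --namespace monitoring
```

For a demo the generated password is fine; for anything real you'd rotate it and wire up proper auth first.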
Now, I've actually port-forwarded Grafana already in this terminal window, to a local port on my laptop, so I can access it. So let's take a look at Grafana so I can show you some of the dashboards you get for free. Okay, is it still setting up? Okay. In this case you get a lot of things for free using TOBS, the tool I'm showing you. One of the benefits is that you don't have to set up each component yourself. Let me just go back to the command line here. All these tools are already set up and talking to each other: data is being sent from Prometheus to TimescaleDB, TimescaleDB is already connected to Grafana, and Prometheus is already connected to Grafana. So what you get for free is a bunch of these dashboards that you can build on. One of them is this cluster monitoring dashboard, which shows me information about the different services I have running and their states, the number of nodes, the status of the pods, and so on. Obviously I just set this up now, and this isn't something I made myself; it just comes with the tool, something you get for free. The other one you get is this Kubernetes hardware monitoring dashboard, which shows you information about your Kubernetes cluster: things like memory usage, CPU, file system usage, and the network. You can choose which node of the cluster to look at; I'm running a three-node cluster, and you can see the differences between the nodes. Then you have a bunch of other dashboards that come for free here, for example your CPU usage and the memory usage of your different processes. There's a bunch of useful information here to help you get started, so you don't have to start from zero.
But going back to the different components here: all of these are open source, and they have large communities around them, for example Grafana and Prometheus. What this means is that you get new features all the time, and all of these components have production users as well. The one con of the tool I've shown you so far is that it only does metrics, so you'd add in your own logging and tracing components as well. The other thing to note is that these two components, Promscale and TimescaleDB, are actually optional. If you don't want to use TimescaleDB or Promscale and you want to use another long-term storage, you can just swap them out by editing the Helm chart. You can find more information about this on the TOBS GitHub page, which I'm gonna link to in a moment, but you can see here, this is the thing I've been looking at. Okay, so those are some of the benefits of the tool I just showed you, TOBS. One more thing to mention about TimescaleDB and Promscale, so let me go back to the Grafana dashboard: in addition to the dashboards you get for free, a connection has already been made between Prometheus and TimescaleDB, and both are already configured for me as data sources. So now I can create new dashboards and queries in Grafana, in PromQL using Prometheus as well as in SQL using TimescaleDB. There's a bunch of other benefits you get from this tool too, including things like long-term storage, continuous aggregation and downsampling, and custom retention periods per metric, but those are some more advanced things and you can read more about them on the GitHub page itself. So I'm gonna go back to the presentation and we're gonna wrap things up. This is the stack we just created in five minutes using the observability stack for Kubernetes.
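If you'd rather drive the underlying Helm chart directly, swapping components out might look something like this. The repo URL and values keys here are my best guesses, not verified against the chart, so check the chart's `values.yaml` for the real key names before using them.

```shell
# Sketch: install the TOBS chart with Helm directly, disabling the
# TimescaleDB/Promscale pieces so different long-term storage can be
# plugged in. Values keys are illustrative -- confirm in values.yaml.
helm repo add timescale https://charts.timescale.com
helm repo update
helm install tobs timescale/tobs --namespace monitoring \
  --set timescaledb-single.enabled=false \
  --set promscale.enabled=false
```

The same pattern applies to any sub-chart in the stack: find its enable flag in the values file and toggle it off, then point Prometheus's remote write at whatever storage you chose instead.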
And you saw some examples of ways to get started quickly as a result of the work other people have done in the community, in this case to monitor Kubernetes. Okay, so the last thing I wanna leave you with before we end is some advice from folks who have used open source software in production for their observability systems. I asked them: what advice do you have for those looking to build their own observability systems using open source components? And this is what they said; they gave me seven tips to convey to you. The first tip is to look at existing models and documentation. The key point here is that there's a lot of working stuff already available, and a lot of it has decent documentation to help you through the process of combining the different pieces of the observability puzzle. For example, there are setups out there that already exist to help you collect metrics and logs from your infrastructure, and there are tools like TOBS, which I just showed you, to help you get started quickly. That's the first tip. Tip number two covers some factors to consider when evaluating open source tools, in other words, how you actually pick tools. First, look at how active the community is: go and check how many GitHub issues they have, how active the forums, Slack, Discord, and mailing lists are, and what kinds of things people are talking about. Are they talking about, and developing, features that are actually relevant to you? Second, look at who is using the tool: look for case studies of different companies and different users' experiences with the tools. Maybe search Hacker News to see how someone has used a particular tool and what they've said about it, or just Google the different tools and who's using them. And the third factor is how active the project is.
So you wanna look at things like the cadence of new releases, read the release notes to see how much is actually being added in each release, and look at the issues on GitHub as well. You wanna pick tools based on, one, how important they are to your core business, and two, whether they have a larger community: lots of people using the tool plus an active project tends to be a good way to go. Tip number three is about knowing what metrics you care about and which dashboards you want to use them in. There's sometimes a tendency to over-dashboard, and you end up with these large dashboards with 30 to 40 panels. If you're not sure what to graph, you just graph everything, and what you end up with is a dashboard with more noise and not enough signal, which is the opposite of what a dashboard is supposed to do. The antidote is to really define what's important to you first and then go from there. Perhaps you wanna start with just things like response times and durations. Thinking about this beforehand also helps you define what metrics you care about in the long term, for example which long-term storage and aggregation you wanna use in your analysis down the line. Thinking about it at the beginning actually helps you simplify what you need to build and what you need to visualize, at least on the dashboarding side. The fourth piece of advice is to get started quickly with a well-maintained Helm chart. I've shown you an example of how to do this using TOBS: what TOBS does is install the Helm chart underneath, and you can modify the chart as you go. Really, starting with a maintained Helm chart gives you a head start. Tip number five is to consider your team size and your maintenance costs.
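As one concrete example of the "well-maintained Helm chart" tip (a different chart from the TOBS demo, mentioned here only as a widely used alternative): the community's kube-prometheus-stack chart bundles Prometheus, Alertmanager, Grafana, and the standard exporters in one install.

```shell
# Add the Prometheus community chart repo and install the bundled stack.
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring --create-namespace
```

Like TOBS, it ships with pre-wired data sources and dashboards, so the "don't start from zero" benefit applies here too.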
So often you have to choose between self-managed open source software and hosted open source software, and generally smaller teams wanna do less maintenance, so you can consider hosted open source software for your team if that makes sense for you. The point here is that the software is free, but the human cost isn't. And the human cost of using proprietary software isn't free either; it's often just a bit less upfront. So keep that in mind, consider your maintenance costs, and see what tools have worked for teams in your industry and how much maintenance they've had to do. Tip number six is to understand things like the cardinality, the count, and the volume of the different metrics and things you're measuring. This helps you better predict both your pricing and the performance you're gonna get. And then lastly, perhaps one of the most important points, especially since this is an open source conference, is to contribute back. The best thing you can do here is contribute a PR and share your story so that others can benefit from it. For example, the Timescale team contributed a feature back to the Postgres exporter for Prometheus to increase the number of queries it can expose. Even small things like that add up, and they turn into really massive benefits for the community. So being part of the community and contributing back is the final tip. I hope that gives you a lot to think about as you go about the journey of building your own systems. I've also linked to a bunch of resources here to help you get started. So what's next in this journey? I'll put up the slides and resources from this talk on my website, so you can go to this URL to get the slides, save them for yourself, and go through them later. I'll also put up a recording of the talk there once it's out.
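For the cardinality tip, a quick way to get hard numbers is Prometheus's own TSDB status endpoint (available in recent Prometheus versions), assuming the server is reachable on localhost:9090:

```shell
# Ask Prometheus which metrics have the most series -- high-cardinality
# metrics are the ones that drive storage and query cost.
curl -s http://localhost:9090/api/v1/status/tsdb \
  | jq '.data.seriesCountByMetricName'
```

Checking this before you commit to long-term storage gives you real inputs for the pricing and performance predictions mentioned above.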
And then we also have the link to the tool I showed in the demo, TOBS; you can go to this short link to get to the GitHub page I was showing. And lastly, a plug for my employer, the great company Timescale. They make Postgres for time series, both as an open source database and as a hosted, managed solution, so you can check them out at timescale.com. Again, thank you so much for coming to this talk. I really appreciate you, and I hope you learned a lot about open source software and observability solutions. If you wanna get in touch with me, you can do so on Twitter; my handle is on screen there, at Aftar S. And keep in touch via my website, Aftar.com, where you can find some of the other things I've done. If you have any questions, you can email me at Aftar at timescale.com. Thank you so much again for watching.