Hi, my name is Drew Ripper, and today I'd like to take you through my journey learning about the Kubernetes ecosystem through the eyes of network observability.

To get things started, I'd like to talk briefly about myself to give you some perspective on where I started. I graduated high school in 2020, so I was part of the COVID class that went online at the beginning of spring, and I now find myself a computer science student at the Ohio State University. I actually got my start in distributed systems through science fair, where I worked on a project optimizing the Raft consensus algorithm. And if you look into where Raft is applied, you see it's used in etcd, which Kubernetes in turn uses for replicated state management — keeping state data, like pod status, synchronized across the cluster.

Where I really got my start in Kubernetes, then, is when I began looking for summer work in the winter of 2020, because I wanted something to bridge the end of my senior year and the start of my freshman year of college. I also wanted to earn a little money, though I wasn't really expecting to get a paying job in software. It's very, very hard as a high school student to get that kind of opportunity, because you basically have no experience, and what experience you do have probably isn't applied or all that practical. But as somebody who liked to browse AngelList for small companies and startups that might be open to less experienced candidates, I found myself applying to Nirmata, a company that builds day-two, enterprise-grade cluster management tools — mostly because they listed Golang on their AngelList profile, I'll be honest.
That was basically the only language I knew proficiently and was comfortable developing software in. So I applied, not really expecting to get the position, and they have actually been amazing to me since — I have loved my opportunity working with them and developing open source software over the summer. That is mostly what we're going to talk about today: my experience working with Jim and developing this tool. And a quick aside — Jim is actually giving a talk with some people from VMware and Google about the multi-tenancy working group, so definitely go check that out if you haven't already.

The project I was working on over the summer was kube-netc, and as it was presented to me, it was "an eBPF-based Kubernetes network monitor deployed as a DaemonSet." I'm going to be honest: I didn't understand any of these words. I even had to listen closely on my calls with Jim to try to figure out how he pronounced "Kubernetes," because I had absolutely no idea how to pronounce it, didn't know how to say any of these terms, didn't know what they meant. I mispronounced Kubernetes for a while, is what it is. But as I would find out, eBPF would collect the networking statistics, Prometheus would export them, and the application would be deployed as a DaemonSet, which basically means Kubernetes runs it as a pod on each node in the cluster.

But what was the goal, and why were we trying to build this in the first place? We were looking to create a tool that gives you a better sense of what's happening in the cluster — all-encompassing networking statistics at the very base layer. How much traffic is going in and out, where is it happening, and what's causing it? And by using Prometheus, we'd get easily interpretable statistics that can be visualized or processed however you want upstream.
So this tool is open source; however, it would be very easy for an enterprise solution to process these statistics however it likes. And as you're going to see, beyond being all-encompassing, the goal is simplicity: we want a drag-and-drop solution that somebody can easily use to get a better idea of the state of their cluster.

Here's what I'm going to talk about today. We're going to start with what eBPF is and how we use it to monitor and gather network traffic. Then we'll talk about how we track and aggregate these stats, how we get a better idea of what they look like, and how we send them to other applications — and then how we deploy these binaries and applications across the cluster so we can monitor the individual nodes. Then I'll give you a quick overview of Kubernetes and how we tied all of these components together, and we'll demo it. At the end, I'll talk briefly about what I took from this experience, because it was a really, really incredible journey for me: going from knowing nothing about Kubernetes or the CNCF ecosystem to building an open source project and speaking with you today.

To start off, let's talk about eBPF, which, if you're not familiar, stands for the extended Berkeley Packet Filter. While that might sound like a big mishmash of words, it's actually not as complicated as you might think. At its most basic level, it is bytecode that runs in the Linux kernel. So you can run sandboxed code very safely to interact with any process that might be happening in the background — really, to do just about anything inside the Linux kernel — which, if you haven't heard of it yet, might immediately strike you as very powerful. Even though it's a rather new technology, there are already applications in networking, observability, and security.
For instance, Cilium is a big user of eBPF in the networking and observability space, and Falco in security — and there's much more. As you can imagine, there are really endless possibilities with such a modular system.

So how do we implement eBPF? Really, you're either going to write eBPF bytecode directly, or, more likely, you're going to write a restricted subset of C using BCC, which is maintained by the IO Visor project (I believe it started at Netflix). There are Go bindings as well — most of the libraries that I utilized in kube-netc were based off of gobpf. If you want to write really custom code, though, you're going to need BCC; it lets you write eBPF programs in that C subset and compiles them using LLVM as a backend. Though I will admit the BCC option is much more complicated. If you're not familiar with C, absolutely, immediately, just leverage open source code. There really is not a huge advantage to writing your own, especially if you're unfamiliar with C. I tried this at the very beginning while I was doing research on eBPF, and I can tell you, you can easily fall down a rabbit hole where something very, very simple takes you two days. So definitely leverage the open source community. I used an eBPF package from Datadog's datadog-agent repository, which we can talk about later if you'd like. BCC and eBPF, while conceptually rather simple, can get very, very difficult to implement, as they are, again, built into Linux. So this is a start, but you can really use many of the libraries developed by larger organizations with really amazing, well-developed open source packages — for instance Falco, which is part of the CNCF landscape, as well as Cilium, bpftrace, BCC (which is, of course, how you're probably going to write eBPF), and many, many more.
I would like to give a quick shout-out to Alban from Kinvolk, because he helped me greatly in navigating the beginnings of this space. It was a really, really interesting journey trying to figure out where to find all the necessary documentation, and he helped me come up to speed and figure out an interesting bug that came up with newer versions of the Linux kernel. He's absolutely an expert with this stuff, so if you'd like to learn more, I would definitely recommend attending his talk, "Beyond the Buzzwords: eBPF's Unexpected Role in Kubernetes," which I believe is just later today at 5:40. He's giving it with his co-founder, Andrew Randall. Definitely go check him out.

The next part of kube-netc was the Prometheus aspect. This plays into how we visualize and see these statistics after we obtain them with the eBPF piece, and I'll talk more about how we tied these all together when we get to the kube-netc part. Prometheus is a very popular project inside the CNCF landscape, but just to give you a quick idea, one of the important points is that Prometheus is primarily based on pulling data rather than pushing. So if you're used to collecting data with SQL, this is going to be very different. Instead of pushing your data into a database, what you're actually doing is letting Prometheus aggregate your data — probably using the Prometheus client library for Go, or whichever language it has bindings for — by exposing an HTTP endpoint that Prometheus queries at certain intervals. This is a little different from what you might be used to, but it actually works quite well and has some very interesting implications for visualization. It's very, very easy to create and handle time series data, and it's a great way to monitor your services.
So what I really want to hit on is these opportunities for visualization, because, like I talked about in the beginning, one of our main goals for kube-netc was to be able to visually understand the traffic going through our clusters. As you can see in the bottom right here, a very, very popular method of understanding the metrics exported by Prometheus is to import them into Grafana. What Grafana allows you to do is create dashboards that monitor and scrape the Prometheus endpoint implemented in your code. Say, scrape every five seconds, pull the current bytes in from a certain IP, and then graph them — and you'll be able to see and visualize all this traffic, really however you want. Which makes Prometheus a very, very interesting manager of time series data.

What you can also do, before you even push any of this data upstream into your other systems, is a bit of aggregation and processing using PromQL: some averaging, putting values into bins, figuring out where your 99th percentile is for latency, and whatnot. And Grafana is a really nice, easy open source solution for visualizing where you're at with your Prometheus data.

kube-netc provides a base dashboard. So you can hook kube-netc up, follow the install script we'll talk about in a second, and immediately have a dashboard that may look something like this, with some preset charts and graphs that give you a good idea of what the networking and traffic look like in your cluster.
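The kinds of PromQL aggregations mentioned above look roughly like the following. The metric names here are illustrative placeholders, not kube-netc's actual schema (and the percentile query assumes a histogram metric exists, which is an assumption on my part):

```promql
# 5-minute average receive rate for each labeled connection
rate(bytes_received_total[5m])

# 99th-percentile latency from a hypothetical histogram metric
histogram_quantile(0.99, sum by (le) (rate(request_latency_seconds_bucket[5m])))
```

Queries like these can be used directly in Grafana panels, so the averaging and binning happen at query time rather than in the exporter.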
So I'm going to go quickly through here, because you're probably mostly familiar with Kubernetes, but I want to give you a good idea of how we implement it, and some of the basics that the community generally finds pretty interesting — things I was especially interested in learning and understanding at a conceptual level.

Here is how I generally visualize Kubernetes in my head. Say you, on the left, are a user, and you have a configuration — a state you want to put this cluster and these resources in — and you need a tool that automatically partitions the resources in a meaningful way and elegantly manages that state. What Kubernetes is going to do is stamp out that state: it's going to put pod 1 on node 1 and pod 2 on node 2, and that's great — you have the exact state that was specified in this config.yml. But things don't always go as elegantly as that. Say node 2 just dies, or something happens to it. This is where Kubernetes shines, because Kubernetes is going to see that the node is gone, but know it still needs to maintain the state the user wanted. Even though we don't have two nodes now, we still need these two pods — these two containers — the user asked for. So it's just going to reschedule pod 2 onto node 1, because there's probably a rather crucial service in pod 2.

So really, what this comes down to is this desired state, specified in the config.yml. In the case of kube-netc, we're looking at the DaemonSet. There are a few different kinds of what are called Kubernetes objects, but mainly what we focused on for kube-netc was the DaemonSet, because it specifically allows deploying a pod on each of the available nodes.
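A DaemonSet manifest of the sort described above looks roughly like this. This is a minimal sketch, not kube-netc's actual install.yaml — the image name and port are assumptions for illustration (the repo's manifest is the authoritative version):

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kube-netc
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: kube-netc
  template:
    metadata:
      labels:
        app: kube-netc
    spec:
      hostNetwork: true              # see traffic at the node level
      containers:
      - name: kube-netc
        image: nirmata/kube-netc:latest   # hypothetical image name
        securityContext:
          privileged: true           # loading eBPF programs needs elevated privileges
        ports:
        - containerPort: 9655        # Prometheus /metrics endpoint
```

Because the kind is `DaemonSet` rather than `Deployment`, the scheduler places exactly one of these pods on every node, which is what lets the monitor observe each node's traffic.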
I believe the most common object is something more like a Deployment, but if you have questions as to how those work, I would go look at some of the examples out there for Deployments, which are quite common. And that brings me to my second point in terms of what I've taken from this experience: use examples. I found specifically that the YAML format is a bit hard to understand if you just give it a glance. It's very flexible and fluid in how you structure it, which is a good thing in some cases, but it is very difficult to look at it and understand what's being said — it's not very self-documenting. So when you can use comments, use comments, but definitely use examples too. And if you want to develop your own specification or configuration, I would definitely start with a configuration that already exists — maybe pull a YAML file from an open source project. A big inspiration for the structure, and for some of the parts I was having trouble with, was some Cilium YAML, because they also use DaemonSets; it was a good way of getting an idea of how they structure things and which parts of the YAML were going to need to be there.

So, just to give you a brief overview of kube-netc before we get into the demo and a more detailed look at its components: I would highly recommend checking out the GitHub repo. There are some basic install instructions if you would like to try it for yourself. It's quite basic — it shouldn't take you more than a minute or two to get things up and running. Like I said, there's a Grafana dashboard supplied along with all of the code in the repository, so if you just want to install it really quickly and point Grafana at the Prometheus endpoint, you should be able to see all of your cluster traffic.
It's quite simple — really, simplicity was one of the goals for kube-netc — so it's a great start for new contributors. If you're new to the cloud native computing landscape, I'd highly recommend giving some of the issues here a look. And if you would like to read more about the development process and the structure of kube-netc, which I'm going to talk about in a second, definitely check out a blog post I wrote over the summer — you can see the bit.ly link here. It's about a four-minute read, not too long.

Like I said, there are some great starter issues here if you're new and you'd like to get involved. These are small enhancements that came out of the calls Jim and I had — things we decided we might want to implement in the future, some more complicated features that weren't going to get done in the time we had available, some interesting things we thought the community might have use for, and a few bugs here, as you can see, that we weren't able to resolve and that kept coming up every once in a while for different reasons. So I'd highly recommend checking out the issues here.

Now, to give an overview of how we used each of the components and technologies I talked about: there are three main packages that make up the binary deployed across the cluster. These are eBPF in the tracker package, Prometheus in the collector package, and the Kubernetes API in the cluster package. In the tracker package, we use eBPF, via the ebpf library from Datadog's datadog-agent repository, to aggregate some very basic, rudimentary networking statistics — mainly bytes in and bytes out — and we track these for each connection, for both the destination and the source.
So we get where traffic is going, where it's coming from, and how much. And this is very, very helpful when we get to the Grafana visualization, because with Prometheus and Grafana we can actually see, over time, what's happening with a connection — say, that we're seeing a lot of traffic from one particular IP or service.

That's where we also get some help from the Kubernetes API. Whenever the collector sees an IP being reported by the tracker, it queries Kubernetes and asks: is there a service with this IP? Is there a node with this IP? Is there a pod with this IP? And if there is, it queries for its name and some identifying labels and fills those in. So what's actually exposed by Prometheus is more detailed information about the state of the cluster rather than just the networking — when available, we fill in information about the Kubernetes objects behind each address. The kube-netc binary implements this collector and starts the process that binds an HTTP server to expose the metrics endpoint.

So now I'm going to give a quick demo of some of the eBPF capabilities and packages that we used, and of kube-netc, to give you a quick idea of what was developed over this time. For the second part, I'm going to talk about some of the kube-netc functionality and how it implements the tracker package to provide a good overview of the networking statistics in your cluster. The first thing we're going to do is open up kind — I have a kind cluster running right now that is working with kubectl. To follow this demo, really what we're going to be doing is going through the Get Started page of the GitHub repo, which is also detailed, in a little more depth, in the blog post that I wrote.
You can also go through the manual instructions there, which were on the previous slide. So what we're going to do is use the kubectl apply command from the README to quickly install the DaemonSet in the cluster. You could use the install.yaml found in the source instead, but this is much quicker if you just want to get started and give it a quick try.

After we apply it and it's installed, we get the pods to demonstrate that our kube-netc pod is now running on our one node here. You are going to need to use --all-namespaces here, and to specify the namespace from now on, as it's installed into kube-system. As we see, we have a kube-netc pod, kube-netc-r79jm. I'm going to go ahead and copy this so we can use it later.

The next thing we want to do is port-forward this pod so we can access it from the outside. We give the pod name — and we're also going to need -n kube-system to specify the namespace it's in — and then the source port, 9655, which we want bound to 9655 on the other side as well. So now this is going to forward all of our traffic from the outside in.

Now that it's on port 9655 on localhost, we can do things like curling the Prometheus metrics endpoint. So we go ahead and curl the metrics endpoint that's actually running in our Kubernetes cluster, and we get all these Prometheus stats. As you might see, on their own these aren't very useful. It's fun to look at, and it's nice information — different statistics about the network, along with the source, the destination, and the namespace they're in — but we don't want to have to keep looking through all of it. We can get a better idea of the ins and outs specifically if we quickly grep for something.
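To give a sense of what that curl returns, a couple of exported samples look roughly like this. The metric and label names here are illustrative reconstructions for the demo, not kube-netc's exact schema:

```
bytes_received{source_address="10.244.0.5",destination_name="kube-controller-manager",destination_kind="pod",destination_namespace="kube-system"} 48230
bytes_received_per_second{source_address="10.244.0.5",destination_name="kube-controller-manager",destination_kind="pod",destination_namespace="kube-system"} 512
```

Each line is one connection, with the Kubernetes labels filled in by the collector when the IP matched a known object.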
So we're going to pipe the output into grep and look at bytes_received. Okay, this is a bit more useful. We can see all of the bytes received and bytes received per second — some interesting data. Let me take a quick look here. We see a bunch of different labels differentiating each specific connection. We see, oh, it's going to a pod: the destination is the kube-controller-manager in drew-cluster, along with a bunch of labels about the Kubernetes environment it's in.

Now, this is interesting, but as you might want, it's extensible, and you can always point Grafana at this. Like I said, we won't be doing that today, but it should look something like this, where you can see the different connections, some of the filled-in information — the name of the object — bytes per second, and a lot of connections. So hopefully this gives you a quick look at what kube-netc does. Even though it's a bit simple, I do believe that with a bit more love it can grow to be a helpful network observability tool.

So that was a quick demo of some of the functionality of kube-netc, along with some of the eBPF code that was utilized to create it. But now I want to go into the takeaways — what I really learned and gained from this experience developing open source software over the summer. The big one here: open source and cloud native computing projects are for everyone. At a high level, they're really just not that hard to grasp. If you're unfamiliar, I recommend you just take a look and really learn by doing. And what I mean by that is you don't need to develop the next network observability application to learn from them.
Just creating a Deployment, or using kind to start a DaemonSet like that, really is the best way to start learning about these different tools, and it will absolutely benefit you in a professional or learning environment. But if you can, I would absolutely recommend contributing to these projects. Leverage all the open source code that has been written, and do your best to create something, even if it's just for fun. Contributing to any of the cloud native projects is one of the best ways you can possibly learn, and it's one of the most beneficial things for the ecosystem — you're supporting open source software and, really, everyone using cloud native apps.

So that brings me to the end here. I would absolutely love to take more questions — in the Q&A here or on Slack, just send me a message, or you can send me an email here. You can see my GitHub profile, /drewripp. Also, I'd definitely recommend looking at Nirmata — they have some great open source tools available on GitHub. A major one is Kyverno, which does policy management. You can find them on Twitter and on their website. So yeah, that brings me to the end. Thank you all for listening.