My name is Shannon Weirich. I'm the VP of Research at a company called NS1, and today I'm talking about pktvisor, which is pronounced "packet visor". The talk is deep network traffic observability with pktvisor and Prometheus. Let's jump in. I'll talk about this project I've been working on called pktvisor, and also a project that has grown out of pktvisor called Orb. Together, their goal is to supplement modern observability stacks and help you with edge network observability. Both projects are free and open source, and their goal is what we call dynamic orchestration of business intelligence at the edge. If you haven't heard the term business intelligence, it means extracting actionable insights from data streams, and the goal of these projects is to push that process out to the edge. So I'll give a little context on deep network observability, talk about the pktvisor project, build on that to use pktvisor with Prometheus, and then get into the Orb project.

So what do we mean by deep network observability? It's the idea that we can examine the traffic flows happening between end users and applications, or among the services that make up your application, and inspect that network traffic. If we can do that, we can often gain insight that helps us run the applications: operating them, debugging problems, increasing security. But analyzing this traffic, especially in a deep way, and collecting the results can be challenging, especially if you have a very distributed application, maybe across hybrid infrastructure, many geographic regions, or nodes that come up and down frequently. So there are a lot of challenges, and the question is: how do we orchestrate all of that so we can extract the insights we're talking about? So, a little context from NS1, where these projects come from.
At NS1, our goal is to connect applications and audiences. That means application traffic optimization, and one of the ways we do that is with our managed DNS network. We have a global anycast network with many PoPs around the world delivering DNS traffic, and we need to maintain and tune that network. Much like a CDN, our goal is to get these endpoints close to our end users so we can give them the best experience, and we need visibility into those traffic flows to do that tuning. We are also often subject to attacks: malicious traffic, DDoS attacks. That's actually one of the reasons pktvisor was originally created, to give us visibility into this type of traffic, to help us understand it and protect against it. This visibility has two aspects. One is at the node: very high resolution, very real time, understanding exactly what's going on to help us debug individual nodes. The other is the ability to back out and get a global view of our entire network, to gain insight at the global or regional level, looking at different dimensions of this network traffic over time.

So what are the questions we want to ask of this data? What insights do we want to pull? It certainly includes the simple metrics you're all familiar with, counters and rates: how many packets have we seen on a certain interface, or what's the rate of ingress or egress. But this is where we want to go deeper. We want to use streaming algorithms to do deeper analysis and give us better insight directly at the edge. One example is frequent items, otherwise known as heavy hitters: top 10 lists across different network dimensions, things like the top 10 IP addresses or the top 10 queries. We also care about cardinality.
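To make the heavy-hitters idea concrete, here's a minimal, self-contained Python sketch of a classic streaming frequent-items algorithm (Misra-Gries). This is an illustration of the general technique, not the project's actual implementation; the IP addresses are made-up documentation values.

```python
def misra_gries(stream, k):
    """Approximate frequent-items ("heavy hitters") summary using at most
    k-1 counters. Any item occurring more than len(stream)/k times is
    guaranteed to survive in the summary."""
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free slot: decrement every counter, dropping zeros.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters

# Simulated stream of source IPs: one dominant talker plus background noise.
stream = ["203.0.113.7"] * 500 + [f"198.51.100.{i % 50}" for i in range(250)]
summary = misra_gries(stream, k=10)
print(max(summary, key=summary.get))  # the dominant source survives
```

The appeal for edge agents is that memory stays bounded (here, at most 9 counters) no matter how many distinct IPs flow past.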
So again, right there on the edge, we want to know things like how many unique IP addresses we've seen in a certain time period, or how many unique query names. Quantiles and histograms are also very useful, of course: what's the P90 or P99 of transaction timing? What's the histogram of response payload sizes happening right now? On the security side, amplification factor is useful: are we seeing very small queries come in but very large responses go out? Is someone abusing that right now? Anybody who's familiar with DNS might have had this problem before: I deleted a DNS record; who's still querying the record I thought I deleted? Trying to track that down and find the sources. And when we care about sources, we care about what geographic regions queries are coming from and what ASNs are querying our network. To summarize the security piece: we need to understand whether traffic is malicious or legitimate, and it's not always easy to figure out. A spike in traffic doesn't mean it's malicious; it could be legitimate. In our case, it could be a recursive resolver farm whose cache has gone cold and is sending us a lot of queries at the moment. But if we do think it's malicious, we need to understand what type of attack it could be. Is it a random label attack? Is it widely distributed? That widely distributed piece, by the way, is where cardinality can help us: if the number of unique IP addresses blows up in a very small period of time, it could indicate a distributed attack. What protocols are involved? What's the target? These are the questions we want to answer.

So let's jump into the pktvisor project. It's spelled "pktvisor", but we say "packet visor". This is a free and open source observability agent backed by NS1.
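A quick aside on the cardinality estimates mentioned above: streaming agents typically answer "how many unique IPs" questions with a fixed-size sketch such as HyperLogLog rather than storing an exact set. Here's a minimal, self-contained Python sketch of that idea; it illustrates the general technique and is not the project's actual implementation.

```python
import hashlib
import math

class HyperLogLog:
    """Tiny HyperLogLog sketch: estimates the number of distinct items
    in a stream using m = 2**p small registers (here 4096 bytes-ish of
    state, regardless of stream size)."""
    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item):
        h = int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)                # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)   # remaining bits
        rank = (64 - self.p) - rest.bit_length() + 1  # leading zeros + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:       # small-range correction
            return self.m * math.log(self.m / zeros)
        return raw

hll = HyperLogLog()
for i in range(100_000):
    hll.add(f"client-{i}")    # 100,000 distinct synthetic client IDs
print(round(hll.estimate()))  # close to 100000
```

The estimate is approximate (a percent or two of error with 4096 registers), which is exactly the trade-off that makes per-minute unique-IP counts affordable at the edge.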
And again, it grew out of our need to observe our critical infrastructure over the past several years, and it's gone through a few rewrites. In the most recent rewrite we had a couple of goals. It's traditionally packet capture focused, so it's a tool that currently operates mostly in the way you might be familiar with from tcpdump or Wireshark, analyzing packets on the wire. But one goal of the most recent rewrite was a very pluggable system, both for the way we get data into it and the way we analyze the data, so there's a modular system for doing this. We can imagine many more types of input sources; this is where we might start seeing things like sFlow and NetFlow inputs, dnstap, eBPF, Envoy taps. The goal is to have the pktvisor agent sit close to data sources and passively observe using this tap methodology. Another big goal of the project is to drive observability through dynamic policies. That means the tool exposes a local REST API where it can be reprogrammed in real time, and which also offers a way to collect the metrics coming out of it. And of course we want it to plug into modern observability stacks, and a big part of that is offering native Prometheus output.

So here's a view of pktvisor watching, in this case, raw DNS traffic. I'll mention that because it grew out of NS1 and grew up observing DNS for us, some of these examples are DNS focused, but it's not limited to DNS. It can inspect different types of traffic and different protocols, and as we get more input sources that will expand as well. pktvisor's view is the data flowing along the wire, coming in waves, going up and down, and it's able to access information inside the packets. That certainly includes things like the source IP and the destination IP.
But when we say deep analysis, we mean it goes in and gets at the application layer information, pulling out things specific to the application, in this case DNS: the query ID, the query name, the number of records returned, the type of query, and so forth. It observes this in real time as the traffic comes in and summarizes it, using those streaming algorithms to pull out the answers to the questions we asked earlier. By default we get one minute buckets of information: the top queries, the rates, the percentiles, all the metrics we mentioned before.

So here's a view of pktvisor and its policy system. We're focused again on a raw input stream, which here is an Ethernet interface, and I've got three example policies. All three are currently tapping into eth0, and when they do that they can specify a filter, in this case port 53. This filters at the kernel level with a BPF filter, and it means all of these policies process network traffic focused on port 53. The three policies we see here work concurrently. The first policy has two analyzers attached: a network analyzer and a DNS analyzer, both working with those streaming algorithms to pull out the network information and the DNS information. But in this first policy we've also added another filter, focusing on NXDOMAIN traffic. In the DNS world that means someone queried a record that wasn't there, so this first policy is geared towards finding the incoming queries that are causing NXDOMAIN responses.
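The policy layout just described can be sketched in config form. Treat this as an illustrative sketch only: the exact YAML keys below (the tap binding, handler module names, and the `only_rcode: 3` NXDOMAIN filter) are assumptions modeled on the project's policy concepts and may not match the current schema, so check the project documentation before using it.

```yaml
# Illustrative policy sketch; keys are assumptions, see the project docs.
version: "1.0"
visor:
  policies:
    nxdomain_watch:
      kind: collection
      input:
        tap: eth0_tap          # a tap bound to the eth0 interface
        input_type: pcap
        filter:
          bpf: "port 53"       # kernel-level BPF pre-filter
      handlers:
        modules:
          net_stats:
            type: net          # network-layer analyzer
          dns_stats:
            type: dns          # deep DNS analyzer
            filter:
              only_rcode: 3    # DNS rcode 3 = NXDOMAIN
```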
So it gathers the results of those metrics in memory and makes them available for collection, both as Prometheus native output and as JSON output to be scraped, again on the local web server that's running. Concurrently we have another policy, this time with just a DNS handler attached, and it's focused in a different way. It pulls out all the same information, all the same metrics, but across a different dimension, where the query name ends in food.com. So we'll get all the same metrics, but only for records ending in food.com, and again that's exposed on a separate endpoint. Finally, we have another policy that looks only at network information but has been configured to collect every 30 seconds. We can also imagine multiple inputs running at the same time: as we add things like dnstap support or sFlow support, you can concurrently run other types of inputs and other types of policies. And we can spin these policies up and down in real time.

So here's a console view. On the left hand side we're querying this local pktvisor instance, which is running on a port here, and accessing the JSON output. On the left you start to see things like the rates, the top qnames, the cardinality of qnames that we discussed, all available in a generic JSON format. On the right hand side it's querying a slightly different endpoint, and you see the Prometheus version of that output: the exact same data, but in Prometheus compatible output.

And here's another way to view it. Included in pktvisor is a command line user interface. This is attached on a local node to the local pktvisor, reading that higher resolution data in real time and displaying it in your console. At the top we see a summary of a lot of the packet information and a lot of the DNS header information.
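To show what "the same data in two formats" means in practice, here's a small Python sketch that parses a few Prometheus exposition-format lines into dicts and re-serializes one as JSON. The metric names and values are made up for illustration; the parser handles only the simple label shapes shown, not the full exposition grammar.

```python
import json
import re

# Sample lines in Prometheus exposition format, similar in shape to what
# an agent might expose (metric names here are invented for illustration).
EXPOSITION = """\
dns_wire_packets_total{policy="default"} 151288
dns_top_qname{policy="default",qname="example.com"} 2201
dns_top_qname{policy="default",qname="food.com"} 1444
"""

LINE = re.compile(r'^(\w+)(?:\{([^}]*)\})?\s+(\S+)$')

def parse_exposition(text):
    """Parse simple exposition lines into {name, labels, value} dicts."""
    samples = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        name, labelstr, value = LINE.match(line).groups()
        labels = dict(kv.split("=", 1) for kv in (labelstr or "").split(",") if kv)
        labels = {k: v.strip('"') for k, v in labels.items()}
        samples.append({"name": name, "labels": labels, "value": float(value)})
    return samples

samples = parse_exposition(EXPOSITION)
print(json.dumps(samples[0]))
```

The point is simply that both endpoints carry identical samples; only the serialization differs, so Prometheus can scrape one while ad hoc tooling consumes the other.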
And then most of the real estate is taken up by the top lists we care about: the top query names, the top slow transactions, top geolocations, and so forth. We've tried to make pktvisor very easy to use and install, so it's available as a Docker container and also as a static binary. In this instance we're seeing that we can just pull the pktvisor image. If we want to start with a very simple default collection policy, here's a command line that will do it: we tell it we want to run the pktvisor daemon, we want Prometheus support, and then we give it an Ethernet interface (substitute whatever is appropriate for you). Then we see a familiar endpoint that lets us collect the Prometheus metrics directly. And finally, as I mentioned, the command line interface is included inside the Docker container, so you can attach and watch those metrics directly on the node. This is an open source project hosted on GitHub in the NS1 Labs organization. We do welcome contributors; there are a lot of different ways you could contribute to the project, and we'd certainly love to see that. Please star the project if it's interesting to you; it lets us know how much interest there is in the project.

Now let's talk about pktvisor with Prometheus. We've talked about pktvisor itself, directly on the node. Now we want a more global view than a single node: we want to pull multiple nodes into a central Prometheus database and visualize all of that information. That gives us the global view. We're able to scrape the pktvisor agents from the endpoint we saw, and we're also able to use remote write, which I'll show in just a second. Once we do that, all of the metrics from all of those policies go into the Prometheus ecosystem, and we can use familiar tools for exploring that data, visualizing, creating dashboards, and alerting.
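In scrape terms, pulling a few of these edge agents into a central Prometheus looks roughly like the fragment below. The metrics path and port are assumptions for illustration (the agent's actual admin port and per-policy metrics endpoint may differ), so check the project README for the real values.

```yaml
# Illustrative Prometheus scrape config; path and port are assumptions.
scrape_configs:
  - job_name: "edge-traffic-agents"
    metrics_path: /api/v1/policies/default/metrics/prometheus  # assumed path
    static_configs:
      - targets:
          - "edge-node-1.example.com:10853"   # assumed agent port
          - "edge-node-2.example.com:10853"
```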
That looks something like this. We can imagine that these pktvisor agents are installed in multiple locations and hybrid topologies, in a sidecar agent style. They can be in containers or on VMs and servers; we've started putting this directly onto routers and switches. Again, the idea is to pull this information into a Prometheus database and then set up your tools to get this broader view. There's a Docker container available that helps you with a push based system: pktvisor-prom-write, available in the NS1 Labs organization. This is basically pktvisor packaged along with Grafana Agent, so it knows how to query the local pktvisor, do the scrape for you, and then push to any remote write compatible interface. We also have some sample dashboards out there, which give you a head start on using the actual metrics that come out of pktvisor; those are available on the Grafana Labs community site.

Here's a quick video of what it looks like to see some of those metrics. This is just the Explore section of Grafana. If we type "dns" in, we start to see the list of metrics that pktvisor is generating, and we can quickly drill down, select what we'd like, and create graphs to help us build dashboards. And this is what our premade dashboard looks like. It's basically all the information we saw inside the pktvisor command line interface, but now pulled out to that larger view, where we can select different nodes from different regions, show multiple agents in the same dashboard, and so forth.

Okay, finally, I'm going to talk quickly about the Orb project. If we have our very local node view with pktvisor, and we have the ability to pull the information from those pktvisors and get a global view, we still have some challenges. One of them is: how do we organize all of these pktvisor agents?
We now have the ability to spin policies up and down inside the pktvisor agents in real time, but how do you orchestrate that? How do you actually reach these agents you've installed in different places, send them policies, and collect the information? That's what Orb was created for. It's a very new project, also open source, and it's a platform that helps us solve these issues. It's based on IoT principles: in the system we actually treat the Orb agent, which includes pktvisor, as a sort of software IoT device. The platform provides a user interface to manage these fleets of agents, a REST API for automation, and it handles communication with the software agents. The whole goal is to orchestrate the policies going out to the agents, collect the information coming back from them as they spin up and down, and get it flowing into your time series database.

Orb itself is a multi-tenant system. It allows for fleet management: it lets you understand which agents are connected to your system, organize them, and group them with a tagging system. That's what lets you address the agents out there and send them policies. It lets you organize your policies as well and send them out in real time. It then keeps track of which policies are running on which agents, collects the information back for you, and sends it to your time series sinks. It's meant to be used in two ways. There's a Helm chart available that helps you deploy to Kubernetes and run the control plane yourself. We also plan on having a free SaaS.
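To make the tag-based fleet grouping concrete, here's a small Python sketch: agents carry key/value tags, and a policy is dispatched to every agent whose tags match a selector. The data model and matching rule are hypothetical simplifications for illustration, not the platform's actual implementation.

```python
def matches(agent_tags, selector):
    """An agent matches when it carries every key/value pair in the selector."""
    return all(agent_tags.get(k) == v for k, v in selector.items())

def target_agents(fleet, selector):
    """Return the names of agents a policy would be dispatched to."""
    return [name for name, tags in fleet.items() if matches(tags, selector)]

# Hypothetical fleet: agent name -> tags reported when the agent connects.
fleet = {
    "edge-ams-1": {"region": "eu", "pop": "ams", "role": "dns"},
    "edge-ewr-1": {"region": "us", "pop": "ewr", "role": "dns"},
    "edge-ewr-2": {"region": "us", "pop": "ewr", "role": "web"},
}

# Send a DNS-focused policy to every US DNS node.
print(target_agents(fleet, {"region": "us", "role": "dns"}))
```

The useful property is that targeting is declarative: as new agents come online with matching tags, they pick up the policy without anyone enumerating hostnames.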
If you don't want to run the entire control plane yourself and would just like to install the agent on your devices and use the functionality, we're planning a SaaS at this web address. There's modular support for the observability agent too: pktvisor is essentially a module for Orb, and we might even run other types of observability agents inside the Orb agent in the future as well.

Here's a quick diagram of what this looks like. We've got the pktvisor agents connecting into the Orb control plane. They use the MQTT protocol, a well-known protocol used with IoT, and we've got a couple of microservices handling the control plane functionality and the user interface, which lets you organize your policies and your fleet. A key point here is that Orb itself is not concerned with visualizing the actual output from the agents. It will help you organize things and send the data to your data sinks, of course including Prometheus, but you would still use all of your familiar tools to actually work with the data that Orb is collecting for you.

Here's a quick screenshot of what the user interface looks like. This is under heavy development; in terms of where the Orb project is, our goal is to have an MVP at the end of this month, so that end users can install the entire control plane and use the system for the first time. So again, it's a very new project. Here we can see a list of agents that have logged into the control plane. We can see when agents are online (there's a heartbeat system), there's the tagging system, and we can see what policies are running. This is also an open source project, also on GitHub in the NS1 Labs organization.

And that about wraps it up. Just to summarize again: this is pktvisor and Orb, and the goal is to fit into your observability stacks and help you with edge network observability. They're open source projects.
And the entire goal here is dynamic orchestration of business intelligence at the edge. With that, I will wind it up. Thank you very much for your attention. You can find more information at these web addresses, and my email address is here; please contact me anytime if any of this interests you and you'd like to discuss it further. We do have an announcement list; if you're interested in the SaaS, please join there. We have a Slack as well. And I'll be here all week, so if anybody would like me to demo some of this functionality, I'm happy to do that, and of course you can contact me on the virtual platform as well. Thank you very much. All right, does anybody have any questions?

[Audience] Can you hear me? Yes. [inaudible] Really cool project, especially the one you just discussed, pktvisor. How do you handle encrypted data when it's coming in?

Yeah, this is one of the reasons why packet analysis can only do so much. The strategy there is to use a tap system where the applications themselves can expose the information directly. For example, in the case of DNS, there's a system called dnstap: instead of watching the packets, you use dnstap, which almost all of the major DNS servers support and which gives you almost all the same information. This is going to be important as DNS over HTTPS and DNS over TLS come out, so you would be observing by tapping directly into the information that's streaming out of the application. Envoy taps are another example here, where the applications themselves are beginning to expose streams of data that we could analyze.

[Audience] But you talked about eBPF; that's still a little bit of an issue. Is there something going on there that we can tap into?

With eBPF, there's information from the system that's network related. This is something that we're just exploring now, so I don't imagine it's going to be using eBPF to analyze the packets themselves.
But if anybody thinks that's possible, I'd love to talk with you. All right, great. Thanks.

[Audience] One more, if I can: what's the overhead of running pktvisor, with the streaming algorithms and so on?

Yeah, so it's meant to be efficient, of course. It's written in C++ and uses efficient algorithms. But if we're pushing compute to the edge, then of course that compute exists at the edge, so there is some overhead. This is one of the reasons the policy system is important: with policies, you should be able to craft exactly the policy you'd like, analyzing only the data you want and collecting only the information you want. So the idea is that you're not just standing up an agent that uses an arbitrary amount of CPU; you decide exactly what your policies should look like, and so you get to choose what CPU to use. Any other questions? All right, thank you very much. Thank you, everybody.