Okay. Next we have Simone Mainardi, who will be talking about augmented network visibility with the help of sub-minute resolution metrics.
Thank you very much. Hello. Thanks for being here today. In this presentation, we are going to see the benefits of having high-resolution metrics in the field of network visibility. Then, in the second part of the presentation, we are going to use open-source software to build an actual network visibility solution: ntopng for the monitoring, Grafana for the visualization, and InfluxDB for the storage of data. Can you hear me okay? Yeah.
So let's start with network visibility. In general, network visibility is provided by means of metrics. These can be simple metrics, such as the bytes and packets transmitted or received by a given host or by a network. There can also be metrics that cover layer 7 application protocols, for example the amount of YouTube traffic done by this host or by this network. And there can be more complex metrics, such as moving averages or exponentially weighted averages.
But the point is that metrics are sampled. This means that there is no way to have a 100% accurate representation of the real metric, because we only have samples of it: at fixed intervals of time, we pick the value of the metric. So there is an amount of time, equal to the sampling interval, during which the behavior of the metric is completely unknown. Let's take this chart, for example. We have a metric and its evolution in time, and we have two samples, S1 and S2. Between S1 and S2, nothing is known. We have no idea of the behavior of the metric between S1 and S2. It could have followed the green path, the yellow path, or the blue path. In practice, we don't know. We only have S1 and S2.
So it's pretty clear now that the shorter the sampling interval, the closer we can get to the real metric. So it's beneficial to be able to reduce the sampling interval. Let me show you this with an example. In this example, I'm going to transfer 10 gigabytes of traffic two times. The first time, I'm going to transfer the 10 gigs over a free link. The second time, I'm going to transfer the same 10 gigs over a fully utilized link. And I would like to show you what happens if we use five-minute samples on the one hand and 10-second samples on the other. To do this experiment and to make it reproducible, I'm going to use iperf to send the data and ntopng to do the monitoring. We are going to see ntopng in detail later in this presentation.
So let's start with the five-minute samples. The first bell-shaped curve that we see in this chart is the traffic metric captured during the 10 gigs transfer over the free link. The second bell-shaped curve is the same traffic metric captured during the 10 gigs transfer over the fully utilized, congested link. Do you see any difference? No. The two bell-shaped curves are the same. So with five-minute samples, the fact that the link was congested is completely undetectable.
Let's see now what happens if we go from five-minute samples to 10-second samples. As we can see from this chart, this is a different story. The first file transfer is a beautiful, straight transfer. The metric almost immediately reaches the line rate of one gigabit per second, stays steady, and the transfer completes in a very short interval of time. The transfer over the congested link is really different: the throughput is not even one half of the throughput over the free link, and there is also a sawtooth behavior that we can see in this chart.
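The effect described above can be reproduced with synthetic numbers. The following is a minimal sketch, not the experiment from the talk: it assumes a made-up steady 1 Gbit/s transfer and a made-up sawtooth transfer below 500 Mbit/s, both moving exactly 10 GB inside one 300-second window, and it derives rates from a byte counter read at a fixed interval. The function names `byte_counter` and `sampled_rates` are illustrative only.

```python
# Synthetic illustration only: the rates and durations below are made-up
# assumptions chosen so that both transfers move exactly 10 GB.

def byte_counter(rates_bps):
    """Return cumulative bytes after each second for per-second rates in bit/s."""
    counter = [0]
    for rate in rates_bps:
        counter.append(counter[-1] + rate // 8)
    return counter


def sampled_rates(counter, interval):
    """Average rate (bit/s) between counter readings taken every `interval` seconds."""
    return [
        (counter[t + interval] - counter[t]) * 8 // interval
        for t in range(0, len(counter) - interval, interval)
    ]


# Free link: a steady 1 Gbit/s for 80 s, then idle for the rest of the 300 s window.
free = [1_000_000_000] * 80 + [0] * 220

# Congested link: a sawtooth ramping from 322 to 478 Mbit/s every 40 s, for 200 s.
congested = [322_000_000 + 4_000_000 * (t % 40) for t in range(200)] + [0] * 100

for name, rates in (("free", free), ("congested", congested)):
    counter = byte_counter(rates)
    print(name, "300 s:", sampled_rates(counter, 300))
    print(name, " 10 s:", sampled_rates(counter, 10)[:9])
```

With these assumed inputs, the single 300-second reading is identical for both transfers (about 267 Mbit/s), while the 10-second readings show a flat 1 Gbit/s for the free link and a repeating 340, 380, 420, 460 Mbit/s pattern for the congested one.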
So you see, going from five-minute samples down to 10-second samples unveils patterns that are interesting and that go completely unnoticed if we use five-minute samples. Again, if you look at the time scale at the bottom, these two charts come from two instances of a monitoring tool that are monitoring the same traffic, but one is taking samples every five minutes and the other is taking samples every 10 seconds.
So why should we care about this? Well, there are many reasons. We should care about having high resolution, for example, to have a more accurate representation of the throughput that is received by the applications that use our network, because certain applications are sensitive to the throughput. For example, VoIP calls or video streaming have certain requirements in terms of minimum throughput. If we can't guarantee this throughput, the user experience can worsen and the application performance can worsen. So being able to monitor the throughput accurately can help you improve, or even just provide, a good service for your applications.
Similarly, another thing that you can do is look at bursts, which can go unnoticed if you use five-minute or one-minute samples. Bursts are not really good for networks, because they tend to fill the queues of your network elements, so they tend to introduce jitter or increase the delays of the transmissions. So being able to spot bursts can help you fix certain kinds of issues.
Okay, so now that we have seen the motivation and how useful it can be to have high-resolution metrics, let's see how we can build a solution using open-source software. What do we need to build this solution? We need three pieces of software. We need software that can do the monitoring and that can see the packets, so that it can generate high-resolution metrics.
Then we need a data store that is able to ingest high-resolution metrics and respond to queries in a timely way. If we use high-resolution metrics, we end up generating hundreds of millions of points per day in real-world corporate networks. So we need a really good data store that can, on the one hand, store the metrics and, on the other hand, respond to our queries in a timely way. Last but not least, we need a visualization tool that can show us the relevant behavior and allow us to drill down, analyze, and alert on our traffic.
So the solution we propose is composed of three pieces of open-source software: ntopng for the monitoring, InfluxDB for the storage of data, and Grafana for the visualization. I'm not going to go into the details of InfluxDB or Grafana. I mean, we had Carl right before me. What I would like to focus on now is ntopng, the monitoring tool, because I am one of the developers of ntopng and I would like to tell you how we brought ntopng from five-minute samples down to 10-second samples.
So ntopng is open source. You can fork it on GitHub. You can download it and try it. It's open-source software for the monitoring of your network. We have 3,300 stars on GitHub now, and counting, so feel free to download it and test it.
The architecture of ntopng is multi-threaded. This means that we have one thread that is reserved for the capture of packets, and then we have other threads that run in parallel and are reserved for the generation of samples. In our recent work, we extended those parallel threads for the metric generation and changed them so that they can provide metrics down to a resolution of 10 seconds. Before that, we were using RRDs.
RRDs are another store for metrics, based on files, that is basically used if you want samples in the order of minutes. Otherwise it would be too slow. So we extended ntopng to produce 10-second samples and to push those samples to an InfluxDB database.
So how easy is it to set up and configure ntopng to generate 10-second samples of your traffic using InfluxDB? Well, it's pretty easy. I'm going to show you a quick demonstration in a while. You just visit the preferences of ntopng, you set the URL of InfluxDB and a database name, and you are done. ntopng will start pushing data into InfluxDB. On the other hand, if you want to build your dashboards using Grafana, all you have to do is add an InfluxDB data source to Grafana that connects to the same database used by ntopng: InfluxDB host, database name, and you are done. With Grafana, you can start playing, you can start experimenting with creating dashboards.
As we can see in these two charts, I have recreated the same dashboard of the file transfers using Grafana. And obviously there is a 100% match between the two charts, because Grafana and ntopng are both pulling data from InfluxDB for the visualization.
The other interesting thing that you can do with Grafana is alerting, which you can now bring down to a 10-second resolution. This means that you can evaluate a condition every 10 seconds. So you can set a simple threshold. If you want to monitor a web server, you can set a threshold to be alerted in a timely way if the traffic of your web server goes above or below a certain value. The interesting thing here is that you can create alerts with an evaluation frequency of 10 seconds. So you can receive a notification.
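To make the "URL plus database name" setup above more concrete, here is a minimal sketch of what pushing one 10-second sample to an InfluxDB 1.x server over its HTTP `/write` endpoint can look like. It is not ntopng's code. The measurement, tag, and field names (`iface_traffic`, `ifid`, `bytes`), the database name `ntopng`, and the helper names are all hypothetical, and the request is only built here, not sent.

```python
from urllib import request
from urllib.parse import urlencode


def _escape(text):
    """Escape the characters that are special in line-protocol keys and tag values."""
    return text.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def to_line(measurement, tags, fields, ts_seconds):
    """Format one point in InfluxDB line protocol (integer fields get an 'i' suffix)."""
    name = measurement.replace(",", "\\,").replace(" ", "\\ ")
    tag_part = "".join(f",{_escape(k)}={_escape(v)}" for k, v in sorted(tags.items()))
    field_part = ",".join(
        f"{_escape(k)}={v}i" if isinstance(v, int) else f"{_escape(k)}={v}"
        for k, v in fields.items()
    )
    return f"{name}{tag_part} {field_part} {ts_seconds}"


def build_write_request(base_url, database, lines):
    """Build (but do not send) a POST to the InfluxDB 1.x /write endpoint."""
    query = urlencode({"db": database, "precision": "s"})
    body = "\n".join(lines).encode("utf-8")
    return request.Request(f"{base_url}/write?{query}", data=body, method="POST")


# Hypothetical sample: cumulative bytes seen on interface 0 at a given second.
line = to_line("iface_traffic", {"ifid": "0"}, {"bytes": 1250000}, 1549100000)
req = build_write_request("http://localhost:8086", "ntopng", [line])
print(line)
print(req.full_url)
```

Sending the request would be a matter of passing `req` to `urllib.request.urlopen` against a running server. A Grafana data source pointed at the same host and database would then see the same points.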
You can receive an email very, very quickly when something happens.
Let me show you a quick demo now. I want to show you how easy it is to set up this kind of 10-second monitoring using these tools. So here I'm going to start ntopng, which is the first tool, for the monitoring. Do not look at the other options, which are not really relevant for this demo. The interesting thing here is that we are going to run ntopng on interface EN01. I'm going to start it. So now it has started monitoring the traffic on this laptop, that is, the network traffic that my laptop is exchanging with the internet. Now I can also start InfluxDB. Same thing. So now I have started InfluxDB, and ntopng is generating 10-second samples and posting them to the InfluxDB running on the same host. Third, I also run Grafana, because I would like to see the dashboard in action.
Okay. So let's see what happened. I can point my browser to localhost:3000. I have created a dashboard here. Here we are. So this is a dashboard that I have created using Grafana, connected to the InfluxDB that is storing our 10-second samples of the traffic. It shows me the throughput of the network interface and the traffic of certain relevant application protocols, including DNS, Dropbox, and Google. So this is the traffic.
What I can do now is create something more interesting. I can set the time interval to now minus five minutes, and I can set a refresh of five seconds. So this dashboard is going to refresh every five seconds. And let's see if we can generate some interesting traffic. I'm going to download an ISO of Ubuntu. And if I go back to the dashboard, we see that the throughput has increased to almost 60 megabits per second. And you also have the throughput in bytes and packets shown right in the Grafana dashboard.
Grafana is a very useful way to create your own dashboards, and you can also pull metrics from other data sources. So you're not limited to the use of ntopng, because maybe you have another data source that is monitoring the number of requests and responses of your web application. So you can combine in the same dashboard the chart of the interface throughput with the chart of your HTTP requests, for example, to detect certain kinds of anomalies or to detect peaks in the HTTP requests or in the response errors of your application. So that's almost everything I wanted to show you now.
So let me wrap up. We have seen that having high-resolution metrics can unveil certain patterns that are unknown or unnoticeable if we use five-minute samples or other lower-resolution samples. If you want to build your own monitoring solution, you can do that using open-source software: ntopng, InfluxDB, and Grafana. And last but not least, we are growing and we are hiring. So if you are interested, if you like networks, if you love packets, contact us. We will be around the whole day. We are based in Italy, but you can work remotely if you want. So contact us and we can start doing something together. That is my email. Thanks for your time. If you have questions, I'm here, so please feel free to... Yes. Okay, let's raise the mic, okay?
Hi. So in the demo, you showed applications in the graph, like Dropbox and Google. How do you distinguish those? And do you have to create a dashboard to filter for that specific application, or can it be dynamically populated?
Yeah. So ntopng has an engine for what is called deep packet inspection. It performs a deep packet inspection of the real traffic packets to detect the actual protocol that is flowing at that particular time. So let me show you this. I can edit this panel. And as you see, I do an InfluxDB query on the deep packet inspection.
This is the interface nDPI series, and I can change the protocol: here I have all the protocols that have been dissected by ntopng. So I can pick another one. I can pick HTTP. I can pick Google, GitHub, Instagram, or whatever you want. So you create the dashboards of the layer 7 application protocols that are traversing your network. These are application protocols, so this means the layer 7 payload. It's not a layer 4 protocol, such as TCP or UDP. It's the actual application carried at layer 7.
A question on the interval. ntopng pushes the 10-second interval measurements to InfluxDB. Is the push still every five minutes, or is it also every 10 seconds?
You can choose that.
Okay. Not only the measurement interval, but also the push interval? Because it collects the interface statistics every 10 seconds, and it pushes the statistics to the database also every 10 seconds, or every five minutes?
The push to the database is basically also every 10 seconds. You can control the flush, but the point is that the statistics are every 10 seconds. So you may see them with a little bit of delay in the database, but in the database they are at 10 seconds.
Okay. So, a question. How are you doing downsampling of data in InfluxDB, if you are doing downsampling?
Yeah, we are doing downsampling. We are using continuous queries. This is one of the latest developments.
But then you save them to some other data source in InfluxDB, and then you have to select another data source to see the historical metrics, or what?
Yeah, that's true.
Hi Simone.
Hi.
So obviously what you are saying is that 10 seconds is better than five minutes. I assume then...
It's not necessarily better. It could be useful, depending on what...
You get more resolution, for example, and then maybe one second is more resolution than 10 seconds. What are the main sticking points or problems in going lower and lower with the sampling interval?
The lower you can go, the closer to the real metric you can get. The point is, do you really need to go so close to the real metric? The answer is: it depends. It depends on what you are looking for. There is a phenomenon called microbursts. They are called microbursts because they happen at the microsecond scale. They can create trouble in the network as well, so you could be interested in inspecting microbursts. If you need to inspect microbursts, then yes, you need to go to microsecond resolution. So you need hardware timestamping, you need dedicated network cards. You can do that. So the answer is: it depends. The higher the resolution, the more you can see. This is for sure. But you can also be happy with five minutes. It depends.
This is plain Ethernet sniffing, let's say?
Yeah.
What about overlay networks like VXLAN and stuff like that?
Like what?
VXLAN.
Yes. I mean, provided that you can access the traffic, you can do this kind of monitoring. To produce these 10-second samples, you need the traffic. So if you have an overlay network, say VXLAN, you can do that, provided that you can find a way to feed ntopng with the traffic that you want to monitor.
Hello. So you are not working with RRD files anymore, right?
Say it again.
You are not working with RRD files anymore, right?
No, you can choose that. It's an option. You can choose whether you want to use RRDs or InfluxDB.
Okay. About the alerts. Do you send them to ntopng from Grafana?
Grafana has its own alerting engine built in, and ntopng has another one built in. You can do email, Slack notifications, this kind of alert. So you can create them in ntopng or in Grafana. It's up to you.
And as ntopng is an open-source solution, I don't know if you have this information, but do you know how many hosts enterprises are monitoring with ntopng? How does it scale?
Yeah.
It scales up to tens of thousands of hosts inside the enterprise. It can monitor many more hosts, but if you have an enterprise, you are interested in having more information for the hosts inside your network. So ntopng divides the hosts between your enterprise network and the rest of the world. This is done for efficiency, basically, to save memory. So it can scale up to, let's say, 40,000 or 50,000 enterprise hosts and 100,000 hosts outside your enterprise network.
But do you use different instances of ntopng?
No, a single instance.
One for 50,000 hosts?
Yeah, a single instance.
And then if you want to monitor more, you need to set up another one?
Yes, yes. You can set up parallel instances.
Yeah, but they don't talk with each other, right?
They won't talk with each other.
Okay. Thank you.
Just a note. I know you guys and girls are not aware of this, but if you keep talking in the back, in the front this is relatively loud. Of course, this room has really good or bad acoustics, we can debate this. But please keep it quiet. Thank you.
A question related to a previous one. Have you experimented with use cases where the resolution is very fine-grained, like milliseconds or so? And if yes, which are the most critical aspects to be considered in this kind of workflow and architecture, in these specific use cases, given that the data volume could grow considerably?
Yeah. Actually, I didn't work at millisecond scales, and I didn't even try to push such data into InfluxDB, so I don't have an answer. We have worked on microbursts, but not using this solution. So I don't have an exact answer for this.
Thanks.
Hi. Thanks for the talk. I'm just wondering, if you're doing a lot of high-frequency sampling, how does that scale in terms of resources? In what way is it managing RAM or CPU usage?
Say it again?
I'm just wondering, if you're doing a lot of high-frequency sampling,
in what way does it scale in terms of resources? How are you handling the likes of RAM or CPU usage if you're doing a lot of sampling?
Yes. So the point is that ntopng is real time, so it sees all the packets. The fact that we are generating samples every 10 seconds is not really putting more pressure on ntopng, because it is already processing and inspecting every single packet. So it is just a matter of running another thread that crunches some statistics every 10 seconds, which is not really what impacts the performance of ntopng. InfluxDB-wise, I can't really tell you the impact. I didn't do any performance or load measurements going from five minutes to 10 seconds. ntopng-wise, I can tell you that this is not what is putting the highest pressure on the software, because ntopng per se is real time: it already sees and inspects every single packet that is monitored.
Any other questions? OK, so thanks for your time.
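The last answer, that a separate thread "crunches some statistics" every 10 seconds while the capture path already handles every packet, can be illustrated with a small sketch. This is not ntopng's implementation; it is a generic illustration of the pattern, and the names `InterfaceStats`, `on_packet`, and `run_sampler` are hypothetical.

```python
import threading


class InterfaceStats:
    """Counters updated for every packet and read by the sampling thread."""

    def __init__(self):
        self._lock = threading.Lock()
        self._bytes = 0
        self._packets = 0

    def on_packet(self, length):
        # Called by the (hypothetical) capture thread for each packet seen.
        with self._lock:
            self._bytes += length
            self._packets += 1

    def snapshot(self):
        # Reading the running totals costs the same regardless of traffic volume.
        with self._lock:
            return self._bytes, self._packets


def run_sampler(stats, interval, stop, emit):
    """Call emit(snapshot) every `interval` seconds until `stop` is set."""
    while not stop.wait(interval):
        emit(stats.snapshot())
```

In this sketch the per-packet work is the same whatever the sampling interval is; going from 300 seconds to 10 seconds only changes how often `snapshot` is called. A sampler would be started with something like `threading.Thread(target=run_sampler, args=(stats, 10, stop_event, emit))`, where `emit` forwards the sample to the data store.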