So, hello everyone, I am Sylvain Baubeau and this is Sylvain Afchain, and today we're going to talk about Skydive, which is an analyzer for network topology and traffic. We created Skydive three years ago. At the time we were dealing with many kinds of SDNs, and those are complicated beasts: they implement the network in very different ways, and when you have to develop or even just use them, it's pretty hard to understand what they really do.

So the first need we had was to be able to see what the SDN was doing, in terms of topology exploration, just to be able to visualize it. One of the primary use cases was to be able to troubleshoot it, even on production machines, so we had to have a way to capture network traffic and analyze it, to see what's going on: why packets were dropped, many different issues.

As I said, we were dealing with different SDNs, so we didn't want a solution tied to one particular SDN. We wanted it to be agnostic, whether it's OpenStack or Kubernetes or you name it. We had to be able to do this in real time, sometimes on production machines, but also after the problem happened: we wanted a solution that would let us analyze an issue in the past. Sometimes you have an issue with a VM, then you destroy it, and then you have no way to see what was wrong at the time. We wanted a lightweight solution, like everyone, and something easy to deploy, because on production machines you don't want to install a lot of stuff.

So we came up with this architecture. At the bottom you have the Skydive agents. They need to run as close as possible to the physical machine, so typically they would run on the compute nodes of your cloud or on the Kubernetes nodes.
Those agents collect the topology locally and then forward everything to another kind of component, the analyzer, which can be located outside of the infrastructure. The analyzers are replicated together, so we have high availability, and we also support load balancing, because the agents are connected to multiple analyzers.

You can see here that the data model we use to store this topology is a graph. This graph is event-based: we try not to use polling. When we get information from the different subsystems, we try to subscribe and get events from them. It's a PubSub mechanism, so you can write your own clients that publish to this graph, or you can subscribe to it and receive the events on the graph: node creations, updates, deletions and so on. You can get those events through WebSockets; the web UI that we have uses this mechanism. For every modification on this graph we keep a revision, so we are able to rebuild the graph as it was at a certain point in time.

Where do we get all this topology information? Well, from everywhere we can. We use Netlink to get information about the network interfaces, the metrics, the routing tables, the FDB. I won't go through all of the probes, but one of the main ones is Open vSwitch, and since many SDNs use Open vSwitch, you get support for many SDNs out of the box. We also get information from physical machines through LLDP. We have a probe for libvirt to get information about the VMs, a Kubernetes probe, and a Docker integration, so you can see all your Docker containers. We even have probes completely unrelated to networking, like the block device probe, so you can get information about the disks on your systems. That's pretty much all we have for the topology. We had a demo, but for some reason we are not able to show it.
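To make the event-based, revisioned graph concrete, here is a minimal toy sketch of the idea: every change is published to subscribers (PubSub) and recorded with a revision number, so the graph can be rebuilt as it was at any past revision. This is an illustration of the concept, not Skydive's actual data structures.

```python
class RevisionedGraph:
    """Toy event-based graph store: every modification is an event,
    published to subscribers and kept so past states can be replayed."""

    def __init__(self):
        self.events = []        # (revision, op, node_id, metadata)
        self.revision = 0
        self.subscribers = []   # callbacks notified on every event

    def _emit(self, op, node_id, metadata=None):
        self.revision += 1
        event = (self.revision, op, node_id, metadata)
        self.events.append(event)
        for callback in self.subscribers:   # PubSub notification
            callback(event)

    def add_node(self, node_id, metadata):
        self._emit("NodeAdded", node_id, metadata)

    def update_node(self, node_id, metadata):
        self._emit("NodeUpdated", node_id, metadata)

    def del_node(self, node_id):
        self._emit("NodeDeleted", node_id)

    def at(self, revision):
        """Replay events up to `revision` to rebuild the graph state."""
        state = {}
        for rev, op, node_id, metadata in self.events:
            if rev > revision:
                break
            if op == "NodeDeleted":
                state.pop(node_id, None)
            else:
                state[node_id] = metadata
        return state


g = RevisionedGraph()
g.subscribers.append(print)                 # a subscriber sees every event
g.add_node("eth0", {"Type": "device"})
g.add_node("br0", {"Type": "ovsbridge"})
g.del_node("eth0")
print(sorted(g.at(2)))                      # state before the deletion
```

The `at()` replay is what makes post-mortem analysis possible: even after `eth0` is deleted, revision 2 still shows it.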
So that was the topology part, and now Sylvain is going to talk about the network flow part.

Okay, yeah, perfect. So now we have a way to build a view of the network topology: every interface, bridge, router and so on is mapped to a topology view, a graph. What we wanted to bring on top of that is the ability to start packet captures, for troubleshooting, monitoring and things like that. So we implemented, on top of the topology view, a way to do distributed packet capture. With a simple API call, you can express that you want to start a packet capture on different interfaces in the topology.

We support multiple capture methods, depending on the use case you want to address: some of them bring a lot of metrics and capabilities, like TCP defragmentation; some of them are more efficient for large numbers of packets. When you start a capture, you can express that you want only certain packets, using the BPF filtering mechanism. We support L2 and L3 flow tracking, and tunnels are supported too: GRE, VXLAN, MPLS, Geneve and maybe some others. The goal is that when you have an overlay and you want to track the traffic of a container or a VM, even inside a tunnel, you can do this with Skydive. What we do, in fact, when we see that a packet is going to be encapsulated in a tunnel, is split the packet in two parts and generate two flows within the system.

We don't keep the packets themselves. That's possible, but that's not the main goal of Skydive. What we do is take the packets and generate a flow table. I can show you maybe, yeah, here. So here you have the probes at the bottom, both topology probes and flow probes.
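The "simple API call" that starts a distributed capture can be sketched like this. The field names (`GremlinQuery`, `BPFFilter`, `Type`) follow Skydive's capture API, but treat both the endpoint and the fields here as assumptions for illustration; this just builds the JSON body a client could POST to an analyzer.

```python
import json

def capture_request(gremlin, bpf_filter=None, capture_type=None):
    """Build the JSON body for starting a distributed capture.

    The Gremlin expression selects topology nodes (interfaces) to
    capture on; an optional BPF filter keeps only matching packets;
    an optional capture type picks the capture method.
    """
    body = {"GremlinQuery": gremlin}
    if bpf_filter:
        body["BPFFilter"] = bpf_filter
    if capture_type:
        body["Type"] = capture_type      # e.g. a pcap- or eBPF-based probe
    return json.dumps(body)

# Capture only DNS traffic on every interface named eth0, cluster-wide:
print(capture_request("G.V().Has('Name', 'eth0')", "udp port 53"))
```

Because the selector is a topology query rather than a host name, the same call applies to every matching interface on every agent, which is what makes the capture "distributed".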
We generate a flow table within the agent, and from time to time the generated flows are forwarded to the analyzer and then to the datastore. That's basically how we do this. In the end, the size of the data within the agent is pretty low compared to the original traffic size.

So now we have a way to generate flows from packets, and we want to export them. With Skydive, you have multiple ways to consume the generated flows or packets. One of them is to start a capture: you write your expression, matching multiple interfaces for example, and then you say you want the packets to go outside of Skydive's scope. We support sFlow, NetFlow and ERSPAN, and you can do this for any interface that you see in the topology, meaning that if you want to export the flows you captured on an interface inside a container, that's possible.

The flows and the metrics are stored in a time series database, so if you want to do computation on them, you can. We also have a way to subscribe to the flow bus, meaning that when the agents forward the flows to the analyzer, you can write a subscriber on the analyzer and you will get the flows. That's useful if you want to write your own flow processing; we have an example where the flows are captured and then converted to the VPC flow log format. And when you want to retrieve the flows, you use the exact same query language that you use for the topology, for example to retrieve the HTTPS traffic of a Docker container.

We also have a way to inject packets. You may want to generate packets in order to produce specific traces for troubleshooting, or to get specific metrics. We support ICMP, TCP and UDP, and there is also a way to inject already generated pcap traces. You can write long-running packet injections.
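The flow query just mentioned, retrieving the HTTPS traffic of a Docker container with the same language used for the topology, might look like the following sketch. The step and field names are indicative of Skydive's Gremlin dialect but are not verified here, and the container name is made up; the helper only builds the query string.

```python
def https_flows_of_container(name):
    """Build a Gremlin-style query: start from a Docker container's
    network namespace in the topology graph, then narrow down to its
    flows whose destination port is 443 (HTTPS)."""
    return (
        "G.V().Has('Type', 'netns', 'Manager', 'docker')"
        ".Has('Name', '%s')"
        ".Flows().Has('Transport.B', 443)" % name
    )

print(https_flows_of_container("web-frontend"))
```

The point is that topology selection (`G.V()...`) and flow retrieval (`.Flows()...`) compose in one expression, so "the HTTPS traffic of this container" is a single query rather than a join you do by hand.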
So you can say, I want 10 packets or 100 packets, and you can define the rate, for example. That's useful if you want to generate a ping mesh: with one expression, one API call, it will generate a ping between all the veth interfaces, and then you get the RTT reported and you can grab that information if you want. There is also a way to give not a file but a socket where you can inject pcap traffic, and it will then be injected within the Skydive infrastructure.

So that was an overview of the topology information we retrieve and of the flows. On top of that we have more. There is a way to do alerting: you write an expression matching flows or topology information, and then you can trigger something, like a webhook call or the execution of a script. We also have a way to write workflows, like: I want to start a capture, then check some metrics, finally stop the capture and trigger something with the result.

If you want to play with it, we have Go and Python clients. We have an Ansible module, so if you want to deploy an infrastructure and do something on top of that, injecting some information into Skydive, you can use the Ansible module. There is a way to add plugins; for example we have a collectd plugin, so you can take all the metrics reported by collectd and inject them into Skydive as metadata. And to consume the metrics, we have a Grafana datasource, so basically you use the exact same query language and you will get, for example, the RTT reported, and more.

Thank you. I invite you to look at the website, especially the last link, which is the new version of the web UI. And I think that's it. Thanks. If you have questions...

So yeah, I'm going to repeat the question. The question was: we are storing the information in a time series database, and which one? So that's optional.
It's not mandatory to store anything. By default, you can start Skydive without any database, and you will have the real-time features but not the post-mortem ones. And for the time series, it's not a real time series database: we store the data in the form of time series, and we use Elasticsearch. There is another backend, OrientDB, but I would advise using Elasticsearch. Another question maybe? Okay. Thank you. Thank you.
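To round off the alerting part mentioned earlier, here is a sketch of what an alert definition could look like: an expression evaluated against the graph or the flows, plus an action, such as a webhook, triggered when it matches. The field names (`Expression`, `Action`) follow Skydive's alert API but are assumptions here, and the URL is made up.

```python
import json

def alert_request(expression, webhook_url):
    """Build the JSON body for creating an alert: when the Gremlin
    expression matches, Skydive triggers the given action (here a
    webhook; a script path could be used instead)."""
    return json.dumps({
        "Expression": expression,
        "Action": webhook_url,
    })

# Fire a webhook whenever an interface in the topology goes down:
print(alert_request(
    "G.V().Has('State', 'DOWN')",
    "http://example.com/hooks/interface-down",
))
```

As with captures and flow queries, the alert condition reuses the same query language, so anything you can select in the topology or flow graph can also be alerted on.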