Hi, my name is Rymco Eindlaren, and I work at TANUS in the Netherlands. Seven years ago, I started as a network engineer and worked on the infrastructure of the combat management system, Tacticals. In the next half hour, I'm going to talk about the new cloud-native, Kubernetes-based infrastructure for future tactical systems, and I will show you a few things about how we make the new infrastructure battle-resistant. Its development is still in progress, and unfortunately I cannot show you everything because of our domain, but it will still be an interesting talk. I will first give some context about TANUS, then go into the infrastructure, and of course dive into the network details. Let's go!

TANUS is present in 68 countries with over 81,000 employees worldwide, and is active in these interesting markets. TANUS Netherlands is active in all markets except aerospace, but mostly in defense and security; our focus is mainly on naval activities. TANUS Netherlands produces roughly three kinds of products. The first is radars: here you see one of our biggest surveillance radars, the Smart-L, the new NS100, and the STIR, a fire control radar that can follow tracks very precisely. We also build a number of optical sensors for naval vessels, like the Gatekeeper and the Mirador, and besides all the cool hardware, we also build software: the combat management system, Tacticals.

An average naval vessel has a large number of systems on board. In this example, at the top you see the surveillance radar, the STIR as fire control radar, and the navigation radar. The Gatekeeper system is present as an optical sensor, and the ship also has an electronic countermeasure system, for example a jammer, to hinder opponents. A sonar system is available to detect targets in the water as well. As weapon systems, this ship has two guns, a surface-to-surface and a surface-to-air missile system, and decoys. All these systems can be used independently, but that is not very effective. How do you know what to shoot, or how to aim your gun? You need a surveillance radar to detect the targets first and a track radar to aim your gun, and that's where Tacticals comes in.

Tacticals is an integration system that integrates all the sensor- and weapon-related systems on board. It integrates TANUS products of course, the blue systems in this example, but also many systems from other suppliers, more than 55 in total. All the information that Tacticals collects comes together in the combat information center, also called the CIC. This is not the bridge of the ship, but an armored room deeper inside the ship where several consoles are installed. The operators process all the information with the tools in Tacticals in order to achieve the best possible situational awareness around the ship. Tacticals can be used for combat operations, that is, protecting the ship, her crew, and allies, which it supports by means of engagement planning and control of the weapon systems, but it can also be used for maritime security operations. These operations are more like policing a certain area to find, for example, drug trafficking, piracy, and other illegal activities.

In the next part, I'm going to zoom in on the infrastructure of Tacticals. The development of Tacticals started in the early 90s, and since then many upgrades have been applied, also to the infrastructure.
Until recently, we were able to serve our customers, but the newest customer, for example the German Navy with their new F-126 ships, places higher demands on the system that are difficult or even impossible to meet with the current infrastructure. The size of the ship plays a role, but also the security requirements. In order to serve the newest customers and to prepare ourselves even better for future customers, we started developing a new software platform for Tacticals about three years ago, called the Naval IT Infrastructure.

Here you can see a number of important drivers for this new infrastructure. The most important driver is design for change: the lifetime of ships, and also of our systems, is 20 to 30 years, a very long time in which a lot of technologies will come and go, and we want to be able to adopt new technologies faster and sooner than with the current infrastructure. The security demands are rising beyond what we can offer with the current infrastructure, and the current infrastructure is also tightly coupled to the hardware. We also want to move towards a microservices architecture, which brings lots of opportunities to improve our software development process: faster software deployment and easier-to-apply in-service updates. Another very important driver for moving towards a microservices architecture is scalability.

With microservices, a cloud-native platform quickly comes into play. There are many public cloud services available, but they all depend on the internet, and that is precisely something that is not present on board a naval ship, at least not in the combat management system. TANUS has therefore chosen to develop its own private cloud based on Red Hat Enterprise Linux 8 and Kubernetes. Besides working in an air-gapped environment, a private cloud offers the highest level of control and security, because the combat management system works with data of the highest security classifications.

So, just roll out a Kubernetes cluster and you're done, right? No, unfortunately not. Because the combat management system is a mission-critical system on board a naval ship, even a system vital to the crew, we have to deal with a lot of challenges compared to standard data centers and internet-based applications. Besides the lack of internet, a naval ship is also a rather annoying environment for hardware: equipment regularly breaks down due to vibrations, and the infrastructure has to be able to cope with this. As I said before, the systems we make have an extremely long lifespan; in this day and age we need to keep the system up to date and able to protect the ship against the latest threats for the coming 30 years. Tacticals is a complex software system that we cannot rewrite to microservices overnight, so we also need to support legacy applications, as well as integrate legacy software and hardware from suppliers. Lives depend on the performance of Tacticals, so performance is also very important. The system has to be very robust and able to deal with errors and failures of hardware due to, for example, an explosion or a fire on board, which leads to high redundancy and failover requirements. The system has to perform at all times, and errors need to be resolved as quickly but also as easily as possible, since there is no dedicated TANUS support team present on board. Security of the system is also very important: an intrusion can have major consequences, and secret data cannot be leaked.
And last but not least is the support for multicast for control and video data. Tacticals relies heavily on multicast, so providing multicast to applications in the Kubernetes cluster is a nice challenge. We will dive into that later.

The Naval IT Infrastructure platform makes it possible to deploy applications, and it also provides multiple services that applications can use. Here you see a number of these services, with the usual open-source suspects. The network service is a combination of multiple components. It includes connectivity between applications within the cluster, but also with endpoints outside the cluster, and it provides redundancy so that applications do not have to worry about interface, switch, or cabinet failures. Let's dive into that.

In this section I will describe a number of use cases that make the network of Tacticals an interesting one. The first is OpenSplice. This is a distributed database in which applications exchange mission-critical data such as tracks and engagement data. Multicast is used for the synchronization of all the nodes, so containers running in the cluster must be able to receive multicast. Quality of service and failover redundancy are very important to ensure that all the mission-critical data arrives as quickly and with as few interruptions as possible.

The next one is own ship's data. A ship always moves on the waves, and that can be very annoying when you want to hit a target with your gun or track a target in the sky with your radar. To solve this problem, gyroscopes are used to measure the movement of the ship. This data is converted by a data interface unit into Ethernet packets and distributed into the Tacticals network at a very high frequency. This also applies to the current position of the ship and the time, usually acquired via GPS. It is very important for the positioning of the weapon systems that all the systems involved have perfect time alignment. A gap in the own ship's data during a failure in the system can have major consequences, such as radar systems that stop working or aborted engagements. That's why we send the own ship's data twice into the network via separate, independent paths, an A side and a B side, to reduce the chance of interruptions. This data is also multicast, and it goes without saying that quality of service is very important here as well.

It has already become clear that multicast is a widely used technique within Tacticals. A large part of the multicast traffic is video, coming from cameras but also from radar systems. Analog video sources are digitized by means of a video interface unit. Unlike online streaming services like Netflix or YouTube, we are dealing with real-time video. Real-time video is needed to steer a gun with a joystick, for example, and that is only possible when the video has very little delay. To reduce the latency, real-time video is not encoded, which results in high bandwidth usage. Quality of service is used to ensure that the video data does not suppress the mission-critical data.

Now we are going to dive into the network details. First we take a look at the physical and logical networks, and then I will show you our multicast solution, supported by a demo. What is different in the new cloud-native infrastructure are the virtual networks between the cluster nodes. In this picture the console is also depicted as part of the cluster, because workloads can also be scheduled on it, but in the meantime it is also used as a front-end node in the CIC. All subsystems are connected to the physical network, with or without a data or video interface unit.
Only the data interface units are part of the cluster. The most important requirement of the infrastructure is availability: we ensure that any failure in the system has the smallest possible effect on the applications. The physical network of Tacticals lives in several cabinets on board the ship. Two cabinets are shown here, but on board there are often more than two, spread over different locations on the ship to prevent failure of the entire system. Each cabinet contains at least two switches, connected by 40-gigabit backbones. Servers are mounted in the cabinets and can be connected in two different ways: to both switches in the cabinet, or to two different cabinets, depending on what level of redundancy is desired. Consoles are connected to different cabinets with at least one pair of cables for redundancy, and this also applies to subsystems that are connected via a data interface unit. Ethernet-based subsystems are connected directly, again with at least two cables for redundancy. When a subsystem only has one cable, we place a dual-homing switch as close as possible to the subsystem in order to connect it with two cables again. The video interface units are also doubly connected, and other, less important subsystems such as cameras or network printers are connected with only one cable.

On top of the physical network we use multiple logical networks. Within the cluster we have a management network for Kubernetes and a redundant data network for all cluster-internal communication. Outside the cluster we also have multiple management networks and several dedicated networks: each subsystem gets its own network, for example. We also use application-specific networks, for example for the distributed database OpenSplice and for multicast traffic. We use dedicated networks for multicast traffic because multicast routing can introduce long, unwanted failover times, which have a direct impact on the functionality of the system and do not suit our strict redundancy requirements. By using dedicated networks for multicast, we can extend these networks directly to the endpoints, which ensures that we can avoid multicast routing.

Let's focus on the physical network of the cluster nodes. Each node has a management interface and at least one pair of data interfaces. On top of those two physical data interfaces we configure a bond interface, on which we configure the VLAN interface for the cluster data network. Depending on the redundancy implementation of the subsystem, we can also configure a VLAN interface for the dedicated network of that subsystem on top of the bond. Another solution is to configure a unique VLAN interface on each of the two physical interfaces, in addition to the bond interface, if the subsystem uses two separate, independent paths.

To make it a bit more tangible, I will briefly show how it works in a live system. For the context of this demo, the cluster consists of nine nodes, each with a management interface and two data interfaces. I log in via the control node on the left, from where we can operate the cluster with kubectl via the management interfaces. In this demo, two dedicated networks are present on each worker node: subsystem 1 and subsystem 2. Okay, let's switch to the demo. Here you see my terminal. We SSH to the control node and execute kubectl get nodes to see which nodes are in the cluster: nine in total, and node 1 is the master. Now we SSH to node 2 and look at the network interfaces with ip link.
Here we see a bond interface: enp3s0f0 is the first interface of the bond and enp3s0f1 is the second. On top of the bond interface we have our subsystem 1 interface, as well as the subsystem 2 network. We go a little bit deeper into the subsystem 1 interface, and there we can see that it sits on top of the bond and is a VLAN interface with ID 102; subsystem 2 has VLAN ID 103. Okay, then we go back to the control node and SSH to node 4, which has a different configuration. Here I execute ip link, and we see a subsystem 1 interface not on top of the bond but on the enp3s0f0 interface, so beside the bond interface, and subsystem 2 is on the other physical interface, so there are two separate paths.

Now that we have seen the interfaces on the nodes, let's focus on the container interfaces. Of course, every container has at least one network interface, which must be connected to the cluster data network. For this we use the container network plugin Calico. The choice for Calico is mainly driven by its performance and security features and the BGP integration option. In addition, I want to do some research on Cilium: the eBPF performance, security, and observability through the Hubble project could make it an interesting CNI plugin for our platform.

To connect a container to a dedicated multicast network, for example, we need additional interfaces in the container. After researching the known container network plugins, we concluded that no existing CNI plugin supports multicast natively, meaning that multicast is often treated as broadcast within the virtual network, and that causes problems in our system due to the high amount of multicast traffic. The solution we have chosen is the CNI plugin Multus to add an additional interface to the container, and the CNI plugin macvlan to connect it to a host interface. During the deployment, the name of the additional interface inside the container, in this example ifx, can be changed to any desirable name. This static name can be used in the configuration of the application running inside the container. The direct link to the host interface by means of macvlan assures high performance, and with this solution several additional interfaces can be added if needed.

I have to mention that we use macvlan in bridge mode. There are a number of modes available, but in our case more than one container connected to the same additional network could run on the same node, and only the bridge mode of macvlan supports forwarding between local endpoints without generating load on the physical network. Macvlan in bridge mode can be seen as a layer 2 switch, which means that multicast is handled as broadcast within the node. So all the containers connected to the same additional network on the same node will receive the multicast traffic, even when they didn't join the multicast group. We do not consider this a problem, since the traffic does not leave the node and does not generate load on the physical network, but just be aware of it.

An application can only use an interface if it has an IP address. This picture shows two containers on two different nodes, each with an additional interface in the same dedicated network. The IP addresses of the additional interfaces must be unique. The host-local IPAM plugin is not suitable for this, because it only provides IP addresses that are unique locally on the node. To ensure unique IP addresses across the cluster, we use the cluster-wide IPAM CNI plugin Whereabouts, which offers a kind of cluster-wide DHCP functionality.
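To make this a bit more concrete, here is a minimal sketch of what such a network attachment definition and a workload using it can look like: macvlan in bridge mode on a host interface, Whereabouts for cluster-wide IP assignment, and a Multus annotation that attaches the pod and renames the additional interface. The names, image, and address range below are illustrative, not the exact values from our system.

# Illustrative NetworkAttachmentDefinition: macvlan in bridge mode on the
# host interface "subsystem1", with cluster-wide IP assignment by Whereabouts.
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: subsystem1
spec:
  config: |
    {
      "cniVersion": "0.3.1",
      "type": "macvlan",
      "master": "subsystem1",
      "mode": "bridge",
      "ipam": {
        "type": "whereabouts",
        "range": "10.1.2.0/24",
        "range_start": "10.1.2.20"
      }
    }
---
# Illustrative DaemonSet: the Multus annotation attaches each pod to the
# subsystem1 network and renames the additional interface to "sub1".
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: nettools-subsystem1
spec:
  selector:
    matchLabels:
      app: nettools-subsystem1
  template:
    metadata:
      labels:
        app: nettools-subsystem1
      annotations:
        k8s.v1.cni.cncf.io/networks: '[{"name": "subsystem1", "interface": "sub1"}]'
    spec:
      containers:
        - name: nettools
          image: registry.example/nettools:latest   # placeholder image
          command: ["sleep", "infinity"]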
This works very well, but we ran into two problems. First, a static IP address could not be assigned to an interface, and we need this to run legacy applications in a container. We looked into the CNI plugin Kube-OVN, which does have functionality for assigning static IP addresses, but it was not suitable as a CNI plugin for our platform. The second thing we ran into is that we can add routes via a gateway specified by an IP address, but not via an interface, for example via interface eth0. This is needed to send multicast traffic out via the correct interface for legacy applications that cannot bind to a specific interface.

The only interface to the network service for the users of our platform is the network attachment definition files. In these files, the configuration of the Multus and macvlan interface is described: which host interface to bind to, as well as the IPAM configuration. The application developers only need to choose the correct network attachment definition, rename the additional interface in the container if needed, and configure the static IP address when possible and desired. The network attachment definitions for existing networks are automatically generated and deployed by Ansible.

Next, I'm going to show you a demo of Multus, macvlan, and Whereabouts. Okay, we are back at the control node. On the control node I have a number of files. The first one we are going to look at is the network attachment definition for subsystem 1. Here you see the file: it is a network attachment definition with the name subsystem1, of type macvlan. It binds on interface subsystem1 and is in bridge mode, and in the IPAM section we are using Whereabouts; this is the range and the range start. We do the same for the subsystem 2 network attachment definition; this one binds on subsystem2 and has a different range, of course. Okay, we apply those two files with kubectl apply. First subsystem 1, that's done; now subsystem 2. Let's check whether they were applied successfully with kubectl get network-attachment-definitions. Here we see that subsystem 1 and subsystem 2 are successfully deployed.

Then we have to deploy some pods. I have also created a YAML file for that. It's a DaemonSet, so every node gets one pod, and this file describes the pods with a network annotation. In this case the network annotation uses the subsystem 1 network attachment definition, and sub1 will be the interface name inside the container. So let's deploy it, again with kubectl apply -f and the YAML file, and then we execute kubectl get pods to see which pods are running on the cluster. Here we see a list of all the pods, one on every node, and we are going to look at the first one, running on node 2. We use kubectl exec -ti to get into the pod. Here we are inside the pod, taking a look at the interfaces. You can see that the sub1 interface has been added besides eth0; the IP address of the interface is 10.1.2.23. Then we open a new tab, SSH to the control node, check which pods are running, and get into another pod on another node, node 4. So again kubectl exec -ti into that new pod on node 4. Let's have a look: it also has a sub1 interface, with another address of course, ending in .27. Let's see if we can ping the first pod. That's working, so that's great. Now I'm switching back to the first pod.
I'm starting a traffic test tool, and we are going to transmit multicast packets on interface sub1 towards 239.1.1.2, the multicast group. On the second pod I'm starting the same tool, but in receiving mode, on the same sub1 interface, so the same network, with the same multicast address. Let's see if that's working. Yes, we receive messages from the IP address of the first pod, so that's great. We stop this one, because I want to show you another example: the nettools DaemonSet for subsystem 1 and 2. This is also a DaemonSet, so again all nodes get one of these pods. It also has a network annotation, but this time with two entries: the first one for subsystem 1, with interface name sub1, and a second one for subsystem 2, with the name sub2. We deploy it again with kubectl apply, and check that it is running with kubectl get pods. Then we take a look at one of the new containers, on node 3 for example: kubectl exec -ti into the pod, and then I show you the interfaces. In this case there are two, sub1 and sub2, so two additional interfaces, with two IP addresses of course. Now let's see if it can also receive the same multicast that is transmitted from the first pod, again on interface sub1. And it's working, we are receiving, so that's great.

Okay, now I'm opening a new tab again to show you the layer 2 behavior of macvlan in bridge mode. We select a container on a node which doesn't run any transmitters or receivers of our multicast group and enter the pod, again with kubectl exec -ti. Here I'm showing with ip maddr that we have not joined the 239.1.1.2 multicast group, and we are sniffing with tcpdump on the sub1 interface: we do not receive any multicast traffic, so it is not broadcast across the whole cluster. Now we move to a container which is also not transmitting or receiving our multicast group, but runs on a node, in this case node 3, on which the receiving container runs. Both containers are connected to the interfaces via macvlan in bridge mode, so the multicast traffic is broadcast within the node. Again there is no join on our test multicast group, but sniffing again with tcpdump we do see packets coming in, which clearly shows that the traffic is broadcast within the node itself. We do the same for a container which again doesn't transmit or receive our multicast group, but runs on a node, in this case node 2, on which a transmitting container runs, and we see the exact same behavior: again no join of the test multicast group, but tcpdump shows us that there are packets coming in. So be aware of that.

To conclude this story: as of today, no CNI plugin supports multicast natively; they only handle it as broadcast. We implemented efficient multicast handling, and avoid multicast routing failover, by using dedicated networks which are connected to containers in the cluster with Multus, macvlan, and the Whereabouts IPAM plugin. Also be aware of the fact that macvlan in bridge mode behaves like a layer 2 switch, which means that multicast traffic inside the node will be handled as broadcast, but it doesn't leave the node and won't generate load on the physical network.

I hope this gives you some insights into the challenges of a battle-resistant infrastructure, and some knowledge of how to deal with multicast traffic in your own Kubernetes cluster. I would like to thank you very much for your time and attention, and I hope this talk was informative for you. Questions can be asked via the chat, or you can reach out to me via LinkedIn, and I hope to speak to you again in the future about our challenges and interesting implementations. Thanks again, and enjoy the conference.