OK, so, welcome to our session — our last session, on the last day. Together with my colleague we prepared a presentation about Kubernetes, SDN, performance and architecture, and the goal of this presentation is to take you through our findings and our ideas. My name is Jakub Pavlík; I am the former CTO of TCP Cloud and now director of product engineering at Mirantis. And Marek is a software engineer who also comes from TCP Cloud, and we have worked together for more than two years on different cases and different workloads.

So, networking in Kubernetes. The first thing is that when containers appeared, we successfully ignored them for a long time, because it looked like a playground for developers and we didn't want to take it seriously. And the reason was the networking: the problems with things like port mapping and how to build a production deployment. Building a container and running it on a laptop is easy — you can launch it, you can launch multiple containers — but how do you orchestrate them and run production, with high availability, disaster recovery and, of course, the networking?

We started to take containers seriously when we discovered Kubernetes, an approach that finally takes the containers and replaces the model where you have to map each port for the application and remember how you are exposing ports. Because when you launch pods, you don't care about the IPs — each pod has its own IP — and you just expose your service endpoint. You don't care anymore what the network is, how it is connected, or how to create the network. You just launch the pod, expose it as a service endpoint, and you have load balancing. So it is a really nice solution; we took a look at it and started working on it. (We will come back to this model with a small sketch in a moment.)

And there are different approaches for different use cases, like the overlay versus non-overlay discussion, multi-tenancy, security, performance, scaling. We will take you through how we see it. There are also multiple plugins — I see an analogy with OpenStack Neutron at the beginning, when there were, and still are, a couple of plugins, and it is difficult for newcomers to understand which plugin works; usually you have to test it yourself, because everyone will tell you their plugin is the best. So there are a couple of SDNs: Calico, OpenContrail, Romana, Contiv, Open vSwitch. We looked at them, and because we come from OpenStack, this is an OpenStack Summit, and one of the most used Neutron solutions is OpenContrail, we decided to pick OpenContrail as one example, and Calico as the most common and, from our point of view, production-ready network plugin for Kubernetes. And we wanted to compare those two solutions not just on performance but also on functional features and use cases.
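To make that pod-and-service model concrete, here is a minimal sketch — not from the talk — assuming the official `kubernetes` Python client and an existing cluster config; the names and labels are invented for illustration:

```python
# Expose pods by label as a Service endpoint, so callers never deal with
# individual pod IPs or host port mappings (the point made above).
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() when running inside a pod
v1 = client.CoreV1Api()

service = client.V1Service(
    metadata=client.V1ObjectMeta(name="web"),
    spec=client.V1ServiceSpec(
        selector={"app": "web"},  # any pod carrying this label becomes an endpoint
        ports=[client.V1ServicePort(port=80, target_port=8080)],
    ),
)
v1.create_namespaced_service(namespace="default", body=service)
```

Pods matching `app: web` come and go with their own IPs; consumers only ever talk to the stable service endpoint, which load-balances across them.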
So, overlay versus non-overlay. If you look at the common overlay concerns — this is very popular with people now — they say you lose the benefit of simplicity, you lose performance because of an overlay that is not needed, it is difficult to maintain, and it is difficult to troubleshoot. These are the statements of the people who prefer non-overlay. If you look at the overlay side, you have benefits like native multi-tenancy and security, and with OpenContrail you get features like EVPN and L3VPN, and you have analytics. So these are two worlds: the new world is non-overlay, and the second world is the legacy enterprise, which needs several features that simply must be there.

People usually try to simplify this into just a performance issue. But from the performance perspective, even without an overlay you still necessarily use the internal bridges and the virtual interfaces moved into the container, and the difference in payload throughput in the end — you will see it on our slide — is something like 3% or 4%. Where the difference is huge is packets per second, and the question is whether that parameter is really relevant for containers. It makes sense for NFV clouds, where you are launching firewalls and routers, but the question is whether containers need it too.

So I borrowed a statement from my favorite guys: the key aspect to consider is the operational complexity, and it really depends on the use case, on what you are running. If you are running just one application stack and you don't need to solve security, load balancing, separation of services, and separation of the physical infrastructure from the application itself, then non-overlay is fine. If you need to separate things — if you need to reach an endpoint for an Oracle database cluster, or connect to some legacy world — then you will probably need the overlay. We are not saying which one is right.

So we built a test environment where we ran several functional and performance tests. These are the use cases my team and I tried over almost one year: we ran Calico on a bare-metal cluster with 100 nodes; we ran Kubernetes with OpenContrail 2.x and with OpenContrail 3.x; we ran OpenContrail inside of Kubernetes with Calico; we ran OpenContrail and Kubernetes nested together; we ran Calico in OpenStack with OpenContrail through the BGP-as-a-Service feature — we also posted a blog about that; and we ran Kubernetes in OpenStack with a single Contrail as the data plane, where you can map the container through the vRouter. So we tried multiple scenarios, we tested at a scale of up to 100 nodes, not more, and we will share the story today. Now my colleague will take you through the individual components.

I would like to start by explaining the basic components of the Calico architecture, but first let me tell you about our evaluation of Calico. We have been using and evaluating Calico since Kubernetes 1.1, and we are now shipping Calico as the default networking solution for Kubernetes in the new version of Mirantis Cloud Platform.

So, back to the architecture point of view. Calico uses CNI, which stands for the Container Network Interface — the abstraction layer for connecting a third-party SDN solution to the containers. It uses the traditional BIRD routing daemon for all the BGP work. It uses etcd as the key-value store for the Calico configuration, confd for generating the BIRD configuration file, and Felix as the daemon that runs on each Calico node. I would like to mention that Calico, as we use it, is pure layer 3 — there is only routing — and I will hop into the next slide. What it actually does is take a subnet block and assign it to each node, and when you spin up a container, it gives the container a /32 host route out of that block.
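A minimal illustration of that addressing scheme, using Python's standard `ipaddress` module — the pool and block size here are invented, not from any real deployment:

```python
# Each node owns a block carved from the cluster pool; every container on it
# gets a /32 host route from that block, which the node then advertises.
import ipaddress

cluster_pool = ipaddress.ip_network("192.168.0.0/16")   # hypothetical Calico IP pool
node_block = next(cluster_pool.subnets(new_prefix=26))  # block assigned to one node

for pod_ip in list(node_block.hosts())[:3]:             # first few containers on the node
    print(f"{pod_ip}/32  -> local veth on this node, advertised via BGP")
```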
The routing information is then distributed via BGP. By default Calico builds a full-mesh BGP network, which is not really good for production and not really good for scale: when you end up with 100 servers, you end up with roughly 100 BGP sessions per server, and when you have 1,000 servers, roughly 1,000 BGP sessions per server. That is why we use route reflectors, which act as a centralized propagation point for the routes. By default, routes learned from one iBGP peer are not re-advertised to other iBGP peers, but route reflectors do exactly that: you peer each node only with the route reflectors, and they are responsible for distributing the routes to the other nodes. So each Calico node ends up with just two BGP sessions to the route reflectors — two, for an active/backup scenario. (The session-count arithmetic is sketched below.)

That is not the only concern. The other concern is that you are trying not to be dependent on the underlay at all. Because you are using pure layer 3, you need to redistribute the routes all over your network so the nodes can communicate with each other. You actually need to propagate these per-node blocks from Calico into the networking infrastructure, and you need to use, for example, BGP to do that. But, just as with any other hypervisor, you don't want to create BGP sessions with all of the networking devices in your infrastructure — so you are again peering only with the route reflectors.

OK, so some pros and cons of using Calico. One of the pros is that there is no overhead: since you are using only pure layer 3, there is no inner header, no outer header, no encapsulation at all — it is all transparent routing. The other thing is that it reduces complexity. For traditional networking people it is really hard to get along with SDNs and such, but since this uses the normal routing daemon, BIRD, it is possible for traditional network engineers to troubleshoot and operate it. On the other hand, it is highly underlay-dependent, since you need to connect the Calico nodes over BGP and redistribute the routes across the environment. So it somewhat takes away the benefit of why you would even use an overlay — not having to depend on your networking team. And of course there is no layer 2, since by default there is only routing.

Some facts about Calico with Kubernetes. As I already mentioned, it uses the CNI plugin. We are right now evaluating Calico 0.22.0 with Kubernetes 1.4, and one of the features it provides is the Kubernetes network policy for security. Then we have some production considerations that we found during our testing and evaluation. One of them is to always separate the etcd cluster for Calico — I mean separate from the Kubernetes etcd; don't use the same key-value store. The other is to use at least etcd version 3, which brings a huge performance improvement. The next one, which I already mentioned, is to disable the BGP full-mesh peering. And then, from our point of view right now, don't run it as a Kubernetes manifest: we prefer to run it as systemd services rather than inside a Kubernetes manifest.
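To put numbers on the full-mesh concern in those production notes, a back-of-the-envelope sketch (not from the talk):

```python
# In a full iBGP mesh, every node peers with every other node; with route
# reflectors, each node only peers with the reflectors themselves.
def sessions_per_node(nodes: int, route_reflectors: int = 0) -> int:
    if route_reflectors:
        return route_reflectors   # one session per reflector
    return nodes - 1              # full mesh: a session to every other node

for n in (100, 1000):
    print(f"{n} nodes: full mesh = {sessions_per_node(n)} sessions/node, "
          f"with 2 reflectors = {sessions_per_node(n, 2)}")
```

Session count per node grows linearly with cluster size in the full mesh, but stays constant with reflectors — which is the whole argument for disabling full-mesh peering at scale.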
OK, let's jump to OpenContrail. I will also start with a brief architecture overview, but first let me tell you about the experience our company has with this SDN. We have been using OpenContrail for more than two and a half years, but we started using it with OpenStack, because at that time Kubernetes didn't even exist. So our main experience is with OpenStack and virtual machines, and we have been using the OpenContrail plugin for Kubernetes since day one.

OpenContrail is an overlay solution and it has parts like control, config, analytics, database, and the agent. I mean, there are a lot of services, so let me just briefly describe them. Control is responsible for the control plane: the connections to the agents, exchanging routing information between agents and with the physical gateways. Config translates the API calls, UI calls, or whatever calls, into actual configuration changes. Analytics collects a huge amount of data about all the flows, about the performance and usage of all your nodes, and so on. The database provides persistent storage for your configuration as well as for the analytics data. And the agent, together with the kernel module, is responsible for all the routing and forwarding.

OpenContrail gives you the opportunity to use multiple encapsulations — MPLS over GRE, MPLS over UDP, as well as VXLAN. It usually uses physical gateways, and it is also our recommendation to use them. These gateways are used for north-south traffic, from the OpenStack point of view as well as from the Kubernetes point of view: if your containers want to go out, they go through the gateways.

This is an example topology of the control plane of OpenContrail with Kubernetes. As you can see on the right side, there are two Contrail controllers which have XMPP sessions with the vRouter agents. The concept is similar to how Calico works with route reflectors: when there is a change on a vRouter agent, the change is propagated to the control nodes via XMPP, and then the controllers are responsible for distributing all the networking changes to all the other agents. (We sketch the shape of this fan-out below.) There is also BGP peering to the cloud gateways. And I don't want to create confusion: this is only the control plane. The data-plane tunnels are created directly from the containers to the gateways, as well as between the containers — so this stands only for the control plane.

Now the pros and cons of using OpenContrail. It is underlay-agnostic: you are not dependent on the top-of-rack switches, aggregation switches, or whatever switches. Maybe someone will say that the routers are the legacy world as well, but we see the cloud gateways — the physical routers terminating the tunnels — as a part of the SDN solution from our point of view, not as part of the legacy world. The other benefit is the capability for advanced networking features, including Load-Balancing-as-a-Service, layer 2 and layer 3 VPNs, and extensions to the bare-metal world like orchestrating top-of-rack switches via OVSDB, configuring routers via Netconf, and more. I put the physical gateways under the pros because we see them as more stable, and they also give you the possibility to connect almost any encapsulation in the world to your cloud — so there is no limitation on what can connect to the cloud. And it provides full separation at layer 3 or layer 2, so that is also under the pros.
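Returning to the control-plane topology for a second, the fan-out pattern described there has roughly this shape. This toy sketch is purely illustrative — none of these classes correspond to real Contrail APIs, which speak XMPP and BGP:

```python
# Agents advertise route updates to a central controller, which fans them
# out to every other agent (the route-reflector-like pattern from the talk).
class Controller:
    def __init__(self):
        self.agents = []

    def register(self, agent):
        self.agents.append(agent)

    def publish(self, source, route):
        for agent in self.agents:
            if agent is not source:   # don't echo the route back to its origin
                agent.learn(route)

class VRouterAgent:
    def __init__(self, name, controller):
        self.name = name
        self.routes = []
        self.controller = controller
        controller.register(self)

    def advertise(self, route):
        self.controller.publish(self, route)

    def learn(self, route):
        self.routes.append(route)

ctl = Controller()
a = VRouterAgent("node-a", ctl)
b = VRouterAgent("node-b", ctl)
a.advertise("10.0.1.5/32 via node-a")   # node-b learns it; node-a does not re-learn it
print(b.routes)                          # ['10.0.1.5/32 via node-a']
```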
Under the cons there is, of course, the overhead. As mentioned, it adds bytes to each packet that is processed — you have an inner and an outer header. And there is the complexity: advanced networking features go hand in hand with complexity, and that can give you a lot of trouble.

Let me tell you how OpenContrail works with Kubernetes — oh, there is a mistake on the slide with the Kubernetes part. There is the Kubernetes network manager, which provides the bridge between the Kubernetes API server and the Contrail config API. There is a huge difference in how load balancing is done, both between containers and for access into the cluster: Contrail uses ECMP from the gateway down to the containers, and between the containers themselves. So there is no kube-proxy or iptables balancing, as with other SDNs like Calico. There is also a stronger separation at the networking level: the networks are created, and the containers are associated with the networks, based on the labels in the manifests. By design in an overlay the networks are completely separated, but Contrail adds the next level of separation at the namespace level. It takes the concept of the namespace in Kubernetes — which is normally more of a logical separation than a physical one — and makes it comparable to OpenStack tenants. So from the OpenContrail point of view, namespaces in Kubernetes are tenants in OpenStack. And the security is done by Contrail policies. We are currently evaluating the latest stable Contrail release, 3.0.3, which supports Kubernetes 1.4.

Then we have some production considerations for using OpenContrail — not considerations for OpenContrail with Kubernetes specifically, but for OpenContrail in general. Always separate the Cassandra clusters — the database clusters — for the config and for the analytics, because analytics can generate loads of data, which can overload your Cassandra or your disk space. And, as I already mentioned, use physical routers as the gateways. All right, and I will give the word back to Jakub.

Okay, so a small comparison. Regarding performance, as I said, it really depends on the encapsulation in Contrail. We got the best performance with MPLS over UDP between containers and MPLS over GRE towards the gateways. The performance here really depends on the NIC offloading, so it is really difficult to say what the performance will be — it depends on the drivers and on the kernel. We did the tests on kernel 4.4, on Ubuntu Xenial. The payload overhead is around 4%; in this particular case, as we measured it with iperf, the difference came out at more like 1%. So it is very difficult to pin down — it is around a percent or a few, and it is not really about bandwidth. In the overlay world it is more about packets per second, as I already said, and I don't think packets per second is a real issue in containers, because what you usually run in containers is nginx, Apache, HAProxy. If I need more performance, more packets, I will scale out my containers rather than making a single container bigger. That is the biggest difference between virtual machines and containers: the NFV-style capabilities, where these functions are needed for performance, are not that useful for Kubernetes and containers. On the Calico side there is no encapsulation, no overhead, and the bandwidth is almost the same. It doesn't make sense to show how CPU utilization was measured and generated, because it is not exactly measurable and not exactly comparable.
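For intuition about those single-digit percentages, a rough, hedged calculation of encapsulation header cost against a standard 1500-byte MTU — header sizes are the textbook ones, and real numbers vary with MTU, options, and NIC offloads:

```python
# Encapsulation header bytes as a share of a 1500-byte frame.
MTU = 1500
encaps = {
    "MPLS over GRE": 20 + 4 + 4,       # outer IPv4 + GRE + one MPLS label
    "MPLS over UDP": 20 + 8 + 4,       # outer IPv4 + UDP + one MPLS label
    "VXLAN":         20 + 8 + 8 + 14,  # outer IPv4 + UDP + VXLAN + inner Ethernet
}
for name, header_bytes in encaps.items():
    print(f"{name}: {header_bytes} bytes ≈ {header_bytes / MTU:.1%} of the frame")
```

Which lands in the same low-single-digit range quoted in the talk, and also shows why raising the MTU (as mentioned later in the Q&A) shrinks the relative overhead.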
The important thing is that the bandwidth is almost the same, and it is not really about performance but about the use case — what you are trying to achieve. So, why not both? In this example we show that we are able to run physical servers with Kubernetes and Contrail on the hardware, as well as OpenStack with Contrail, and we can provision application workloads for the users, for the developers — through Heat, through Murano, or whatever — as Kubernetes with Calico for testing. So if developers need to quickly test something in a Kubernetes playground, you can easily do it. We also set up a case where we enabled BGP-as-a-Service in OpenContrail and ran the BGP peering from the virtual machines into the vRouter on the node, which is the latest feature of OpenContrail.

What we see in this mixed approach, especially here, is that for whatever reason you cannot put everything into a container — and sometimes the reason is not technical but licensing. For some reasons you cannot take certain vendor databases and put them into virtual machines, or putting them in a container doesn't make sense. So with the overlay you can very easily connect your physical Kubernetes nodes running the application controllers with your database nodes, which might be running in OpenStack VMs or on a completely separate bare-metal server — whatever is needed. So this is also one of the use cases that we tested and that works very well.

Another thing, which was presented yesterday, I think, by Rudra from the Contrail team, is running Kubernetes with Contrail on top of OpenStack with Contrail, using the same Contrail — because otherwise you start to get duplication of the overlay: overlay in overlay, vRouter in vRouter. They solved this issue very elegantly: the kube-network-manager that manages Kubernetes is able to call the Contrail controllers of the underlying OpenStack and add a sub-interface for your container directly into the virtual machine. So you can use the same approach, the same networking, and again mix your containers and VMs in the OpenStack cloud, which is another very good use case.

What we found for Kubernetes itself: we build our own binaries. Mirantis supports Kubernetes clusters, so we have a downstream team that works with upstream and provides a patched, fixed, stable, production Kubernetes. So we take these binaries — instead of taking Docker containers of unknown origin, or building them ad hoc ourselves, we have a team for this — and we offer this as a service to our customers. And we run these services right now, as well as etcd, in systemd, not in a manifest with Docker, because from the operational perspective we figured out that it doesn't really make sense for us to launch Kubernetes itself inside Kubernetes, or inside containers — it adds extra complexity.
etcd is one binary, and Kubernetes — hyperkube — is now also one binary, so today we can take these two files, put them into systemd, and not be dependent on Docker and on everything running in containers. So we separate this. We also have a provisioning tool that we offer and that we use ourselves, which is not the upstream default installation, because that is a lot of mess and a lot of stuff aimed mostly at developers, not at production. So we build Kubernetes, put the pieces together, and make it genuinely operations-ready for our customers, instead of a playground for developers. We also pull images from a private Docker registry with authentication. So this is what our production Kubernetes deployments look like.

So, a small comparison at the end: OpenContrail versus Calico. Encapsulation: in Contrail you can choose; as I said, MPLS over UDP had the best performance between containers. Bandwidth: the difference is something like 5%, let's say, in general. Security and multi-tenancy: in Calico you have the network policy, and you need to specify it, which is extra work, and it creates iptables rules — so it works; in Contrail, on the other hand, it is native: everything you provision, based on your labels, creates a virtual network that is completely separated, and based just on the labels you are automatically creating the policy. So that is the difference — when launching Kubernetes for a single purpose, you don't really need to solve the policies. (A small sketch of that Calico policy work appears below, after the wrap-up.) Multi-cloud: OpenContrail enables you to cover VMs, bare metal, containers, whatever; with Calico you cannot cover all of that, but an internal Mirantis team created a solution where you can run a single Calico for Kubernetes as well as for OpenStack. That is especially for the use case where you want to run containerized OpenStack on top of Kubernetes and OpenStack uses the Calico Neutron plugin — then it makes sense to use a single Calico for both and not duplicate the complexity. Complexity: Calico is just BIRD, and BIRD has been here for years, so it is stable, and it uses etcd just for storing the objects, so it is pretty easy — we didn't find any issue where we had to spend a long time on heavy debugging. OpenContrail is a set of many tools — Cassandra, ZooKeeper, Redis, Kafka, lots of stuff — and for new people it is really difficult to understand; but that is the reason we are here: to help you set up your OpenContrail environments. And extra features: as Marek said, you can bring almost anything to your containers, which is awesome — on the other hand, the question is whether it is really needed, because containers were created for simplification. That is just the question.

This is just a picture from the Mirantis Cloud Platform design, how it is done now. What we are testing when we launch Mirantis Cloud Platform with OpenContrail: we take a small deployment — the servers where we put HAProxy for Kubernetes HA, because with Kubernetes you again need to solve high availability, and how will you solve it? You need two, or better three, nodes, and you need a single endpoint, so again you end up with a virtual IP. On top of that run the Contrail controllers as containers, as well as OpenStack, and we also provision Galera and RabbitMQ into containers. So this is just another use case that we did.

So that is everything we wanted to share — if there are any questions, we can answer them. I'm curious what the questions will be; they will be difficult to answer. No, they will be easy ones!
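As a concrete illustration of the "extra work" of specifying Calico-style policy mentioned in the comparison, a minimal hedged sketch using the official `kubernetes` Python client — all names and labels here are invented:

```python
# A Kubernetes NetworkPolicy that only lets pods labeled app=frontend reach
# pods labeled app=db; Calico renders this into per-node iptables rules.
from kubernetes import client, config

config.load_kube_config()

policy = client.V1NetworkPolicy(
    metadata=client.V1ObjectMeta(name="db-allow-frontend"),
    spec=client.V1NetworkPolicySpec(
        pod_selector=client.V1LabelSelector(match_labels={"app": "db"}),
        ingress=[client.V1NetworkPolicyIngressRule(
            _from=[client.V1NetworkPolicyPeer(
                pod_selector=client.V1LabelSelector(match_labels={"app": "frontend"}),
            )],
        )],
    ),
)
client.NetworkingV1Api().create_namespaced_network_policy("default", policy)
```

In the Contrail model described above, by contrast, the label-to-virtual-network mapping gives this separation without writing an explicit policy object per application.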
Q: I have a question on the architecture you showed — the mixed architecture, right? The question from the service providers is: if you want to use OpenContrail and you also want to use OpenStack — the front end is OpenStack — there is a challenge: if you want to make some change, you can't upstream it. What do you suggest in that case?

A: You mean when you want to launch Kubernetes inside of OpenStack for the services?

Q: No — leave the containers aside; just using OpenStack and OpenContrail. If we want to change some port definition, bring an enhancement into OpenStack, there is no way to upstream it, because the back end is Contrail. What should the service providers do in that case?

A: I'm not sure I get it right — you mean changes you made in OpenStack, and how to get them into OpenContrail? You have to upstream into OpenStack, but the community is working in one direction and OpenContrail is working in a different one. That's a good question, and a difficult one. It would have to be discussed with the Contrail development team. Yesterday I heard that you have to submit a blueprint and they will figure it out. But basically, I know that the Neutron team is trying to support a couple of features which are not in OpenContrail, so it is a really difficult decision, and it depends on exactly what your use case needs. It's difficult to answer — it is more about bringing in the people from Contrail and asking them how they follow the Neutron vision.

Q: Thank you for the talk. I had a few questions regarding performance — could you go to the slide that you had on performance? Here. Could you elaborate a little more on what tool was used, what the packet sizes were, and — when you mentioned you had 100 nodes — how exactly does that fit in with the performance testing?

A: Well, with the containers we didn't actually increase the MTU sizes, and we were only using the standard packet size that is used by iperf — so we didn't try to create any scenario that we wouldn't actually see in our environment. Basically we scaled to 100 nodes, then we provisioned pods to see how it behaved, and then we did the measuring between the nodes: what performance we could get between the containers east-west, and what performance we could get north-south. We didn't want to replicate everything that was done, for example, by the Calico team, where they measured containers versus bare metal and measured CPU and all that, where the bandwidth is near line speed. We tried to focus more on the point that, as I said, performance is not the only criterion for choosing a solution, because you can easily scale out. We also tested how many objects we could create in etcd, because as you grow, your etcd grows as well — so we started with etcd version 2 and then moved to etcd version 3. And for the routing, in Calico especially, it's BIRD — it's native routing.

Q: OK, but just to go back — you mentioned that you had pods running iperf, so one pod is the client and one is the server. About the NIC offloads: does Mirantis support enabling the NIC offloads when you deploy the cluster, or did you have to go to each node to tune them specifically? I'm pretty sure you used RSS, TSO, GSO, just to get the performance.

A: Yes, there is some tuning you need to set on the NIC, and as I said, it depends on which NIC you are using, but in most cases we didn't set any
special parameters on the NIC, because in Contrail, when you increase the MTU, you get this performance, and in Calico you get it without tuning any of these parameters.

Q: Thanks.

A: I would maybe add that we are using Calico, but there is not a large number of Calico containers that grows as the cluster size increases, because Calico runs on the compute nodes — so the server is aware of the Calico routes, and Kubernetes is able to do the balancing between the service endpoints. The things that run on the hypervisors in containers are libvirt and nova-compute, which use the host networking.

OK, so if there are no more questions, thank you for your attention. Thank you.