Hello, everyone. My name is Navin Joy, and I'm a cloud architect at Cisco. My team works on container initiatives within Cisco's CTO organization, the Chief Technology Office. We talk to our customers, gather their requirements, and then build proof-of-concept demonstrations for them, and a lot of our recent work has been around containers. Our team recently put together a Kubernetes environment using Ansible automation, and I want to share my team's experience with you: the lessons learned and some of the issues we ran into. We have also open sourced our playbooks and our code, which I hope you can leverage. Feel free to use them and modify them, and if they are of any help to you, we are quite happy to help you out if you run into issues. If you need any other help or have questions, you can reach me at najoy@cisco.com.

Before we dive deep into the automation work and Kubernetes, I want to spend some time listing our key requirements and why we chose this route. The first was that we wanted a container-centric infrastructure to deploy our applications. Several of our customers run virtual machine environments today, and they recognize the agility containers bring: the ability to carry an application from development to production as-is. They also understand that containers let them port their applications between clouds; if you have ever tried to move a virtual machine from one cloud to another, you know how hard a job that is. These factors are driving container-centric application deployment, and from our team's perspective, we wanted to put together an infrastructure to build and test our container applications.

The second point is that we were looking for low-friction porting of apps from virtual machines to containers. To elaborate on this point: in the virtual machine world, you have a bunch of applications running in a VM, and those apps share the IP address and the port space of that virtual machine. Now, if you containerize those apps the default Docker way, which we have tried before, Docker, as most of you may know, isolates every container at both the network and the process level. What do I mean by that? Say you have some applications running on your virtual machine and you containerize each one using the default Docker approach. Docker creates an independent TCP/IP stack for each application, which means that each application that moved from the virtual machine into the container world now has its own IP address and its entire own port space. This is hard for a lot of apps to deal with. They were all sharing the virtual machine's network space, and now each application has its own IP and port space, so how do they communicate? They have to communicate through the network now: applications that used to communicate over localhost on that virtual machine, once containerized, have to communicate over the network, because they each have their own independent network namespace.
The process-level isolation you can mostly deal with: once you containerize a process, it cannot see other container processes, nor can it see processes running in the virtual machine itself, but that is manageable. It is this unique network stack per containerized application that creates its own set of challenges. Kubernetes has a nice way to work around this. Docker offers a feature by which you can create a set of containers that share a network namespace. What this means is that, logically, that group of containers is similar to the set of applications that were running in that one virtual machine, except that you keep the process isolation. And that is exactly what a Kubernetes pod is: in effect, a group of containers that share a network namespace, and that can share volumes as well. It ties nicely into the virtual machine world where you had localhost communication: all of a pod's containers can communicate over localhost. Each pod now has one network stack, one IP address, and one port space, instead of a network stack per container. We liked that. Pods can also be scheduled, much like virtual machines: you take this group of containerized applications that form a pod and schedule it onto your environment. In our case, those pods ran on virtual machine instances themselves; we did not use bare metal. These similarities between virtual machines and Kubernetes pods really helped us migrate some of our apps from virtual machines to containers in a low-friction way.

Service naming and discovery is also built into Kubernetes, which we liked. Again, going back to the virtual machine analogy: if you want to discover a service, you use DNS; you have a name for a service and you do a DNS lookup to get its IP address. Similarly here, and I will go into the details of how our service naming works, you have a cluster DNS server that holds the name of each service, and it supports SRV lookups, so an application can query the DNS server and get both the IP address and the port number of a service, instead of having to obtain them some other way. It is built in, and we liked that.

Another feature we liked was self-healing. A lot of us have spent time in the trenches in operations, and way back we did not have this. Occasionally you have issues: virtual machines go down, or network problems cause containers or virtual machines to lose connectivity. Self-healing means the Kubernetes environment tries to restore the system by restarting containers if they have failed; and if the underlying node, the virtual machine running those containers, were to fail, Kubernetes has the intelligence to take the pods that were running on it and recreate them on another virtual machine. In the middle of the night, if your software can do these things for you, it helps: at least you are not troubleshooting at that level or manually recreating something that has already failed on another machine. That is already built in.
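Before moving on to the automation story, a concrete illustration of the pod-versus-VM analogy may help. This is a minimal, hypothetical manifest, not one from our repo: two containers sharing one network namespace, so the web server reaches the cache over localhost exactly as two processes on one VM would.

```yaml
# Hypothetical two-container pod: one IP and one port space shared
# by both containers, so they can talk over localhost.
apiVersion: v1
kind: Pod
metadata:
  name: web-with-cache
spec:
  containers:
  - name: web
    image: nginx:1.9       # assumed image; any web server works
    ports:
    - containerPort: 80
  - name: cache
    image: redis:3.0       # assumed image
    ports:
    - containerPort: 6379  # the web container reaches this at localhost:6379
```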
Then finally, with respect to the automation: the environment was getting quite complicated as it is. It was not child's play, not a plug-and-play environment. OpenStack is complex, with its set of projects, and Kubernetes itself has several moving pieces that you have to tie together to get it to work. Kubernetes and OpenStack are complex for a reason: they provide a lot of functions and a lot of knobs you can customize and tune; agreed, plenty of APIs, projects, and variables. But we did not want complexity in our automation. We wanted the automation framework to be simple, and we did not want to learn another manifest language to get this going. We looked at Ansible and said, here is the way to go, because it gave us a very simple cookbook, a YAML file, where you list your tasks and group them by the hosts you want those tasks to run on. The ordering was intuitive; you do not have to specify any special ordering between tasks. It was clean, with no complex databases to maintain just to run the automation itself. Pretty nice. And any developer could run it: we all have SSH access into the systems, so we got it going within a couple of days. So these were our key requirements and why we chose Kubernetes and Ansible.

This is what our infrastructure stack looks like. We have an Ansible host; this would typically be one of our developers' machines, and in our case Ansible is running on my MacBook here. When you do a playbook run, Ansible reads the settings from a global settings.yaml file. These settings contain all the variables a developer needs to customize their Kubernetes deployment: for instance, the software version and where to download it from, where the master node is, how many worker nodes you need, which instances and how many to launch, and so on. So the developer controls his cluster deployment through that single, central settings.yaml file. The Ansible playbook reads the values from settings.yaml, works through the tasks, and in the end deploys the Kubernetes pods. Each developer, in our case, runs this set of seven playbooks, builds this environment, and then deploys these applications.

First and foremost: how do you launch your virtual infrastructure, primarily the underlying instances on top of which these containers run? There are several ways to build it. For instance, the Magnum project uses Heat. If you want to build out a complex set of infrastructure, creating your networks, your security groups, and your volumes, that might be the way to go. But in our case, the networks were already built out; we knew which network to place these instances in, and we knew the images, so we did not have to create any of those. We just had to tell some tool to go and create the VMs using those settings. Ansible 2.0, if you are using it, has a module called os_server that can create instances. If your objective is just to create instances, attach them to a network, and assign security groups, floating IPs, and so on, it works decently well. So we created a profile for an instance in the settings file, and each developer had the ability to do that.
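As a rough sketch of what such a launch task can look like with os_server (the variable layout here is illustrative, not copied from our repo, and authentication is assumed to come from the usual OS_* environment variables or a clouds.yaml):

```yaml
# Illustrative Ansible 2.0 task: one Nova instance per entry in a
# hypothetical nodes list, tagged with metadata for later classification.
- name: Launch Kubernetes cluster instances
  os_server:
    name: "{{ item.name }}"
    image: "{{ instance.image }}"
    flavor: "{{ instance.flavor }}"
    key_name: "{{ instance.key_name }}"
    availability_zone: "{{ instance.az }}"
    network: "{{ instance.network }}"
    security_groups: "{{ instance.security_groups }}"
    auto_ip: yes                           # attach a floating IP
    meta:
      host_group: "{{ item.host_group }}"  # read back later by the dynamic inventory
  with_items: "{{ nodes }}"
```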
Then, when we ran the playbook, that information about the instances, the network, and the floating IPs would be read, and the playbook would talk to OpenStack and spin up the underlying infrastructure on which we launched the Kubernetes cluster. Here is an example of the settings.yaml instance profile; you can always look at the open source code for the full details of the file, this is just a snapshot. You have a simple dictionary capturing key-value information for launching an instance: the SSH username, the tenant, the password, the availability zone, security groups, the network, floating IPs, the flavor, and so on. You will note that these are pretty much the inputs to the Nova API: under the hood, the os_server module talks to Nova, feeds it this information, and builds out the underlying instances for our Kubernetes cluster.

Another feature we leveraged is Ansible dynamic inventory. What is dynamic inventory? For any automation tool, you have to tell it where your hosts live, normally by specifying that information statically, and which groups each host belongs to, so that actions can be performed at the group level instead of per individual host. So there is this discovery and classification of the nodes that live in your environment, if you will. We did not want to hard-code the IP addresses and group information for our Kubernetes cluster into Ansible; that would be too painful. What we did instead was inject metadata into each instance as it was launched. Then, when you run the playbook called deploy_kubernetes, before it actually pushes out the software components, a dynamic inventory script goes out, discovers all the nodes that are out there and the groups they belong to, and feeds that information into Ansible. It is all done dynamically: Ansible learns which are the Kubernetes master nodes and which are the workers, so it knows where to push the control software, where to push the worker software, and so on. Being able to do this dynamically helped us manage our infrastructure in a dynamic way. We do the playbook run as ansible-playbook -i pointed at the inventory script: a Python script that discovers, classifies, and feeds that information to Ansible. The tasks listed in deploy_kubernetes.yaml then refer to the groups discovered by that inventory script.

This is a snapshot of the settings.yaml file in which we specify a single master node and three workers, named master1 and worker1 through worker3. You can see the metadata injected into each node: the master carries metadata saying its host group is kits_master, and the workers carry kits_worker. The inventory script essentially reads this information and passes it to Ansible, and the tasks are organized around these groups.
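The slide itself is not reproduced in this transcript, but the shape of the file is roughly this (an illustrative reconstruction, not a verbatim copy; see the repo for the real settings.yaml):

```yaml
# Sketch of a settings.yaml instance profile and node list;
# all values below are assumed examples.
instance:
  ssh_user: ubuntu
  tenant: kube-dev
  availability_zone: nova
  image: ubuntu-14.04-server
  flavor: m1.medium
  network: dev-net
  security_groups: [default, kube]
  assign_floating_ip: true

nodes:
  - { name: master1, host_group: kits_master }  # metadata becomes the Ansible group
  - { name: worker1, host_group: kits_worker }
  - { name: worker2, host_group: kits_worker }
  - { name: worker3, host_group: kits_worker }
```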
To summarize our cluster deployment, this is how it looks. We have an instance profile in the settings.yaml file that provides the values to customize our deployment. We run our playbook launch_instances.yaml, which launches the specified instances and tags them with metadata, providing the classification information needed for the subsequent playbook runs. There is another playbook that deploys Docker. Then we run deploy_kubernetes.yaml: its inventory script retrieves the metadata, classifies the instances into the master, the control layer, and the workers, and deploys the appropriate software to each group.

So what does our cluster look like? We have one control node at this point, the Kubernetes master. It runs the API server with the etcd database, and it also runs the scheduler and the controller manager; the controller manager is the component that runs the background processes that do the monitoring, auto-restart, and so on. Then we have a set of worker instances. For consistency, we run the kubelet and kube-proxy on all our nodes. The kubelet is the piece of software responsible for actually creating the pods on the underlying nodes, and kube-proxy provides a service abstraction in front of those pods; I will show some details of how these work in my later slides. Then we have a cluster DNS add-on for service discovery and naming, and kubectl, the Kubernetes control client, to interact with the cluster.

When you do the deploy_kubernetes.yaml playbook run, it deploys all the software, and it also deploys the certificates necessary to enable TLS connectivity between the client and the Kubernetes API server. The client certificates are stored on the developer's machine, so you can run kubectl from your own machine or laptop: it reads the certificates, and that is how we authenticate to the control plane. So we have certificate-based authentication, and there is also a token placed in the kubeconfig file. The automation builds all of that, so when you run kubectl, it is ready to go and talk to your cluster. And of course there is Docker for running the containers themselves.

Once automated, the cluster looks like this. The Ansible host fetches the software from the location you specify. The kubelet and kube-proxy we are deliberately not running in containers, because that complicates things: the kubelet is responsible for creating the pods and all the containers, and containerizing the kubelet itself would require additional work. So we run these as plain Linux services, upstart services, inside the instances themselves. Ansible goes ahead and creates the master node and all the worker nodes, deploys the software, and sets up each worker to point to the master. The kubelet then talks to the master and says, hey, I'm here, I'm a node, and the master registers all of those nodes. The proxy is then responsible for providing the service abstraction.
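Since the kubelet and kube-proxy run as plain upstart jobs rather than containers, the deploy playbook only has to drop binaries and job definitions into place. A minimal sketch, with assumed paths, variable names, and template (the real tasks are in our repo):

```yaml
# Illustrative tasks only: install the kubelet as an upstart service.
- name: Copy the kubelet binary onto the node
  copy:
    src: "{{ kube_download_dir }}/kubelet"   # fetched earlier per settings.yaml
    dest: /usr/bin/kubelet
    mode: "0755"

- name: Install the kubelet upstart job
  template:
    src: kubelet.conf.j2        # hypothetical template; points kubelet at the master's API server
    dest: /etc/init/kubelet.conf

- name: Start the kubelet service
  service:
    name: kubelet
    state: started
```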
When a client comes in and talks to a service running in a pod, it hits the proxy, and from there the proxy randomly sends the traffic to one of the back-end pods running your application, if you have multiple pods configured to run it; for example, multiple frontend pods, if you will.

I want to spend some time on how we did the networking and what changes are needed in OpenStack Neutron to get Kubernetes networking to work seamlessly. We looked at some of the evolving projects, such as Kuryr. It is still not ready for Kubernetes, but it is promising: the idea of using Neutron networking for containers is great, but how it will work with Kubernetes is a whole new thing that is still being worked on. Then you have the overlay networking technologies, Flannel being the most popular, and in fact the one Magnum uses when it deploys a Kubernetes cluster. Flannel builds an overlay network to connect these containers. We did not want to go there, because you already have the Neutron overlay, and on top of that you would be building another overlay for the pods: an overlay on top of an overlay. So we wanted to explore something different from the standard Flannel approach.

We examined the fundamental requirements of Kubernetes networking: you can use any networking for Kubernetes as long as it satisfies three conditions. First, all the containers need layer 3 connectivity to each other without going through any NAT layer, which means a container running on one instance should be able to talk directly to a container on another instance, and the destination container should see the actual IP address of the source pod, not the masqueraded IP of the host. Second, layer 3 connectivity between the nodes and the containers without NAT: the nodes are the virtual machines running these pods, and the pods should be able to communicate with those nodes directly. And third, there should be no port forwarding from nodes to containers. Since Kubernetes, as I mentioned earlier, unifies the network stack per pod and gives it a single namespace and an IP address to listen on, you really do not need port mapping: each pod has an IP and its own port space, and as long as you can route to that pod, you can access it directly, without mapping ports from the underlying compute node the way you would with default Docker containers. To summarize, you have to meet these three requirements regardless of the networking technology you use with Kubernetes.

So how did we attack this problem? We thought: you have Neutron, why not get Neutron itself to provide routing for our containers? To each instance we assigned a /24 subnet for pod addressing, so the pods live in their own network that sits inside the virtual machine. We had a /16, which we subnetted into 254 subnets, and we assigned one /24 each to the master node and every worker node, recording the result as a node-name-to-pod-CIDR mapping. Of course, we had to automate this.
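The mapping itself is just another dictionary in the settings. A sketch, with illustrative /24s carved out of the 10.8.0.0/16 pod range mentioned below:

```yaml
# Illustrative node-to-pod-CIDR mapping: each node's container
# bridge owns one /24 of the cluster's pod range.
node_pod_cidr:
  master1: 10.8.0.0/24
  worker1: 10.8.1.0/24
  worker2: 10.8.2.0/24
  worker3: 10.8.3.0/24
```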
You can't manually go into each virtual machine and assign that bridge network, so we created this mapping, node_pod_cidr, listing the master and workers with the subnets for the pod networks on each node. Then, with a script driven by Ansible, we replaced the default docker0 bridge with a bridge named cbr0, the container bridge, and allocated that node's pod subnet to it. When Ansible got to node master1, it saw the IP address assigned for cbr0 there, created the bridge, and assigned that address to it. So we were able to automate the pod addressing using Ansible. That was one step: now the pod networks exist inside each virtual machine, but at this point Neutron knows nothing about them, and if you tried to route to these pod networks, Neutron would just drop the traffic. So we wrote a script, neutron_route.py, that went to the tenant router and programmed it with a route, for each instance, to the pod network living inside it, and we invoked that script from Ansible. At that point the network was built and the tenant router was configured to route to each instance. We also enabled IP forwarding in each VM's kernel so it would route the traffic on to the container bridge, cbr0. Our network was now set up: the Neutron router was configured and the pod networks were built.

But we ran into an issue; we knew there would be one. It was kind of surprising, because we had done this in AWS, and in AWS it worked: if you have an instance running in your VPC and you add a route on your routing table pointing a network at your instance, AWS actually routes the traffic to your instance, and if you run tcpdump on that instance you can see the traffic arriving; it is not blocked at the AWS host level. In OpenStack, however, there is some work that needs to be done. Neutron adds iptables rules to each Nova compute host, the underlying physical server running Nova, that permit only traffic originating from and destined to the instances themselves. It locks things down to exactly the instance's addresses, so traffic for a pod network living inside an instance gets dropped by that host-level iptables rule. What we did was write a script that updated the iptables rules on the host; we had access to the hosts, since this was a lab environment. We have not rolled this out to production, and before anyone tries this in production, a fix has to go into Neutron: if you add a route on your tenant router pointing at an instance, the iptables rules on that host should allow traffic for that network into the instance. That fix is needed to take this to production.
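To make the per-node plumbing concrete, here is a sketch of the bridge and forwarding tasks. The module usage is illustrative, and pod_bridge_ip is a hypothetical variable derived from the node_pod_cidr mapping (for example, 10.8.1.1/24 on worker1); the real playbooks are in our repo.

```yaml
# Illustrative per-node tasks: create cbr0 with the node's pod
# gateway address and let the VM's kernel route into it.
- name: Create the container bridge cbr0
  shell: |
    brctl addbr cbr0
    ip addr add {{ pod_bridge_ip }} dev cbr0
    ip link set cbr0 up
  # Docker is then restarted with --bridge=cbr0 so pods draw their
  # addresses from this node's /24 instead of docker0's default range.

- name: Enable IP forwarding so the VM routes traffic into cbr0
  sysctl:
    name: net.ipv4.ip_forward
    value: 1
    state: present
```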
So at this point, what we have addressed is creating the pod networks and routing traffic to them within your tenant. But how do you provide external connectivity to the pods running inside these virtual machines? By default, Neutron only NATs outbound traffic that originates from the instances themselves. If you send traffic from this pod network, the 10.8.0.0/16 we carved up, Neutron says: I don't know anything about this network, I'm going to drop it; I'm not going to SNAT it. To fix this, when a container talks to the internet, or to any host not in your network, you have to configure an iptables rule on each instance to masquerade the traffic behind that instance's own IP address. The instance SNATs the pod traffic first, and then Neutron sees traffic coming from one of its instances and SNATs it in turn to a routable IP that can go out to the internet. Again, we used Ansible; we did not want to do anything manually, so the playbooks deploy these iptables rules on each instance to masquerade pod traffic bound for the internet.

The second problem: if you want to expose certain services running inside the cluster to the outside, how do you do that? You most likely have a frontend service running in the cluster. We used the NodePort service type supported by Kubernetes, which many of you may be familiar with. It uses the IP address of the underlying node, the instance itself, which you can make routable using floating IPs, plus a port in the 30000 range: Kubernetes takes the traffic coming in on that port and sends it to your pods. That is what NodePort does, and we used it. For all practical purposes, you then tie this to a Neutron load balancer to distribute traffic across all of the frontend components running on your instances.

This is how the completely deployed cluster, along with the pods, looks in our environment. From the DevOps machine, using playbooks, we have orchestrated the Kubernetes cluster: created the master control node and the worker nodes, deployed the appropriate software, used dynamic inventory to gather the inventory information, the IP addresses, and to classify the nodes into groups, and created the certificates for authentication. That is all done by the Ansible run. The cbr0 bridge network that lives inside each instance has also been automated, and the tenant router has been programmed to route to the cbr0 on each node, pointing at that node's routable address on eth0. Once that is all set up, we are ready to create pods.

This diagram shows two pods: a frontend pod and a backend pod. The frontend pod runs the frontend servers, like a web server, and the backend pod runs the database, in our case. We wanted to expose the frontend pod to the internet, but not the backend pods. So we created the frontend service, the service abstraction in Kubernetes: when you send traffic to that service on any node, it picks the traffic up and sends it to the backend pods the service points to; from the service's perspective, those pods are called endpoints. The idea is that your pods may die, Kubernetes will recreate them, and the new pod may land on another node with a different IP. So how do you keep track of the backend pods, or other dependent pods? The service always keeps the same IP address; it does not change, and it is reachable locally on each and every node, the same everywhere.
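As an illustration of the frontend exposure described next, a NodePort service looks roughly like this; the name, labels, and port numbers here are assumed, not taken from our manifests.

```yaml
# Hypothetical NodePort service for the frontend tier.
apiVersion: v1
kind: Service
metadata:
  name: frontend
spec:
  type: NodePort
  selector:
    app: frontend          # matches the labels on the frontend pods (the endpoints)
  ports:
  - name: http             # a named port also gets an SRV record from the DNS add-on
    port: 80               # stable cluster-IP port other pods talk to
    targetPort: 80         # port the container listens on
    nodePort: 30080        # illustrative high port; Kubernetes opens it on every node
```

The Neutron load balancer pool members would then be the worker nodes' addresses on that node port, and once the service exists, the cluster DNS add-on automatically publishes an A record (frontend.default.svc.cluster.local, under the default domain) and an SRV record for the named port.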
So you talk to the service, the service knows about its endpoints, the actual applications running in the pods, and it sends the traffic to one of them. We exposed the frontend service using the NodePort feature, which used a high port on the virtual machines, from 30000 onwards, to send traffic to that service. Then we created a load balancer with a VIP, added these nodes as the load balancer pool members, and pointed the load balancer at that node port. The VIP was associated with a floating IP, by which the client could reach the load balancer. So the client hits the VIP, the load balancer forwards the traffic to the Kubernetes worker nodes on the node port, the Kubernetes frontend service picks it up and sends it to the application running in the frontend pod, and that application talks to the backend service, which sends it on to the backend pod. So here is a two-tier application that we deployed, and the entire infrastructure has been automated.

Some thoughts on service discovery. In our environment, containers discover services. You saw those services with the cluster IP addresses; for stability, you should preferably talk to the service, not directly to the application running in the pod. So how do you discover those services? What we have seen is that a combination of environment variables and DNS works best. Regarding environment variables: as Kubernetes creates these pods, it looks at all the services that are running and creates environment variables, Docker environment variables the application can look up, providing the IP addresses of the running services, the port information, and so on. This posed one problem: when you create a pod, the service has to already be running for Kubernetes to create those environment variables. So you have an ordering requirement: you first create the backend pods and the backend service, and then the frontend, so Kubernetes injects the right service information into the frontend and it can discover the backend. If you want to remove that dependency, you can run a DNS server; it is simple, and it all runs as a pod, so there is a pod manifest that runs this DNS server. Fortunately, you do not have to do anything manual to populate it: Kubernetes is intelligent enough that as you create each service in your environment, using the name you give at creation time, the infrastructure automatically adds an A record and an SRV record in that DNS referring to the service. That was good. The containers are also configured to use the DNS server.

Okay, I have been told that we are just out of time, so I just want to give you my last slide here. You can clone all of this from our GitHub, under microservices; create a Python virtual environment, test out these artifacts, and run the playbooks. There is also a script called setup.sh that ties all of this together: it launches the instances, deploys Docker, then deploys Kubernetes, creates the cluster, and builds the entire infrastructure out for you. So thank you, and thank you for coming.