My name is Angela. I'm a software engineer at Pivotal, and today we'll be getting into the nitty-gritty details of CF plus CNI, particularly how to integrate your own networking solution. First, we'll level set on what CNI is and why we even care about having this pluggability, this extension point, in Cloud Foundry, before looking at what the CNI experience is today in CF, and finally spending the majority of our time on tips and tricks for actually integrating your own solution.

So first, what exactly is CNI? CNI is the Container Network Interface, an industry-standard API for container runtimes to call out to third-party network plugins. It's part of the Cloud Native Computing Foundation. What this really allows is for container runtime systems, for example Diego in Cloud Foundry, or perhaps Kubernetes, to call out to different networking solutions to get different networking features for their containers. Since it's an industry-standard API, we're going to focus not so much on the container runtimes calling out, but on the networking plugins themselves. As part of CNI, we have these core CNI plugins that actually set up the networking experience for your containers. What they're tasked with is, first, of course, creating the network interface for your containers. As you can imagine, when you have a create, you also have a delete, so core CNI plugins also take care of removing resources when your containers are deleted, cleaning up after themselves. And your deletion should always be idempotent, right? It should never error out. Hopefully everything's deleted the first time, but if not, run it again and it should still succeed. And very recently, there's been a third addition to the specification for CNI plugins, a check, to ensure that after you've created your container, your networking is set up as expected.
So as we can see by looking at these three main parts of the specification, with our CNI plugins we're really focused on the core feature of network connectivity. Now that we see what a CNI plugin should be doing, you might be wondering: how does it actually implement these features? How does it actually work? Well, the CNI plugin that your container runtime system calls out to takes the form of a binary, and it's invoked by the runtime during container creation or deletion. The arguments for how you actually want things set up are passed in through a combination of environment variables and standard in. So pretty straightforward. It's all listed in the specification in the container networking GitHub; you can look there for more details.

In addition to your core CNI plugins, which are really focused on the network connectivity aspect, there's also a notion of chained plugins. After you set up your core network functionality, you may want some additional features, such as bandwidth shaping or port mapping. And so, in addition to your core CNI plugin, you can provide plugins to be run after it that add these additional features. So you get this abstraction where you can pick and choose; you don't have to take your core CNI plugin and have it be the end-all, be-all.

Now that we've seen the basics of CNI, you might be wondering: OK, that's great. Obviously CNI gives me a specification, an abstraction. But why do I care about this API layer? Why would I even want to swap one CNI solution for another? Well, first and foremost, you might actually care about the type of networking implementation for your containers. There's a host of different ways you can implement a container network. You could have a flat network. You could have an overlay. Maybe you're using iptables. Maybe you're using a bridge. The list goes on and on.
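To make that calling convention concrete, here is a hedged sketch, in Python purely for illustration (real plugins such as Silk are compiled binaries), of how a runtime invokes a plugin: the operation and container details go into environment variables, and the network config JSON goes in on standard in. The stand-in plugin and its output are made up for this example; the spec in the container networking GitHub is authoritative.

```python
import json
import os
import subprocess
import sys
import tempfile

# A stand-in "CNI plugin": like a real plugin, it reads its network config
# from stdin and the operation from the CNI_COMMAND env var, then prints a
# CNI result on stdout. The allocated IP here is invented for illustration.
FAKE_PLUGIN = r"""
import json, os, sys
conf = json.load(sys.stdin)
if os.environ["CNI_COMMAND"] == "ADD":
    print(json.dumps({
        "cniVersion": conf["cniVersion"],
        "ips": [{"version": "4", "address": "10.255.0.5/24"}],
    }))
# DEL is idempotent: succeed even if there is nothing left to clean up.
"""

def invoke_cni(plugin_path, command, container_id, netconf):
    """Invoke a CNI plugin the way a runtime (e.g. Garden's external
    networker) would: operation and container details as env vars,
    network config JSON on stdin."""
    env = dict(os.environ,
               CNI_COMMAND=command,           # ADD, DEL, or CHECK
               CNI_CONTAINERID=container_id,
               CNI_NETNS="/var/run/netns/" + container_id,
               CNI_IFNAME="eth0",
               CNI_PATH="/var/vcap/packages/cni/bin")
    proc = subprocess.run([sys.executable, plugin_path],
                          input=json.dumps(netconf).encode(),
                          env=env, capture_output=True, check=True)
    return json.loads(proc.stdout) if proc.stdout.strip() else None

netconf = {"cniVersion": "0.3.1", "name": "demo-net", "type": "fake-plugin"}
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write(FAKE_PLUGIN)
result = invoke_cni(f.name, "ADD", "container-1", netconf)
print(result["ips"][0]["address"])   # -> 10.255.0.5/24
```

The same binary handles ADD, DEL, and CHECK; only the CNI_COMMAND value changes between calls.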
And maybe you care, for one reason or another, about the specific implementation, and that's why you would want to choose a different CNI plugin. The second reason, and usually the one people find more compelling, is that there's an entire ecosystem around connectivity as well. While the core CNI plugin is really focused on connectivity, it doesn't end there. You might care about things like policy: after I've created my container, what can it actually talk to out on the internet? What can talk to it? In addition, you might care about something like network isolation: are the containers I'm creating isolated, perhaps, from other containers? The last thing you might care about is ensuring that, in addition to your container having an identity within your container runtime system, it has an identity outside of it as well. So that, for example, if you have a legacy database and you want to whitelist only a specific IP, you can make sure that container actually maps to that external IP identity.

As we can see, a lot of people find this compelling, and so we have a whole bevy of CNI plugins to choose from that present different solutions for different use cases. This list can be overwhelming, but we're going to focus in now on moving from CNI in general to CNI in Cloud Foundry. How does it work today? Well, if we go back to the list and focus in, we see noted here that we have Silk, a CNI plugin designed for Cloud Foundry. So let's examine exactly how CNI works in Cloud Foundry through a simple CF push. What happens when a container is created? This diagram is obviously highly simplified, but during a CF push, eventually the information will be passed down to Diego, which is our container runtime system.
From here, Diego will decide what host the container should run on, and that information will eventually make its way down to Garden, which is actually going to issue the container create. As part of Garden, we have a Garden external networker, which will take care of setting up the network connectivity aspects. But first, Garden will say: OK, I see that you've pushed an app. There's a new process. It needs a new container. And it will start the container create process. Before finishing the container create, though, we need to have network connectivity, and that's when, via the Garden external networker, we call out over the CNI API to the Silk CNI plugin. Now, the Silk CNI plugin doesn't live on its own. It, in turn, calls out to a Silk daemon that we have running on the host VM, which in turn calls out and ensures that the state of the world is consistent with an external controller running on a different virtual machine. And once all of this information is passed through and the Silk CNI plugin knows the state of the world, it will be able to set up the networking stack and allocate an IP for the container, and then Garden can finish the container creation process.

So as you can see here, the main point is that we have all of these red components that are part of our CNI extension, things that are Silk-specific and swappable. But we're still abiding by the CNI API. We have this green section here, the Garden external networker, that's really calling out to a CNI plugin and, presumably, could be calling out to any other CNI plugin instead of Silk. And so this gets to the tips and tricks for actually integrating, right? How do I swap my own CNI plugin solution in for Silk? How do I change those red components? Well, we'll talk about two main things in terms of tips and tricks for integrating: first, some specifics on the creation of your BOSH release for your CNI plugin, and then, secondly, your CNI plugin development itself.
What preconceived notions in Cloud Foundry does your plugin have to respect? So first, creating a BOSH release. Obviously, you're going to be packaging some things up, first and foremost your CNI plugin. In addition, you may have a daemon, a long-running process, that should be packaged as a BOSH job as well. Of course, not every single CNI plugin needs a daemon, but you'll find the vast majority do. Additionally, if you have any controllers or any other components as part of your CNI ecosystem, you should package those up in this release as well.

For this release, we recommend that you name it cni, and the reason is that the Garden external networker needs to know exactly how to call out to this specific CNI plugin. Right now in cf-deployment, if you look at the properties for the Garden external networker, it's configured to look for your plugin in the cni package's bin directory and to look for the config for that binary in the cni job's config directory. So if you name your release cni, you don't have to deal with any ops files or any changes to the core cf-deployment; you simply need to create that BOSH release and pass it in instead of Silk. And of course, always follow best practices. This presentation will be made available, and you can follow the link for more tips and tricks with just general BOSH goodness.

In addition to actually creating the release, or perhaps before creating the release, you want to make sure your plugin actually works, right? So how do you want to do your CNI plugin development? Maybe you want to create a plugin from scratch for one reason or another, or maybe you want to take a plugin from that long list and modify it so that it works for the Cloud Foundry use case.
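To picture the wiring described above: if the release is named cni, the Garden external networker's defaults line up with the release's package and job layout roughly like this. This is an illustrative sketch only; the exact property names and paths can differ between cf-deployment versions, so check the properties in your deployment.

```
/var/vcap/packages/cni/bin/     <- your CNI plugin binary (BOSH package "cni")
/var/vcap/jobs/cni/config/      <- the network config JSON for that binary (BOSH job "cni")
```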
So if you're starting a CNI plugin from scratch, the first thing you'll want to do, of course, is set up network connectivity between containers, and make sure it does what the CNI spec says a CNI plugin must do. If instead you're taking something like Flannel or Calico or another solution, you can probably skip this step, because they've already implemented the network connectivity feature. Instead you'll be focusing on the second part, which is respecting Cloud Foundry's network functionality. What do I mean by that? Specifically, you need to respect application security groups, or ASGs, in Cloud Foundry, and you also need to respect C2C, container-to-container, policy and dynamic egress policy. These are network policy features that Cloud Foundry promises and that users can be writing, so any CNI plugin that you swap in needs to read these policies and enforce them as well, so that you have a consistent Cloud Foundry experience.

Let's look first at application security groups, or ASGs. ASGs define egress policy, so policy from your application to the external web: can I talk to, let's say, google.com, for example? ASGs can either be global, so they apply to any container in a Cloud Foundry deployment, or they can be applied on a per-space basis. ASGs also apply either to staging containers or to running containers, so there are a few permutations of all of these combined. ASGs are stored in the Cloud Controller database, so you can make a simple call to the Cloud Controller API, or CAPI, to get the list of ASGs that you need to be enforcing. This would be a GET to the /v2/security_groups endpoint. Obviously this is a lot of text and it's really small, so we're going to focus in on what each specific ASG looks like in the response body. Focusing in, we see that in this example we have a name, which here is the default security group, and then, in the entity, we have the rules: what ASGs are actually being defined here.
In this case, we have a pretty permissive set of ASGs: the container can basically talk to anything in the whole wide world over all protocols. But you can definitely make it more fine-grained, and you can have as many rules in your ASGs as you want. Next, after the rules, we have two fields called running_default and staging_default. What "default" means in this case is global: is this a global ASG for my running containers or for my staging containers? If either of these is false, then you can find what spaces they apply to by looking at the spaces_url and staging_spaces_url; you just follow that link and it gives back the list of spaces that relate to this security group. That way you get all the information from that space, and you can then apply the ASG to specific containers depending on what space each container is running in.

But you might be thinking to yourself: do I really want to be querying CAPI all the time? Do I want to be pounding it, giving it this heavy load? Can I even poll it as often as I would need to? So there is a second option for figuring out which ASGs are currently created and which ones are currently applied to which containers, and that's through querying Diego. Diego's BBS backing data store stores information about all desired long-running processes, desired LRPs, and as part of the information on a desired LRP, we get the list of ASGs. So you can either be watching the BBS, or you can make a POST to the v1 actual LRPs list endpoint to get back all the containers that are currently created. As part of each LRP, there's a section called egress_rules, which has the list of all of the ASGs being applied on a per-container basis.
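As a hedged sketch of consuming that response, here is some illustrative Python that decides whether a given ASG applies to a container. The field names follow the v2 security-groups response described above; the sample values, the helper function, and its arguments are made up for this example.

```python
import json

# Illustrative ASG, shaped like one entry from GET /v2/security_groups.
# Values are invented; field names follow the v2 API described above.
asg_json = """
{
  "entity": {
    "name": "example_security_group",
    "rules": [
      {"protocol": "tcp", "destination": "10.0.11.0/24", "ports": "5432"}
    ],
    "running_default": false,
    "staging_default": false,
    "spaces_url": "/v2/security_groups/guid-1/spaces"
  }
}
"""

def asg_applies(entity, bound_space_guids, lifecycle, space_guid):
    """Decide whether this ASG applies to a container.

    lifecycle is "running" or "staging"; bound_space_guids is the list of
    space GUIDs you would get by following spaces_url / staging_spaces_url
    with an authenticated request (not shown here)."""
    if lifecycle == "running" and entity["running_default"]:
        return True   # global ASG for all running containers
    if lifecycle == "staging" and entity["staging_default"]:
        return True   # global ASG for all staging containers
    return space_guid in bound_space_guids

entity = json.loads(asg_json)["entity"]
# Pretend we followed spaces_url and got back one bound space:
print(asg_applies(entity, ["space-abc"], "running", "space-abc"))  # True
print(asg_applies(entity, ["space-abc"], "running", "space-xyz"))  # False
```

The same check works whether the rules came from CAPI or from the egress_rules section of an LRP in the BBS; only the fetching differs.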
So you can do a mix: query for all of them to make sure you don't miss anything, in addition to watching the BBS to ensure that, as containers are created, you're applying the right application security groups to them.

After ASGs are supported, or respected, the second thing you'll need to respect is container-to-container network policies and dynamic egress policies. Both of these live in the same policy server. C2C, container-to-container, policies deal with whitelisting traffic between containers. By default, all traffic between two containers in a Cloud Foundry installation is rejected, but you can have a C2C policy to allow traffic from one container to a different container. Additionally, dynamic egress is the new version of ASGs. The main difference is that with ASGs, you have to restart a container for a change to apply; with dynamic egress, you can simply create the policy and it will be applied to those containers, to those spaces. Both of these can be queried from the policy server API via a GET to the networking v1 internal policies endpoint. Since you're probably going to be respecting these as part of a job or process that you're deploying, you'll want to use the internal endpoint here.

We'll focus first on a C2C policy. We see here that we have a source and a destination for your C2C policy. The source and destination IDs are actually the application GUIDs, and that's of course because containers can come and go and your IPs are ephemeral; your app GUID is the way to actually track the container you care about, and from the app GUID you can correlate the IP it's currently running on. And we see that on the destination, you have a list of ports that the source can talk to for that destination, and over what protocol. Similarly, for dynamic egress policies, we see that the source is also an ID, which is an application GUID.
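To make the C2C shape concrete (the dynamic-egress destination fields are described next), here is a hedged Python sketch of evaluating a policy fetched from the internal policies endpoint. The field names and values are illustrative, not copied from an actual policy-server response.

```python
import json

# Illustrative C2C policy, shaped like the policy-server response described
# above: source and destination are app GUIDs, with ports and a protocol on
# the destination. All names and values here are invented for illustration.
policies_json = """
{
  "policies": [
    {
      "source": {"id": "frontend-app-guid"},
      "destination": {
        "id": "backend-app-guid",
        "protocol": "tcp",
        "ports": {"start": 8080, "end": 8090}
      }
    }
  ]
}
"""

def c2c_allowed(policies, src_app, dst_app, protocol, port):
    """Default-deny: traffic between two containers is rejected unless some
    policy whitelists it. App GUIDs, not container IPs, are the stable
    identity; an enforcer maps GUIDs to the containers' current IPs."""
    for p in policies:
        if (p["source"]["id"] == src_app
                and p["destination"]["id"] == dst_app
                and p["destination"]["protocol"] == protocol
                and p["destination"]["ports"]["start"] <= port
                        <= p["destination"]["ports"]["end"]):
            return True
    return False

policies = json.loads(policies_json)["policies"]
print(c2c_allowed(policies, "frontend-app-guid", "backend-app-guid", "tcp", 8080))  # True
print(c2c_allowed(policies, "other-app-guid", "backend-app-guid", "tcp", 8080))     # False
```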
But the destination in this case, because we're talking to the outside internet, will be a list of IPs, which have a start and an end, and a list of ports.

So in summary, we see that integrating CNI plugins into Cloud Foundry is totally feasible. You can take a CNI plugin that already exists, make some modifications to respect Cloud Foundry concepts, and package it up as a BOSH release. And you should go out and try it for yourself and give us feedback on what works or doesn't. As you go on this journey, if you want any help or assistance, you can always reach the container networking team on Slack. There are also a couple of links: first to the CNI community's GitHub org, and also to silk-release, if you want an example starting point for what a CNI BOSH release looks like, which will also direct you to the Silk CNI plugin itself. And with that, thank you all for attending, and I'm open to questions.

[Audience] One question about security groups. Before you enable CF Networking, security groups still work, even without CNI and all of this. When we enable CF Networking, does the original one not apply anymore?

[Angela] They still apply. Yes, so the question, yeah. The way it works is: application security groups, you're right, existed before container networking. When container networking was implemented, when we switched over to abiding by the CNI specification, we still ensured that we respected application security groups.

[Audience] But who applies them in this case? Is it the CNI plugin, or something else?

[Angela] Yes, so right now, if we actually go back to, this doesn't love me. Okay, where am I? If we go back to the drawing, it's so far back. Here, yeah. The Silk CNI plugin is actually many layers of wrapping: there's a CF wrapper CNI plugin which ultimately calls out to the Silk CNI plugin, and that's what is currently responsible for applying ASGs.
So if you were to bring in your own networking solution, then yes, you need to be respecting ASGs, because the Silk CNI plugin is what's respecting the ASGs right now.

[Audience] So the answer is that the plugin actually does this work. It's replacing the original implementation.

[Angela] Yes, the plugin will do the work, but in addition, because of dynamic egress policies, you'll probably need some long-running process watching for these changes, right? And actually implementing the changing ASGs, if that makes sense.

[Audience] Okay, thank you.

[Angela] Chris? He has notes, so he's going to ask a question.

[Audience] Three questions, but the first one: do you support chaining plugins with Garden, or will it only call one plugin?

[Angela] Right now, the Garden external networker is only set up to call one plugin, but chained plugins are on the radar of the container networking team, and they have office hours right after this. You should talk to them if it's a use case that you're really interested in having supported sooner rather than later.

[Audience] And then you mentioned that for the CF ASGs, it could be running and staging, but I didn't understand what staging meant.

[Angela] Yeah, so when you push an app, you'll first have a staging container created that deals with everything needed to get your actual running app set up. The running ASG applies to your actual long-lived container; the staging ASG applies to that shorter-lived setup container.

[Audience] And the last one: when you apply a dynamic egress policy, will it cut existing connections if they're currently in flight? Say I'm pulling something from an external source and then all of a sudden I decide nobody should be allowed that. Will it immediately cut it off, or allow those connections to finish?

[Angela] So in general, ASGs are whitelists, so you would be removing one, I guess, to cut off a connection. ASGs and dynamic egress will also compound on each other.
If you delete an ASG and you have a connection going, I believe your connection will finish before the ASG removal actually takes place and you're no longer allowed to use that connection.

[Audience] Great, thanks.

[Angela] Yep, no problem. One last question. Oh, we have two, all right. Well, let's...

[Audience] Hello. Hi, third question. Is it possible to pass CNI-specific metadata somehow to the CNI plugin?

[Angela] Is it possible to pass CNI-specific metadata? Yes... what do you mean exactly?

[Audience] That I have a CNI plugin and I would like to get some information from my app when I do a cf push, to use some kind of specific logic. Would that be possible? It's like the capabilities, for example, for the port mapping.

[Angela] I would have to think on that, because right now, at the CNI plugin level, you're really constrained by the information being passed down as part of a CF push, so the information on your desired LRP. There's not really an append-metadata field there. But I could envision that if you had some other process as part of your CNI plugin that had those port mappings or had that information, then perhaps your CNI plugin could call out to that.

[Audience] Okay, excellent.

[Angela] Maybe. That would just be my first thought off the top of my head.

[Audience] Thanks. I'm not sure if this is a good question for this session; it's about CF networking in general. It's been there for a while already, but there's still no way to define networking rules declaratively. You need to run a call to add each rule, and this especially becomes a problem when you need to do a blue-green deployment, or you want to keep your rules together with your code somewhere; you basically need to script it every time. And it causes some problems. So, any plans to fix that?

[Angela] Yeah, yeah, any plans. I know, this has definitely come up multiple times.
Like, you know, for example, could we define policies in the app manifest or at some other level, so it's not just a call via the CLI or API. I would suggest going to the networking office hours right after this and, again, raising your hand. I'm not sure where it currently falls on the priority list, but it is a very common request. So I think, you know, just raise your hand again.

[Audience] Thank you very much.

[Host] Okay, let's thank Angela again.

[Angela] Thank you. Thank you.