Thank you, everybody, for choosing to come to our session to close out your CF Summit 2018. We really appreciate it. I'm Angela Chen, a software engineer at Pivotal, and with me is Usha Ramachandran, a product manager at Pivotal. Today we'll be talking to you about CF networking: policy, service discovery, and beyond. Before we begin, hopefully you've seen this, because otherwise that probably means you haven't attended any sessions yet, but in case you need another reminder of where to go in case of a fire, here it is.

And with that, we'll get to the agenda. First we're going to talk about the container networking challenges we've faced in the past, what container networking is and the problems we've more recently faced, before introducing polyglot service discovery, which is the latest epic of work we've implemented. We'll do an architecture overview of how we actually implemented this feature before doing a demo. We'll then talk about advanced features for service discovery, what's next, before covering other enhancements we've done in addition to polyglot service discovery. But before we get to any of these items, we want to give a quick shout-out to everybody on our team, because without them we wouldn't be able to present any of the work we're presenting today. Except the joke's on me, because I'm no longer on the team, so I should probably call that out as well; that's why I have to wear this hat, to try and claim some street cred. And with that, we'll get started and Usha will take it away.

Thanks, Angela. Seriously, thank you all for staying. We were just saying that this is the last session, who's even going to be here, so it's great to see a sizable audience. Most of what we're going to cover, at least at the beginning of the presentation, shouldn't be new to anyone. It's just a recap of some of the challenges we faced with networking on Cloud Foundry and how we've tackled them so far, to set the foundation for what we're going to talk about.

Typical Cloud Foundry networking used to have every single app go through the Gorouter if it wanted to talk to another app. This is not only inefficient, in that you have an additional hop all the way out of your PCF foundation to your load balancer, but it also poses a problem in terms of applying security policies, in case you wanted to protect app B from other apps trying to access it. Those were two of the challenges we initially faced with Cloud Foundry networking, and we wanted an ideal state where apps could talk directly to each other, where you could identify the source of app traffic, so app B would know that app A is talking to it, and where you could configure policies so that you knew exactly which apps were talking to each other.

Hopefully none of this is new to anyone, because we've had this for about a year now. CF networking release has been out since last August, and what we did there is basically keep the existing use cases: if you have ASGs, if you have traffic coming through the router, if you have traffic going out to services, none of that changes. ASGs, for those of you who are not familiar with them, are application security groups, which are a way to configure policy for traffic going from CF apps to external destinations.
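As a concrete sketch of what an ASG looks like in practice — the group name, CIDR, and port here are all illustrative, not from the talk — an ASG is just a JSON list of egress rules that an operator creates and binds:

```shell
# Illustrative ASG: a JSON list of egress rules (names and CIDRs are made up)
cat > asg-rules.json <<'EOF'
[
  {
    "protocol": "tcp",
    "destination": "10.0.11.0/24",
    "ports": "5432",
    "description": "Allow apps to reach the Postgres service network"
  }
]
EOF

cf create-security-group my-services-asg asg-rules.json
cf bind-running-security-group my-services-asg   # apply to all running apps
```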
What we did do is embrace CNI, the Container Network Interface, which is a standard, and what that enables us to do is allow third parties to come in and plug into Cloud Foundry. An example of that is the recently released NSX-T integration with Pivotal Cloud Foundry, which leverages this CNI pluggability. The batteries-included version of CNI, though, is a plugin called Silk — hence the spider web on the hat — and what we did with Silk was basically put all containers on an overlay network, which runs VXLAN, and give every single container its own IP.

The next thing we did was add policy. You now have app-to-app dynamic policy. We currently have support through the CLI and the API to configure this policy, but we have heard feedback from you all that it would be a lot better if this policy were in the app manifest. By a show of hands, how many of you think that having policy in the app manifest is something you would like to see? Awesome, thank you. The other aspect of policy is that it is self-service: as a space developer, you can configure policies for your own apps — only if your operator allows it, though.

So that was last year's news. When we spoke at CF Summit in Basel, we had a slide about a track of work we were going to take on, and the question we asked then was: container-to-container networking is great, but if I have to bring my own service discovery, is it a burden on app developers? Typically, for any application, you need to know what to connect to, what port it's listening on, and things like that. We classified this into two different types of applications: microservice apps, which need some sort of load balancing and to connect to any arbitrary backend, and clustering apps, which are more peer-to-peer apps — I'll talk about them a little further down. But the basic problem we set out to solve was that, at that point, users had to bring their own service discovery; there was no service discovery for C2C on the platform.

So we're really excited to now have support for polyglot service discovery on the platform. We have a new release, cf-app-sd-release, and what this release does is basically use BOSH DNS under the hood: you can use BOSH DNS to look up the DNS name for your destination app. We also created a new domain — apps.internal is a hard-coded domain that is created by the Cloud Controller once you have enabled this ops file — and this can then be used by app developers to map routes to.
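For example — the app and hostname names here are illustrative — mapping an internal route looks just like mapping any other route:

```shell
# Map an internal route to an app; apps.internal is the hard-coded
# internal domain mentioned above ("backend" is an illustrative name)
cf map-route backend apps.internal --hostname backend

# Other apps on the container network can now resolve
# backend.apps.internal via BOSH DNS to the container IP(s)
```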
So, as an app developer, you can go in and map a route for your app to this domain, and it automatically classifies that route as an internal route. This may be brand-new terminology for you, so: we call routes going through the router external routes, and routes that use direct container-to-container communication internal routes. Any route that is mapped to the apps.internal domain is going to be an internal route, and when you do a lookup for this route, you get back the actual container IP, on the container network, for the container you're trying to connect to. Angela is going to do a demo showing you exactly how this works, but first let's go through a couple of use cases.

The first use case, which is the most common one we see, is secure microservices. You can have a front-end microservice that has an external route, but everything else has only internal routes, with C2C policy controlling what can talk to what. That traffic never leaves your foundation; it stays within Cloud Foundry.

The second use case we wanted to tackle: similar to how developers are used to the current map-route command, which can map multiple applications to the same route, we wanted to provide that same experience for internal routes as well. So you can map a new version of your app to the same route, validate that it works, and then take the old app out of commission. This is a very typical workflow for blue-green deploys on the platform today.

The third use case — and this is one where we're eager to hear from the community whether you have it — is clustering apps. If you have apps that need peer-to-peer communication, or apps that use TCP and UDP to communicate with each other and need to access individual instances, you can use the index-based routes: every app, in addition to getting the overall load-balanced route, also gets a 0-dot or 1-dot route for each instance, as sketched just below. These are typically apps like Akka clusters, Erlang clusters, or other kinds of peer-to-peer applications.
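A quick sketch of what those per-instance lookups might look like — the app name is illustrative, and this assumes the container image ships dig (nslookup works similarly):

```shell
# Per-instance (index-based) internal routes, names illustrative
cf ssh my-cluster -c "dig +short my-cluster.apps.internal"    # all instance IPs
cf ssh my-cluster -c "dig +short 0.my-cluster.apps.internal"  # instance 0 only
cf ssh my-cluster -c "dig +short 1.my-cluster.apps.internal"  # instance 1 only
```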
So those were some high-level use cases and an overview of how we envision service discovery working. Angela is going to walk us through a deep dive on the architecture and then show us a demo.

Great, thanks, Usha. We'll do a deep dive into the changes that were necessary to make service discovery part of the CF platform by looking at the use case of pushing a brand-new application with an internal route mapped to it. A lot of this may seem familiar to you, but we'll point out the parts where we've introduced new functionality; new components are highlighted in green, while existing components are in blue.

Let's consider the case where you push an app with an internal route. This information is passed to CAPI. CAPI passes the information — not only about the application you want created, but also the internal route — to Diego via a desired LRP (long-running process). Diego then decides where to schedule your application and communicates with the Rep on that Diego cell to have the application actually created, and the Rep of course communicates back, via the actual LRP, that the application has been created, including the container IP that was assigned by container networking. At this point we haven't touched the internal route yet.

Previously, every Diego cell ran a route emitter for external routes — routes that would be sent to the Gorouter or the TCP router. The route emitter would query the BBS for desired LRPs and actual LRPs to get the hostname on one side and the cell IP and host port on the other, and map the two together. We've modified the route emitter to now also do the additional work of handling internal route information: it takes the internal hostname and maps it to the container IP instead of the cell IP. The route emitter emits this internal route information to NATS, which is a message bus, on a different NATS subject. This way we keep the existing functionality: the Gorouter and TCP router only know about external routes, because they're listening on a specific subject, while internal routes are emitted on a dedicated internal-routes subject. We've also introduced a new component, the service discovery controller, which subscribes to NATS on that internal-routes subject, so it only gets information about internal routes and doesn't know anything about the external routes for either the Gorouter or the TCP router.

Then, when you have an app that wants to communicate with another application on the platform, the app asks to connect to, let's say, app-a.apps.internal. This request is resolved by BOSH DNS, because you're trying to reach something on the apps.internal domain. BOSH DNS has been configured to call out to the BOSH DNS adapter, a new component that runs on every single Diego cell. The BOSH DNS adapter acts as a conduit between BOSH DNS and the service discovery controller: it takes the hostname provided by BOSH DNS and calls out to the service discovery controller to get back the correct container IP, and it does so using the Envoy v1 API. The reasoning for that we'll get to a little later in the presentation, as Usha looks at what's next for us in terms of advanced service discovery features. After the service discovery controller passes back the container IP, it gets propagated back, and the app then tries to connect to the destination at that IP.

It's important to note that the application will still only be able to successfully connect to the destination app if policy has been configured as well. We're not equating discoverability with being able to successfully connect; we're still abiding by our previous contract that you need to create policy in order for apps to be able to talk to one another.
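To make that distinction concrete, here's a sketch — names and port are illustrative, 10.255.0.0/16 is Silk's default overlay range, and the policy-command flags vary slightly across cf CLI versions:

```shell
# Discoverable: the internal name resolves to a container IP...
cf ssh frontend -c "dig +short backend.apps.internal"   # e.g. 10.255.77.3

# ...but not connectable until a policy exists:
cf ssh frontend -c "curl -m 5 http://backend.apps.internal:8080"   # refused

cf add-network-policy frontend --destination-app backend --protocol tcp --port 8080
cf ssh frontend -c "curl -m 5 http://backend.apps.internal:8080"   # works
```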
So now that we've done a bit of a deep dive, let's do a quick demo. (How do I get to the other screen... I'm so bad at all of this. Okay, cool.)

For the sake of time, we've already set up a CF org and space that has three apps in it: a front end and two back ends. If we look here, we see that the front end has an external route ending in cf-app.com, whereas backend-a and backend-b both have only internal routes — routes on apps.internal. So backend-a and backend-b, which both serve different pictures of cats, aren't publicly accessible. In addition, if we run cf network-policies to see the policies that exist, we see that the front end has been allowed to communicate with backend-a, but there is no policy for the front end to talk to backend-b.

The front end is basically just a way to access the back ends: you type in a back-end HTTP URL and see whether or not you're able to connect. If we type in backend-a.apps.internal on port 2007, we get a picture of a cat, because we've configured a policy. However, if we try to connect to backend-b, we see that the connection has been refused, because we haven't configured a policy. If we go back to the terminal and add a network policy between the front end and backend-b on port 2007, then when we refresh, we get a different picture of a cat, because it's a different back end.

But that's not all — hold on to your clapping for just one second. If we look again, we see that backend-a and backend-b, in addition to being mapped to their own specific internal routes, have both also been mapped to backend.apps.internal. So if we go back to the front end and enter backend.apps.internal, we'll see that as we refresh, we get some basic load balancing between the two. Of course it doesn't always alternate — but yeah, now you can clap. So this shows the basics of service discovery and what you can do with it on the platform now.
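Roughly, the CLI side of that demo looks like this — reconstructed from the talk, so the listing format is approximate, and add-network-policy flags vary by cf CLI version:

```shell
# What was already in place: frontend may reach backend-a, not backend-b
cf network-policies
#   source     destination   protocol   ports
#   frontend   backend-a     tcp        2007

# Allow the front end to reach backend-b as well
cf add-network-policy frontend --destination-app backend-b --protocol tcp --port 2007

# Both back ends are also mapped to one shared internal hostname,
# so lookups load-balance across them
cf map-route backend-a apps.internal --hostname backend
cf map-route backend-b apps.internal --hostname backend
```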
Thanks, Angela — that's the best three-point story ever, the cats demo. Now that you've seen basic service discovery in action, let's move on to what's next for us. Similar to what we spoke about in Basel, we've been trying to see what folks expect from C2C, and now that we have basic service discovery, some of the questions we're getting are: what about retries? What about load balancing, and different types of load-balancing algorithms? What about mutual TLS? And many other things that app developers have come to rely on their client-side libraries to provide. What we're finding is that we need to provide these kinds of advanced features on the platform, and because of that we are moving toward supporting a service mesh.

The basic idea is that for all these common client-side capabilities — for which you would otherwise have to bring a language-specific library into your environment — a sidecar attached to every container would transparently provide them: things like automatically doing retries or health checks, being able to do weighted routing, many different kinds of capabilities. As part of that, the sidecars are all managed by a control plane. That's a high-level overview of what a service mesh can provide to you, and next I'm going to get into the service mesh we're integrating with. How many of you here are familiar with Istio? Okay, about half the room — that's awesome. We're not going to go into the details of the Istio architecture here, but Cloud Foundry is integrating with Istio both on the routing control plane and for the east-west control plane, and the container networking team is working on adding some of the east-west capabilities, using Envoy in the data path, controlled by Pilot.

If you remember the architecture diagram that Angela shared with you, we had this plan when we started writing polyglot service discovery, which is why we chose to use the Envoy APIs to talk to our service discovery controller. The basic idea is that we would use Istio Pilot as the control plane that sets up all the policy for connecting your apps to each other. Of course this is an iterative process, so we'll start by providing some simple additions, like health checks and retries, which seem to be pretty much table stakes. If you have any features that are must-haves for you, please feel free to reach out and let us know, so we know what to prioritize.

So that's where we're heading. It's still early days — we'll be sending out a feature proposal pretty soon; at the moment our team is working on understanding more about this and fleshing out that proposal — but this is a schematic of what it might look like. The components you see are basically the same: on the cell, there's now an Envoy in every app container, and the difference is that the control plane is going to be Pilot instead of the service discovery controller. As I mentioned, this is all work in progress, so it is subject to change, and hopefully at the next summit we can stand up here again and give you a more fleshed-out version of this.
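To make the sidecar idea concrete, a rough illustration of what moves out of the app — the URL, port, and flags are illustrative, not a shipped CF feature:

```shell
# Today: the app (or its client library) implements its own retry policy;
# the shell equivalent is something like:
curl --retry 3 --retry-connrefused http://backend.apps.internal:8080

# With a sidecar mesh: the app issues a plain request, and the Envoy
# sidecar transparently applies retries, health checking, load balancing:
curl http://backend.apps.internal:8080
```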
Cool — so I'm going to hand it back to Angela to talk about the other enhancements we've made.

Great. So, as this talk promised, we've talked about policy and we've talked about service discovery; let's talk about "beyond" — other things we've worked on since the last time we spoke to you that improve CF networking on the platform.

The first thing we want to highlight is the creation of a new release called silk-release. If you've looked at our GitHub or been to any of our presentations in the past, this diagram may be familiar to you. It's pretty big — don't get stuck in the weeds; what I really want you to focus on here are the colors. Everything in blue is a component we don't really care about for the topic of silk-release. Everything in green is a core component of CF networking release — things that should always be in your CF deployment. The components in red are the swappable bits: as Usha mentioned before, we abide by the Container Network Interface, which allows you to swap out the red components for another CNI solution such as Flannel, Calico, and so on. To better support third-party integrations, we've decided to split CF networking release into two parts: cf-networking-release now contains only the green core components that any third-party plugin would need to integrate with, whereas silk-release is all of the red components — everything that's swappable. This shouldn't impact operators: we're updating cf-deployment accordingly, so operators and app developers shouldn't see any changes. This is really to support our third-party friends out there.

The second thing we've worked on is supporting multiple interfaces. In the past, we assumed that all container-to-container networking traffic should go over the default interface, which is usually eth0. However, we heard about scenarios where you have multiple interfaces — in addition to eth0, say you have another interface, eth1 — and people wanted the ability to choose which interface their container networking traffic would actually go over, rather than having us pick the default. Additionally, operators with multiple interfaces wanted to choose which interfaces had application security groups applied to them. We've now introduced support for operators to do exactly that, by specifying either a BOSH network name or the specific interface name, so that you can have container-to-container networking traffic go over eth1 instead of the default eth0.

The last thing our team is working on, and continues to work on, is CNI chaining support. Again, CNI — the Container Network Interface — recently added support for chaining CNI plugins. Say you have a main plugin that sets up all of the networking for your container, and a secondary plugin that provides some additional feature, like bandwidth shaping for your network traffic. In that case you may want to chain your plugins instead of adding bandwidth shaping into your existing main plugin. If we look at this layered plugin-cake diagram: in terms of container creation and networking, Diego calls out to Garden-runC, Garden-runC calls out to the garden external networker, and the garden external networker — the core component we own — calls out, using the CNI API, to our CF wrapper CNI plugin, which in turn calls out over the CNI API to the Silk CNI plugin. There are many layers in this picture, but with CNI chaining support you can envision a world where the Silk CNI plugin returns its result to the CF wrapper CNI plugin, the CF wrapper CNI plugin returns the result to the garden external networker, and the garden external networker then passes that result to an additional plugin — say, a plugin that deals with bandwidth shaping — which can implement just that one feature, so it's reusable and doesn't have to be added to every single CNI plugin out there. You could even imagine a world — although no promises here — where, instead of the CF wrapper CNI plugin calling out to the Silk CNI plugin, these plugins are chained at the garden external networker layer as well, really taking advantage of the new features in the CNI specification.
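For reference, upstream CNI expresses a chain as a .conflist file with a plugins array. A minimal sketch — the path and chain name are illustrative, and "bandwidth" is the upstream CNI meta-plugin rather than anything CF ships:

```shell
# Hypothetical chained CNI config: main plugin first, then bandwidth shaping
cat > /etc/cni/net.d/10-cf-chain.conflist <<'EOF'
{
  "cniVersion": "0.3.1",
  "name": "cf-silk-chain",
  "plugins": [
    { "type": "silk-cni" },
    { "type": "bandwidth", "ingressRate": 1000000, "ingressBurst": 1000000 }
  ]
}
EOF
```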
And with that, those are just a couple of the additional things we've been working on beyond polyglot service discovery. I'll let Usha wrap us up.

Thanks, Angela. Just a quick recap of what we discussed today. We do have polyglot service discovery out there, and we're really looking for feedback, so give it a try and let us know what you think. It's running on PWS right now and it looks great so far, and we'll be promoting it to generally available once we've seen it running for some time. We also spoke about splitting cf-networking-release into core and swappable parts; we'll be cutting a 2.0 CF networking release pretty soon to make clear that there are some manifest properties that will no longer be present. We also added support for multiple interfaces for some CF operators. And finally, there's a lot of exciting stuff to look forward to as we start to add more advanced client-side features to polyglot service discovery.

I'm going to leave you with some links: we have a feature narrative, we have a blog post detailing a lot of what we covered here, and the demo Angela shared has a GitHub page — feel free to take a look at it, write your own demos, and let us know what you think. We'd love it if you stopped by #container-networking on the Cloud Foundry Slack and let us know what you think — questions, comments, anything. Thank you all. I think we might have maybe two minutes for questions, if anyone has any. If not, thank you all for staying for this last session.

So the question was twofold: first, can CFCR use Silk as a CNI plugin, and second, could you have Silk running in both CF and CFCR, as a sort of shared networking experience? For the first question: you could theoretically use the Silk CNI plugin as the CNI plugin for CFCR. About a year ago, the team at the time played around with it and got it working with a couple of tweaks, so that's definitely an option. In terms of whether you could have shared networking across Cloud Foundry and CFCR: they would need to share a common control plane component, which I think isn't easily set up at this time. So the answer to your question is yes, it's possible, but it does need some work.

Any other questions? The multiple-interfaces support was added for a very specific open-source community member who typically runs their networks with multiple interfaces — one is a management interface and one is the data interface — and they wanted us to support that use case. That's the typical scenario: an environment where things are segregated in that manner. Typically, your VM interface for management goes to one physical NIC and your VM interface for data goes to a different physical NIC. The thing we added support for was simply choosing which interface to use; we rely on BOSH networks support for actually configuring the interface, so our involvement ends at the BOSH network layer.

Were there other questions? Yes — the slides should be uploaded to Sched; they will be soon. So the question was: for BOSH DNS, are you just publishing A records? That is correct — they're A records. The next question was whether we would do SRV records, with ports, as well. We did consider SRV records, but we chose not to go that route because not all clients support them. We do plan to support ports, but we'll support them through the use of the service mesh: as we start leveraging Istio and Pilot, we'll start exposing ports through that.

Any other questions? I think we're done — thanks again, everyone.