All right, welcome, everyone. Thanks so much for attending. My name is Eric Malm. I work at Pivotal on the core Diego team. And today, I'd like to give you an overview of what Diego is in the context of Cloud Foundry and an update on some of the things we've been working on over the past year or so. So first off, let me tell you a little bit about what Diego is. In essence, it's Cloud Foundry's native container runtime, the system that's responsible for coordinating containerized workloads across a Cloud Foundry deployment. Just a brief history of Diego: we started development almost four years ago, in January of 2014. A year and a half later, we declared ourselves generally available, suitable for production use. We finally cut a 1.0 about a year ago, after we validated that we could run a quarter million containers in a realistic CF environment. And that started the clock on the deprecation schedule for the previous container runtime, the DEAs, which were retired this past May. So the ecosystem has now fully consolidated on Diego as the core container runtime for Cloud Foundry. Today, I'd like to give you an overview of how Diego and its components fit into Cloud Foundry and coordinate with the other subsystems to keep Cloud Foundry running, and give you some perspective on how those core components are typically deployed and how they've been evolving over the past year or so. I'll then give a few updates on some of the things we've been working on in support of broader features across Cloud Foundry, including what Diego does in support of isolation segments, which you may have been hearing about, and some really interesting new security primitives that we call instance identity credentials, which are enabling secure microservice communication patterns on the platform, as well as some other core features inside of Cloud Foundry that are improving stability.
I'd also like to give you a brief introduction to some operator-focused tooling we've been building that we call cfdot. And then lastly, I'll give a view into some of the next things we're considering working on for Diego in the next year or so. So to start, if you've been interacting with Cloud Foundry either as a developer or an operator, you're probably familiar with things like the Cloud Controller or the Gorouter. And you've probably experienced the magic of running cf push and eventually seeing your buildpack-based or your Docker-based or now your Windows application running on the cloud. So let me tell you a little bit about what's going on under the hood here. All of these application instances are actually containerized, and they're running under the supervision of the component that we call Garden inside of Cloud Foundry. This is effectively Cloud Foundry's container execution engine: it knows how to run a container on a local host and manage the processes inside of it. But intentionally, it's not designed to have visibility into the broader distributed system around it, and that's where Diego comes in. The Diego system manages these instances of Garden running across hundreds or even thousands of container hosts, understands the lifecycle of the containers that it's managing in a Cloud Foundry context, and decides where best to place those containers across those hundreds of hosts. Diego relies on some external dependencies to operate. Principally, it requires a consistent data store to keep track of all this distributed state. Nowadays, that is a SQL database, such as MySQL or Postgres. It also relies on an external system for some amount of component coordination and discovery. For the past few years, that's been Consul, and we've been moving away from that and introducing a new component to Diego that we call Locket to help deal with some of the lock coordination among those components.
And Locket, too, relies on this consistent data store in the form of a relational database to operate. And then finally, Diego also brings in some additional systems, most notably the SSH access system that you can use to get interactive access to your application containers. OK, so with that overview in mind, I'd like to give you a picture of what a typical Diego deployment inside of Cloud Foundry looks like. This is something that you might see deployed with, say, cf-deployment, if you're using that to create your Cloud Foundry deployments with BOSH. If you've been looking at any of those deployments, you're probably most familiar with the VMs or instances that we call the cells. These are the container hosts within a Diego deployment. So naturally, they're running a copy of Garden to create those containers and to run processes inside of them. But as I mentioned, Garden doesn't know anything about the broader distributed system, and that's where Diego comes in. So on those cells is co-located one of the core Diego components that we call the rep. The rep is responsible for registering the presence of that cell in the rest of the system, controlling that local Garden and telling it which containers to create and which ones to destroy, and keeping track of virtual allocations of memory and disk as it represents its remaining capacity to the rest of the system. And then finally, the rep does some amount of asset caching as it downloads things like buildpacks and droplets to run inside of those containers. Now, let's say you're a client, such as Cloud Controller, and you want to tell Diego to do some work. You're not going to end up talking to these reps directly. Instead, you're going to talk to the public API of Diego, and that's presented through the second core component, called the BBS. So the BBS presents that public API for clients, but it also understands more of the details of the lifecycle policy for the types of work that Diego runs.
Those are principally what we call long-running processes, or LRPs, and tasks, which are one-off pieces of work. So when clients come in, they give the BBS API a description of the work that they would like it to run. But when the BBS goes to run that work on the cells, it doesn't talk to them directly. Instead, it delegates to the third core Diego component, called the auctioneer. The auctioneer is responsible for communicating with all the cells that are registered and deciding an optimal placement for new units of work. OK, next, I mentioned that we are weaning ourselves off of Consul for some of the component coordination. And so the final core component of Diego that I'll mention is this new Locket component. Locket presents a lower-level API for registering locks and presences for the rest of these components, so that, for example, the auctioneers can coordinate over having only one of them running at a time and aren't stepping on each other's toes as they try to distribute work. Finally, since this is a complicated distributed system, the BBS also conducts a periodic convergence assessment on the deployment. It compares the desired state the clients told it about with the actual state that it thinks is running on those cells, and fixes any discrepancies by stopping work that's not supposed to be running anymore or restarting work that has disappeared. OK, so let's now zoom in further on one of these cell VMs to see how it's been evolving over the past year or so, especially in conjunction with new containerization standards that have been emerging in the broader ecosystem. So we'll take a look at this cell VM. In the beginning, it was simple: there was just the rep and Garden-Linux, and the rep told Garden what to do in terms of making containers. But then about a year, a year and a half ago, the Open Container Initiative started their first standardization project, which defines OCI bundles.
An OCI bundle is a lower-level description of how to run a containerized process on a host such as Linux, and along with that standard came a reference implementation called runc. So the Garden team looked at this and said, this is a great opportunity to stop owning a lot of this containerization logic inside of Garden-Linux and instead to start delegating to this community standard for creating containers. So they took Garden-Linux and transformed it into a new component called Guardian that knows how to delegate to runc. Together, these form Garden-runC, which is the replacement for Garden-Linux. And I think if you have an up-to-date Cloud Foundry deployment, you've converged on that as your container engine now. In parallel with this, the CF Persistence team started up and has been working inside of the rep component to teach it how to talk to volume plugins, so that containers can connect to distributed file systems such as NFS. That effort has been becoming standardized in something called the Container Storage Interface. Likewise, there have been networking interfaces emerging in the community, most notably the Container Network Interface, or CNI. And so the CF container networking team has been working with the Garden team to take all of the networking complications out of Guardian and instead move them into something external that operates over the CNI interface. Finally, the second standard to come out of the OCI deals with the layout, format, and representation of images, the root file systems that constitute containers. And so a team parallel to Garden, called GrootFS, has been building tooling and an image plugin that fits into Guardian to manage those root file systems, again taking more logic out of Guardian and into a smaller component that adheres to these community standards.
So the picture of the services running on a Diego cell has gotten vastly more complicated over the past year or so, but it's really reflecting all of these standardized interfaces that are emerging in the container ecosystem and that we're trying to take advantage of. There are a bunch of other talks on this track in particular that discuss those efforts and where we see them going in the next year or so, which is very exciting. Another exciting thing is that we're bringing all of this to Windows, too. The Garden Windows team has been looking at this OCI bundle standard and saying, well, we can make a Windows implementation of that, too. And they've done so: they call it winc. And even today, you can experimentally run Windows cells on Windows Server 2016 that will make Windows Server containers. We've got another talk on this track discussing that effort today as well. So, changing gears, I'd like to tell you a little bit about some of the features that Diego has helped enable inside of Cloud Foundry over the past year or so. One of those is enabling placement for the feature that we call isolation segments. Isolation segments are this capability inside of Cloud Foundry of dedicating a set of compute and now routing resources to a particular partition of the workload running on the platform. So if you need really strict isolation at the infrastructure level, maybe because you want to give application instances more dedicated CPU resources or access to some special device, then isolation segments are intended to address those placement concerns. So let me give you a small example of how Diego fits into this in terms of deciding placement for application instances that are in different isolation segments. One aspect of that is knowing which infrastructure resources are available for placement, and for Diego, those come in the form of the cells as compute resources.
So here, we have initially just a Linux cell running, say, Garden-runC. And when we're setting up an isolation segment, we make new cells, but we tag them with special placement tags, which I'll represent with different colors. So we're now deploying a new Linux cell, but it's got this green tag on it, which means it's going to form a different partition, so it's going to be in a different isolation segment. We can also do this for Windows cells. Let's add a Windows cell that's tagged green, and we'll add another Windows cell that's tagged with a different color, this purple one. OK, so to see how this plays out, we now need some applications that are assigned to those different tags representing their isolation segments. To start, let's say we have two instances of a Docker-image-based app, and it's untagged, so not assigned to a specific isolation segment, which means it goes into the default one. And then for variety, we'll have a Windows app that's tagged with that green tag, and a Ruby buildpack app that's tagged with this purple tag. So Diego's job from this perspective is to place these application instances where the operating systems and those placement tags both match. Looking at these Docker instances, there's only one cell that they can land on, namely that original Linux cell that was untagged, so Diego will put both of those instances on that upper-left cell. Moving on to the Windows app, there's only one Windows cell that's also tagged green in this deployment, so Diego's got to place that instance on that one. And then finally, Diego's going to look at this Linux-based Ruby buildpack application that's tagged purple, but it's going to find that there are no Linux cells that are also tagged purple. So it's not going to be able to place that one. Maybe that's what the deployment operator intended: they didn't want to provide any Linux resources in that particular isolation segment.
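To make that matching rule concrete, here's a small sketch in Python. This is not Diego's actual auction code; the cell names, the data shapes, and the exact-match tag semantics are illustrative assumptions, but it captures the rule that operating system and placement tags must both match.

```python
# Sketch (not Diego's actual auction logic) of matching an instance
# to eligible cells by platform and placement tags.
from dataclasses import dataclass

@dataclass(frozen=True)
class Cell:
    name: str
    platform: str                             # "linux" or "windows"
    placement_tags: frozenset = frozenset()   # empty set = default isolation segment

def eligible_cells(cells, platform, placement_tags):
    """Return the cells whose platform and tag set both match the instance."""
    wanted = frozenset(placement_tags)
    return [c for c in cells if c.platform == platform and c.placement_tags == wanted]

cells = [
    Cell("cell-1", "linux"),                        # untagged: default segment
    Cell("cell-2", "linux", frozenset({"green"})),
    Cell("cell-3", "windows", frozenset({"green"})),
    Cell("cell-4", "windows", frozenset({"purple"})),
]

print([c.name for c in eligible_cells(cells, "linux", [])])           # Docker app -> ['cell-1']
print([c.name for c in eligible_cells(cells, "windows", ["green"])])  # Windows app -> ['cell-3']
print([c.name for c in eligible_cells(cells, "linux", ["purple"])])   # Ruby app -> [] (unplaceable)
```

The last case is the unplaceable purple Ruby app: an empty candidate list, which Diego surfaces as a placement error.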
So it's a runtime error to run that kind of application in a space assigned to that segment. OK, one other feature that I'd like to tell you about that Diego has introduced in the past year or so, and that we're really excited about in terms of enabling more security-focused and stability-focused features on the platform, is something that we call instance identity credentials. So let me tell you how that's set up. Again, let's take a look at one of our typical Diego cells running the rep. So when the rep is making a Cloud Foundry application container, it has a set of identifiers that it knows are associated inherently with that application. So especially in the presence of container networking, each one of those application containers is going to have a unique IP address on the container network. And so that's tied to the identity of that container throughout its lifetime. And then likewise, in a CF context, we know that that application container corresponds to a particular application GUID coming from Cloud Controller. And then Diego even assigns each one of those containers a unique instance GUID that uniquely addresses it throughout the deployment. So before the rep starts any processes inside of that container, it's going to create a new certificate key pair and then supply it inside of the container so that when we start application processes inside of it, they'll have access to that. And they can use it for, say, TLS communication. So let me show you how the contents of that certificate metadata encode this notion of identity on the platform. So if you've ever looked at that certificate metadata, you'll know that one of the most important fields is the subject distinguished name. And that's where we encode these CF-specific identifiers. So that application GUID comes in as an organizational unit. And the instance GUID comes in as the common name in that subject distinguished name. 
We also provide those identifiers as subject alternative names, so that IP address that's uniquely assigned to that container is present as an IP SAN, and the instance GUID is present as a DNS SAN. One other feature of these certificates is that they're, by default, very short-lived. Let's say that we created this container right at the start of this talk, at 9:40 UTC. This certificate, by default, is going to be valid for only a day, so only until tomorrow at about this time. And then, of course, each one of these certificates has a unique serial number associated with it that's randomly generated. OK, well, since the certificate expires, we don't want to leave this application high and dry when that day is over. So the other job of the rep is to rotate these credentials continually, to generate new ones with a new validity period, and to make sure that they're continually supplied to the application. So right before this certificate expires, the rep is going to make a new one. It's going to contain the same subject identifiers as the previous certificate, it's going to have a new validity period that overlaps slightly with the previous one, and then, of course, it gets a new unique serial number. The rep then moves that new pair into the container, replacing the previous set of credentials, and then it's up to the application process to, say, reload those or to detect a file-system event to know that those certificates have changed. OK, so you'll be hearing in some other talks today about other applications of these credentials. One thing that Shannon Coen mentioned, if you were at the unconference last night, is that we're using these to make sure that the Gorouter is always talking to the correct instance, the one that it has a registered endpoint for.
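As a rough sketch of that identity encoding and rotation schedule: the 24-hour lifetime and the subject layout (app GUID as an organizational unit, instance GUID as the common name) come from what I just described, but the exact overlap window and string formatting here are illustrative assumptions, not Diego's configured values.

```python
# Illustrative sketch of instance identity subjects and rotation timing.
from datetime import datetime, timedelta

VALIDITY = timedelta(hours=24)     # default certificate lifetime
OVERLAP = timedelta(minutes=15)    # assumed overlap between old and new certs

def subject(app_guid, instance_guid):
    # app GUID as an organizational unit, instance GUID as the common name
    return f"OU=app:{app_guid}, CN={instance_guid}"

def issue(now):
    """Return the (not_before, not_after) validity window for a cert issued now."""
    return now, now + VALIDITY

# Container created at 9:40 UTC (date arbitrary):
nb1, na1 = issue(datetime(2017, 10, 11, 9, 40))
# Right before expiry, the rep issues a replacement with the same subject:
nb2, na2 = issue(na1 - OVERLAP)

assert nb2 < na1 < na2   # the two validity periods overlap slightly
print(subject("app-guid", "instance-guid"))
print(nb1, "->", na1)
print(nb2, "->", na2)
```

The key property is the assertion: the replacement becomes valid before the old certificate expires, so the application is never left without usable credentials.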
And then one other application that we have in mind is allowing applications to talk to configuration servers, such as CredHub, to securely retrieve credentials that only they are authorized to retrieve. But I'd like to give you a more intrinsic example of how this can enable better security between microservices communicating directly on the application platform. To illustrate that, let's take our typical CF deployment and focus on the routing aspects. Here's the Gorouter, which allows ingress for HTTP traffic to application instances. Let's say that we have a fairly conventional two-tier web application. We've got, say, a front-end app, and maybe we've got a couple of different versions of it running. And then there's a back-end app. But let's say that that back-end is doing some more secure processing. Maybe it's handling credit card numbers or something like that, and we don't even want it exposed to the public internet as a result. So we can achieve direct connectivity from the front-end applications to the back-end via some of the container networking access policy directives that are now baked into the cf CLI. But because of the sensitive nature of whatever this back-end application is handling, maybe we want to be even stricter about security. We want the front-end instances to talk to that back-end using mutual TLS, and even for the back-end to be very particular about which applications it authorizes to make requests to it. To illustrate that, let's put the application GUIDs for these two different copies of the front-end on the screen here, and let's say that the back-end is configured to accept requests only from the new copy of the front-end, the one with this A83 application GUID. OK, now let's say that we're sending requests for our great website, let's call it nido.com, to the Gorouter. And because we've arranged the routes correctly, those are going to go to the new copy of our front-end app.
And in turn, they're going to connect securely, using these application instance identity credentials, to the back-end. The front-end is going to see that it's connecting to the back-end that it intended to, something that's presenting a certificate signed by that application certificate authority. And the back-end, in turn, is going to get that automatic check through the TLS handshake, but then in its own application logic, it's going to check that the app GUID coming from this client matches the one that it's configured with. In this case, it does: the application GUIDs match, and so the request succeeds. Well, let's say that we left an old version of our website running, and it's pointing now to the old copy of that front-end application. But we didn't update the back-end; in fact, we intentionally took that application GUID out of its authorized list. So that front-end application can talk to the back-end, and it may handshake just fine, knowing that it's talking to the correct one. But the back-end can refuse it and say, wait, you're not on the list to get in. So that request is going to fail. I've put some example Go applications that illustrate these interaction patterns up on GitHub, in this TLS example apps repository on my GitHub account. So if you're interested in playing around with that, all you need at this point is an up-to-date Cloud Foundry installation running container networking, as well as opting in to generating these instance identity credentials for all of the containers. OK, the last main topic I'd like to tell you about is some additional operator-focused tooling that we've been developing over the past year or so. We call it cfdot, which stands for the CF Diego Operator Toolkit. This is a command-line tool intended to interact with the various Diego APIs. To date, we have commands that address the BBS API endpoints.
So if you want to inspect and manage your long-running processes or your tasks, you can do that, and we've just added support for the API endpoints on the Locket service. So if you want to dig down and inspect the locks and presences that the rest of the Diego components are coordinating over via the Locket API, you can use cfdot to do that. Coming up next, we're planning on making a few refinements and improvements to complete and flesh out the API that the Diego cells present through the reps, and we'll be adding corresponding commands to get that information. At that point, we're well poised to add some higher-level commands that give you more of an overview of what's running in your deployment. So you may ask, why do we need this kind of command-line tool to interact with these APIs? One of the reasons is that these APIs tend not to be very friendly to things like curl. By default, we require all of these APIs to use mutual TLS for communication. And then once you get past that hurdle of supplying the correct certificates, you're dealing with protobuf-encoded data in more of an RPC calling style, so it's not RESTful at all. So having something that intermediates all of that and instead just gives you a friendly stream of JSON objects to pipe into other programs, such as jq, is a real benefit to operators. In fact, we even have a BOSH job inside of the Diego release that deploys a compiled cfdot executable and puts it on the path along with jq, so you can jump onto a VM and just get to work debugging and figuring out what's going on inside of your Cloud Foundry deployment. I'd like to give you a couple of examples of processing pipelines that you can build ad hoc using this tool and then jq to do some processing downstream. For one example, maybe you just want a quick count of the number of instances that are in different states running as containers on Diego. So you can do that as follows.
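The shape of that aggregation, sketched here in Python over a few hand-written stand-in records rather than a live deployment. The record layout is a simplification I'm assuming for illustration; real actual-LRP records from the BBS carry much more detail than a single state field.

```python
# Sketch of the state-count aggregation you'd do with cfdot and jq,
# here in Python over simplified stand-in records.
import json
from collections import Counter

# Stand-in for the JSON stream that cfdot would emit, one record per line:
stream = "\n".join(json.dumps(r) for r in [
    {"process_guid": "app-a", "index": 0, "state": "RUNNING"},
    {"process_guid": "app-a", "index": 1, "state": "RUNNING"},
    {"process_guid": "app-b", "index": 0, "state": "CRASHED"},
    {"process_guid": "app-c", "index": 0, "state": "UNCLAIMED"},
])

counts = Counter(json.loads(line)["state"] for line in stream.splitlines())
print(dict(counts))   # {'RUNNING': 2, 'CRASHED': 1, 'UNCLAIMED': 1}
```

In the real pipeline, cfdot produces that stream and jq does the grouping and counting; the principle is the same.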
cfdot will let you dump out the information about all those instances, which we call actual LRPs, and then you can run it into jq and do some quick aggregation based on the state that's reflected in all of those records. Here's an example that I pulled from one of the release integration environments, the infamous A1, if you've heard of it. In this case, we've got about 300 running instances, but about 40 that are backed off into this crashed state, and three that aren't running anywhere at all, that apparently we couldn't even place for some reason. Maybe they were in the wrong isolation segment, or we ran out of resources because the cells were too small to accommodate the size of the work they needed to run. Here's one more example. Let's say you've got an entry in the routing table for the Gorouter, and you want to track down the application GUID and index that that endpoint corresponds to. Well, that information, too, is reflected in the actual LRPs presented in the BBS API. So again, you can dump those out and then pick through them to find the host IP of the cell VM and the port that's registered for that application in the routing table. Here's an example from that A1 environment with a particular IP and port: it allows us to narrow down to a specific application GUID and even the index of that instance. So if we then need to restart that instance or try to find something about it in Cloud Controller, we now have that core identifier from the CF context. OK, and then finally, I'd like to mention a few things that we're considering as next steps for Diego over the next year or so. One thing that we're very committed to is having the appropriate support for zero-downtime or rolling updates inside the platform. We did some initial exploration of that work over the past few months, and we've been discussing with the CAPI team how best to implement it so that we can give the correct routing availability guarantees.
So figuring out how we're going to coordinate as necessary with the routing tier is really the next challenge in bringing that solution to all of you. We know that a lot of the blue-green and rolling deployment scripts that everyone is using do these kinds of routing-gate checks to make sure that they're not destroying the old copy of their application before the new one is up, and we'd like to make the same kinds of guarantees part of the platform. We'd also like to better understand how we can place large application instances while requiring operators to maintain less headroom in their deployments. That may come in the form of doing things like active rebalancing, moving application instances around, or dynamic overcommit on cells. And again, if this involves moving instances around dynamically on the platform, we'd really like to make sure that we're maintaining the right kinds of availability guarantees for those application instances, both in terms of their intrinsic health, whether or not they're running, and in terms of whether or not they're routable. Next, we've definitely seen some cases where certain individual cells, or maybe the entire cluster, have gotten kind of overwhelmed with the amount of work they have to do. So we've put in some crude controls to gate that, if you know specific things about your environment or if you're reacting to past failures, but we'd like to see what we can do to make that kind of back-off more automatic, based on more information coming from the cells. As you may have noticed, it's been almost a year since we cut Diego version 1, and we'd like to cut another major version pretty soon, in part to remove some of the deprecated properties and API endpoints that we've accumulated over the past year and also to reduce some of the configuration choice and complexity. One thing we're considering requiring for v2 is making components always communicate with each other over TLS.
TLS has always been the recommended configuration for any production deployment, but it's been optional for development. We've found that with all of the new tooling enhancements, with the BOSH CLI and cf-deployment and things like CredHub, generating and managing that zoo of TLS credentials is much less burdensome at this point. It's already the default for, say, a BOSH Lite deployment that you're deploying with cf-deployment, so just requiring it for all the components seems like a sensible measure at this point. We'd also like to continue weaning ourselves off of Consul, so we're considering making Locket a required component in v2, so that the other components wouldn't operate without it deployed. And then, of course, we're always committed to any stability and security improvements we can be making, and we really value all of your feedback from the community in identifying those. We can't observe every failure of the system firsthand, so if you encounter one, please let us know about it. So for the next few minutes, I'd be happy to take some questions. And otherwise, thank you so much. I'm emalm@pivotal.io, emalm on GitHub, and in the CF open source Slack, so please drop by the Diego channel and ask us questions. Yeah. Regarding instance identity certificates, have you put more thought into how you might delegate signing to an external certificate authority, if the enterprise you're working with has got such a thing? So do you mean if you as an operator want to supply another CA to generate those? Yeah, exactly, yeah. Yeah. So theoretically, that's configurable inside of the core BOSH release. Each one of the cells just needs an intermediate CA and corresponding key to issue those certificates. So if you can generate that from some parent certificate authority, then you can just plug it into the Diego deployment, and it'll generate certificates associated with that chain.
And then we also have various capabilities for installing root CAs inside of the containers themselves. If you're using buildpack apps, it's very easy to do that: we can just bake it into the cflinuxfs2 root file system. And then we can provide them on the side for other containers where we maybe don't know so much about the root file system. Yeah, Marco. So in your mutual TLS example, you were using app GUIDs to validate that the request was coming from the right application. I guess that's in most cases not what I actually want to be using, right? Probably I want to allow traffic from all applications coming from a certain space or a certain org. Maybe I want to use the name of the application so I don't have to update every time the GUID changes, and so on. Is there anything thought about or planned already in that direction? Yeah, so that's definitely something we've considered. I think it'd be very easy to provide those org and space GUIDs as additional organizational-unit identifiers in the subject. So that's probably the next logical step to take there, to enable less granular and less fiddly authorization primitives for those certificates. Dealing with application names is a little bit more difficult, because those are mutable in the CC API. You could end up changing a name, and then we'd have to propagate that update down to all the certificates. But because we are able to generate them dynamically and to replace them in all the containers, that wouldn't be infeasible; we wouldn't have to restart all the containers when the name changes. So I think that's something we could do if there is sufficient interest in the future. Eric, can you say something about, you said, in looking forward, you're trying to, well, exclude stressed cells from placing new containers onto them? Any word about how you intend to do that? Like, how do you measure when a cell is stressed? Just lay out what you guys are thinking around that.
Yeah, so that's a good question. How would we measure that degree of stress on the cells? So some things we can do are looking at failure rates for the other dependencies that, say, the rep has, like talking to Garden, making containers, deleting containers. Since the rep is also involved in downloading various assets, if that's going slowly or timing out, understanding those failures and folding them into something more like a circuit breaker system for the cells. So the hard part of that is tuning that kind of information correctly so that you're not backing off of cells too quickly when they're still fully capable of doing work. They're just going a little bit slower than you might expect. All right, well, I think that's it for questions. Thanks so much for attending, and enjoy the rest of the talks. Thank you.