So I'm here to talk about Diego. First, a big shout-out to Renee French, who designed the gopher that I have very lovingly appropriated to give us some iconography to play with today.

So I want to tell you what's new with Diego. Last year, I had this wonderful opportunity to tell a story. It was a story that had, at the beginning, a problem, and then a proposed plan that culminated in a solution. It was a story, perhaps, of hope, a new hope. This year, it's complicated. There's a plot twist, and I'll give it away: it turns out Diego is Lattice's father. So I guess this is really Diego Strikes Back.

But let's talk about what we're actually going to talk about. I have five things. First, a quick recap of what Diego is, which will, oddly, lead us to talk about what a container actually is. Then I want to briefly talk about Diego's evolution, which will take us to Lattice, and then talk about the future.

So let's start with: what is Diego? At its core, Diego started off as a rewrite of the Cloud Foundry runtime. That means the DEAs, the Health Manager, Warden; we decided to write all these things in Go. Go, so Diego: that's where the name came from. But that doesn't tell you what it is. It turns out Diego is a distributed system that orchestrates containerized workloads. Let's dig into each of these pieces.

A distributed system: if you were to look into a running Diego installation, you would see a pile of VMs that we call cells. These are the workhorses, the big beefy machines that have all of the containers running all the applications on them. You'd also see a handful of highly available VMs that we call the brain, which have some functionality I'll get into in a second. And you'd see a handful of VMs called the BBS. The BBS is really a centralized data store that we use to coordinate information in the cluster; it's how the cluster solves the distributed-systems problem. We're currently relying on etcd to give us consistency in the BBS.

Now, that's the distributed system. What does it mean that Diego is an orchestrator? Diego's orchestration responsibility really falls into two things. First, Diego is a scheduler. When you bring your workload to Diego, Diego will try to optimally distribute it across the running cells. As more work appears, it'll do its best to balance that workload across the entire set of cells, across availability zones if possible. Diego is also a health monitor. If your application crashes, Diego will notice and restart it. This applies on a macro scale too: if an entire cell crashes, Diego will notice and save those applications.

But what I really want to talk about is what it means to have a containerized workload. What is it that Diego is actually running? Well, we have this interesting abstraction. We can run one-off tasks in containers, and we can run long-running processes in containers. A one-off task is easy to understand: it's a unit of work that runs at most once inside a container. A long-running process is a little more complex: we have some number, n, of long-running instances that we distribute across the cells for high availability, and monitor and restart in the case of failure. This generic, platform-independent abstraction describes what Diego can do.
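To make that abstraction concrete, here is a minimal sketch of the two workload shapes, written in Go since that's the language Diego is built in. These structs are illustrative only; Diego's actual task and LRP models live in its BBS and carry many more fields, and every name and value here is made up for the sketch.

```go
package main

import "fmt"

// Task: a unit of work that runs at most once inside a container.
type Task struct {
	Guid   string
	RootFS string   // the file system the container starts from
	Action []string // the command to run to completion
}

// LRP: a long-running process. Diego keeps Instances copies running,
// spread across cells, restarting any instance that fails.
type LRP struct {
	ProcessGuid string
	Instances   int // the desired number of running instances
	RootFS      string
	StartCmd    []string // the command each instance runs indefinitely
}

func main() {
	stage := Task{Guid: "stage-my-app", RootFS: "cflinuxfs2",
		Action: []string{"/tmp/run-buildpacks"}}
	web := LRP{ProcessGuid: "my-app", Instances: 3,
		RootFS: "cflinuxfs2", StartCmd: []string{"/tmp/start"}}
	fmt.Println(stage.Guid, web.ProcessGuid)
}
```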
And here's what we've actually done with it. We're able to take your droplet, the product of running cf push, and run a buildpack-based application on Diego. We're also able, using the same abstraction, to run a Docker-based application on Diego. And we're even able to run a Windows-based application on the same Diego cluster. What's cool is that I previewed this last year, and all of it is working today. That's very exciting. We think this abstraction has been very successful; we're seeing it prove itself out.

But I think there's a lot of confusion. What does it mean to run these sorts of things inside a container? Isn't container synonymous with Docker? What are these other two things, if that's the case? What is Diego's relationship to Docker, anyway? What is a container? Let's talk about that.

At its core, a container is about isolation. When you're on a shared host, you have a set of shared resources, and if you have multiple tenants on that host, they all want to run their various processes. These processes are vying for the shared resources, and the way they get access to them is, of course, through the kernel. Isolation is all about isolating these resources, and it comes in two flavors: resource isolation and namespace isolation. Let me dig into the first one first.

Resource isolation is easy to understand. You have a single CPU, or a set of cores, on your shared box, and you have multiple tenants vying for that CPU. In an ideal world, each process is using its fair share of the CPU. But what happens if process A starts to run awry and begins to soak up the CPU on the box? That's bad: tenant one is taking up more resources than it should, and the other tenants are being crowded out. You need some sort of isolation in this multi-tenant context. Well, the Linux kernel has this great feature called cgroups, and it lets us build barriers between the different tenants. With these barriers, we can make certain guarantees: tenant one cannot exceed its threshold, so tenants two and three are safe. That's resource isolation.

Namespace isolation is similar, but a bit more complicated. Let me pick an example: the process ID. In Linux, each process has associated with it a PID, an integer you can use to refer to the process. Now, tenant one's process B can look at the PIDs associated with its own tenant, which is what you want. But it can also look at the processes associated with other tenants, which is bad; that breaks isolation. Again, the Linux kernel comes to the rescue, with the PID namespace. This allows us to set up barriers that prevent tenant one from peering into tenants two and three. But it's a bit stronger than that: this is actually a namespace, and each tenant gets its own namespace into the PID world, so tenants can reuse PIDs without actually conflicting. It takes this global resource and buckets it up nicely. You're getting close to imagining that each tenant is running its own VM on the one VM.

There are other isolators the kernel provides in addition to the PID namespace: the network namespace for isolating network concerns, the mount namespace for isolating file-based concerns, and the user namespace to make sure that users in different tenants can't do nefarious things. So what is a container? Well, it starts with isolation. And isolation really is just a series of walls that, if you construct them together correctly, give you isolation.
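As a rough illustration of what those kernel walls look like from Go, here is a minimal sketch that launches a shell inside fresh PID, mount, and network namespaces (Linux only, run as root). This is just the raw kernel feature, not how Garden does it; a real container would also set up cgroup limits, a root file system, and more.

```go
package main

import (
	"log"
	"os"
	"os/exec"
	"syscall"
)

func main() {
	cmd := exec.Command("/bin/sh")
	// Ask the kernel for new namespaces: the child gets its own PID
	// space (it sees itself as PID 1), its own mount table, and its
	// own network stack, walled off from the host's.
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Cloneflags: syscall.CLONE_NEWPID |
			syscall.CLONE_NEWNS |
			syscall.CLONE_NEWNET,
	}
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	// Resource isolation (cgroups) would be layered on separately,
	// e.g. by writing a limit into the cgroup filesystem.
	if err := cmd.Run(); err != nil {
		log.Fatal(err) // needs root privileges on Linux
	}
}
```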
I want to emphasize that this is a feature of the Linux kernel, and it's a very powerful feature. But who cares? What goes inside the walls is what you care about as a developer. And really, that breaks down into two things: there's contents, the actual files that go into the container, and there's processes, the stuff you actually want to run in the container. You put these three things together, isolation, contents, processes: that's a container.

So Diego runs tasks and long-running processes in containers. In particular, the implementation of containers that we use is this thing called Garden that we built ourselves. Why? Well, Garden is really powerful. Garden allows Diego to programmatically and independently say three things: make me a container, put this in it, now go run this. And it does this through a platform-agnostic API. Garden allows Diego's abstractions to be very flexible and to support three very different and very interesting use cases. So let's dive into them.

Let's look at cf push. To understand cf push, you have to embrace the cf push haiku, which is this: Here is my source code. / Run it on the cloud for me. / I do not care how. So what does this look like? Well, we take your source code, we run a task on Diego, and we produce something called a droplet. This is where all your buildpacks do their work, and we call this staging. Now, what is this droplet? You can think of it as a compiled asset. It contains your application and any application-specific dependencies; if it's a Rails application, it has all your gems bundled right in. But that's all it has, so it can't run on its own. It needs a particular execution context to run in, and we have a name for that context: it's just the series of files you need to bring alongside the droplet for it to run. It's our root file system, and the current one is cflinuxfs2, which is a mouthful.

So how does the droplet run on Diego? Well, the way anything long-running runs on Diego is through a long-running process, an LRP. The LRP allows us to ask for a container and to specify the contents of that container. In this case, I want a container that has this root file system, cflinuxfs2, in it. Then the LRP can tell Diego to download the droplet into that rootfs and spin up the start command; the metadata for the start command comes out of the staging process. Isolation, contents, process: that's cf push. And you can see it in the code. If you look at the definition for a droplet LRP, it has a bit for isolation: I want a 128-megabyte container. A bit about what to put inside: give me this rootfs, download this droplet. And a bit about what to run. It really elegantly brings these three independent things together. All right, that's cf push.
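For a flavor of what that droplet LRP looks like, here is a sketch against the receptor API of the era. The import paths, field names, URL, and values are approximate reconstructions for illustration, not the exact schema; in particular the blobstore address and commands are invented.

```go
package main

import (
	"fmt"

	// Paths as of the 2015 incubator repos; treat as approximate.
	"github.com/cloudfoundry-incubator/receptor"
	"github.com/cloudfoundry-incubator/runtime-schema/models"
)

func dropletLRP() receptor.DesiredLRPCreateRequest {
	return receptor.DesiredLRPCreateRequest{
		ProcessGuid: "my-app-guid",
		Domain:      "cf-apps",
		Instances:   3,

		// Isolation: a 128-megabyte container.
		MemoryMB: 128,

		// Contents: the cflinuxfs2 rootfs, plus the droplet
		// downloaded into it before the app starts.
		RootFS: "preloaded:cflinuxfs2",
		Setup: &models.DownloadAction{
			From: "http://blobstore.example.com/droplets/my-app",
			To:   "/home/vcap",
		},

		// Process: the start command that came out of staging.
		Action: &models.RunAction{
			Path: "/bin/sh",
			Args: []string{"-c", "bundle exec rackup -p $PORT"},
		},
	}
}

func main() {
	fmt.Println("desiring", dropletLRP().ProcessGuid)
}
```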
Well, how does Docker fit into this? Docker is very different, but it comes down to the same basic paradigm. The contents, in Docker, are described by your Docker image: it contains the files you want to run in the container. And the set of processes to run comes from the Docker metadata. All of this stuff comes from the Docker registry. Docker really nailed this, right? They've made it really easy to push out an image, to specify what you want to run, to tweak it, and then to just launch a container that runs that stuff. But it's important to understand that the isolation bits are the Linux kernel. And you can see this in Diego's LRP: we're just doing isolation, we're asking for a Docker image for the contents (Docker images are a first-class thing that we support), and then we're saying, hey, based on the Docker image metadata, go run this start command.

So how does Diego relate to Docker? It's really simple. You can put anything you want in the container, and one of the things you can put in there is a Docker image. And once you've got that Docker image, you can run anything you want in there, and one of the things you can run is the start command from the metadata associated with that Docker image. That's how Docker runs on Diego. What's cool is that this is really flexible: it would be really easy to have appc running on Diego, and that's something we hope to do eventually. We're not quite doing it yet. All right, that's Docker.

Fast forward to Windows. What does that mean? Well, I just talked about all of this Linux kernel stuff: resource isolation with cgroups, namespace isolation with all these namespaces. It turns out you can do something similar with Windows. You can do resource isolation with the kernel's job objects, and you can do namespace isolation. In fact, we're running your application in an isolated IIS instance. We're collaborating with Microsoft on this, and it's allowing us to build a cf push experience that's working today. It provides a container experience for Windows Server 2012 that we believe will only get better with Windows Server 2016.

So you have these two very different platforms. How does Diego communicate with them? Again, this is the beauty of Garden: through one single interface. That means you can define a .NET LRP that looks just like your buildpack LRP or your Docker LRP. It talks about isolation. It talks about stuff to download, including information about the rootfs, in this case Windows, which allows Diego to figure out where to place the LRP, and it includes metadata about what to run. So these three very different contexts all run on one orchestrator. It's pretty cool.
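Here is a rough sketch of that "make me a container, put this in it, now go run this" sequence through the Garden client of the era. The repo paths, endpoint address, file names, and exact method signatures are from memory and hedged; it is meant to show the shape of the single interface, not to be authoritative.

```go
package main

import (
	"log"
	"os"

	// Paths as of the 2015 incubator repos; treat as approximate.
	"github.com/cloudfoundry-incubator/garden"
	gclient "github.com/cloudfoundry-incubator/garden/client"
	gconn "github.com/cloudfoundry-incubator/garden/client/connection"
)

func main() {
	c := gclient.New(gconn.New("tcp", "127.0.0.1:7777"))

	// "Make me a container": isolation plus a root file system.
	container, err := c.Create(garden.ContainerSpec{
		RootFSPath: "docker:///cloudfoundry/lattice-app",
	})
	if err != nil {
		log.Fatal(err)
	}

	// "Put this in it": stream a tarball of files into the container.
	droplet, err := os.Open("droplet.tgz") // hypothetical local file
	if err != nil {
		log.Fatal(err)
	}
	if err := container.StreamIn("/home/vcap", droplet); err != nil {
		log.Fatal(err)
	}

	// "Now go run this": spawn the start command inside the walls.
	_, err = container.Run(garden.ProcessSpec{
		Path: "/home/vcap/start",
	}, garden.ProcessIO{Stdout: os.Stdout, Stderr: os.Stderr})
	if err != nil {
		log.Fatal(err)
	}
}
```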
All right, so let me tell you about how that orchestrator has evolved. I want to talk about two things here: the scheduler, and an API. Let's start with the scheduler.

We're used to thinking of architecture as this thing that comes in from above and tells us what code to write, and that's true. At Pivotal, we do a lot of test-driven development, so we're used to thinking of tests as something that does that too: you always write your tests first, and your tests influence what code you write. But that's not all there is to test-driven development. Your code also feeds back into your tests, and you get this nice virtuous cycle between tests and code. This is at the heart of TDD; it's all about quick feedback loops. Now, your architecture also informs what tests you ought to write. You need integration tests to make sure all your components work together, but in a complex system you also need simulations and performance tests to make sure that everything works correctly. We love the fact that these arrows point back and forth. We love these feedback loops, and we're finding it really important to have feedback loops back into architecture. This is the most useful definition of agile architecture that I can come up with: it's all about feedback loops, all about the stuff you build informing your vision for how to build it.

Last year, I made a lot of noise about the distributed auction and simulation-driven development. Here's what it looked like: you have your cells, and with the distributed auction architecture, you have a scheduler on each cell. When work comes in, the schedulers, we call them auctioneers, talk to each other and figure out where to place the workload. This was really cool when it worked, and we ran a bunch of simulations to make sure that it actually worked; the simulations were running fine at the 100-cell scale. Then, of course, we made the simulations more realistic and went up to 200 cells, and things started to falter. So we added some code and made it better. Then we made it bigger, and it broke again. So we added some code, and it worked again. At this point we asked: okay, how about thousands of cells, and at what cost? Things were getting complex, and it was time for the architecture to change. So we stepped back and did something very simple: we moved to a centralized, highly available scheduler. Mesos does this, Kubernetes does this, Borg does this. It's just simpler this way.

Okay, so let me talk about the API. When you say cf push, you're talking to the Cloud Controller, which turns around, talks to a pool of DEAs, and asks them to stage and run. When we started, our mandate was: rewrite the DEAs. We wanted to do it in a cleaner way, and one of the things we knew was that the left-hand side here was very app-specific, and we wanted something a lot more generic. So we built this bridge, called the CC-Bridge, that translated from the app-specific domain to this more generic domain, and then we went off and built all of Diego. Now, this was working really well, but we made a mistake. I wouldn't say a mistake; let me phrase it this way. We started off thinking of all of this as Diego, and because we were thinking of all of it as Diego, we made an interesting decision: we had the CC-Bridge talking directly to the database. That's fine; it helped us bootstrap and get working really quickly. But a database is not an API. So we stepped back and said, well, really, this is Diego. And if this is Diego, then Diego should have an API. So we built one; we call it the receptor API. And if you have an API, then that's obviously what the CC-Bridge should talk to. Now you get an interesting picture: CC-Bridge, CC, who cares? That's just a generic consumer of this API. What if you had another consumer? Well, that's cool. And that's where Lattice was born.

So here's Lattice. You take this picture: you have this distributed system that can run your workload. By itself, that's kind of meh. Who cares? So you're running my containers; how do I get to them? Well, we realized that if we added the Gorouter layer, we could route HTTP traffic to your containers, and if we added the logging and metrics layer, we could pull logs and metrics out of your applications. Now, what if we took all of this, packaged it up, and made it really easy to install: vagrant up to start a local cluster, or terraform apply? And what if we gave you a little command-line tool to create and manage your applications? That was Lattice. You can run it on your local VM or, with Terraform, you can deploy to AWS, DigitalOcean, or Google Cloud. And thanks to the community, OpenStack; that was a PR. That was awesome.
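Since Lattice is, at heart, just another consumer of the receptor API, any program can drive the cluster the same way. Here is a hedged sketch using the receptor client of the time; the import paths, endpoint URL, method names, and field values are approximate illustrations, not gospel.

```go
package main

import (
	"fmt"
	"log"

	// Paths as of the 2015 incubator repos; treat as approximate.
	"github.com/cloudfoundry-incubator/receptor"
	"github.com/cloudfoundry-incubator/runtime-schema/models"
)

func main() {
	// A hypothetical Lattice receptor endpoint.
	client := receptor.NewClient("http://receptor.192.168.11.11.xip.io")

	// Desire an LRP: Diego will keep two instances of this Docker
	// image running somewhere on the cells.
	err := client.CreateDesiredLRP(receptor.DesiredLRPCreateRequest{
		ProcessGuid: "lattice-app",
		Domain:      "lattice",
		Instances:   2,
		RootFS:      "docker:///cloudfoundry/lattice-app",
		Action:      &models.RunAction{Path: "/lattice-app"},
	})
	if err != nil {
		log.Fatal(err)
	}

	// Visibility: the same API reports what is actually running,
	// which is what the X-Ray UI mentioned later builds on.
	actuals, err := client.ActualLRPsByProcessGuid("lattice-app")
	if err != nil {
		log.Fatal(err)
	}
	for _, lrp := range actuals {
		fmt.Println(lrp.InstanceGuid, lrp.State)
	}
}
```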
So I want to talk about two things real quick with Lattice. The first is: what is the relationship between Lattice and Cloud Foundry? Again, there's a lot of confusion here. And the second, real quick, is: why did we do this?

So, what is the relationship? Cloud Foundry is really the union of all of these things combined: Cloud Controller, the UAA, Diego, Loggregator, Gorouter, buildpacks, services, BOSH. Lattice comes in right here; it's these three things. It's sort of Cloud Foundry by subtraction, as James Bayer likes to say. So what don't you get with Lattice? Well, you don't get the CC and UAA, which means it's really a single-tenant environment. You don't get buildpacks yet, which means we're relying on Docker to distribute your bits, which is fine. You don't get services, so you really have to bring your own. And you don't have BOSH. We've made Lattice really easy to deploy, and BOSH just isn't easy to deploy. There are implications to that: it means you don't have rolling upgrades out of the box. You have to figure that out yourself; it's possible, we just don't make it particularly easy. At the end of the day, Lattice gives you a lightweight clustered experience, and we just want to encourage people to go play and explore these technologies.

So why Lattice? Well, we think it's a useful little self-contained solution that solves real-world problems, and we just wanted to get it out there so people could play with it. We think it makes exploring Diego a lot easier. We feel it's a softer on-ramp to the CF tech stack, a way to introduce more and more people to Cloud Foundry. And actually, this has been really useful: we're finding it allows us to efficiently prototype new ideas internally. I'll talk about that in a little bit, but we have a lot of new initiatives where we're just saying, hey, let's go build that on Lattice, see it work, and then bring it into the platform. Which leads us to the future.

So what's coming? Well, the first question everyone's asking is: when? And I will just gently say, hey, Diego's scope is a lot bigger than just "rewrite the DEAs." You can do Lattice, you can do Windows, you can do Docker. Okay, but when are you going to ship? Well, Diego's running in production on PWS; it's handling about 5% of the load. More importantly, it's running all of Pivotal's internal applications, so that's great. Okay, but when can I play with it? Well, it's in beta while we validate our performance at hundreds of cells and do some internal security work to make sure that all's well. Okay, but when can I play with it? Well, I want you to start using it today; you can start getting us feedback soon. All right, but when will it be finished? It should be out of beta within Q3, probably.

All right, then what? Okay, this is the exciting part. Placement constraints: having placement pools so you can put different workloads on different cells. That's top of the backlog, post-beta. cf ssh: I want to SSH into my running container at a given instance index; give it to me. It's working now, and the CLI support is on the way. This will ship with Diego, giving you shell access, port forwarding, scp, all this good stuff. If you're administering the cluster, don't worry, you can turn it off. But if you're a developer, rejoice. TCP routing: we're kicking this off with GE. It's very exciting; I encourage you to go check out their talk. A private Docker registry, in collaboration with SAP: check out Yorgie's talk on Tuesday. Support for persistence, so persistent disk: it's a long-term goal, and we've done some experiments that Caleb and Ted are going to report on; check out their talk on Tuesday. And container-to-container networking, some sort of overlay networking story. That's a long-term goal.
We just don't need it to replace the DEAs; we don't need it for the cf push workflow. But we recognize that it's something we want to bring to the platform, and it's in Diego's future. And finally, Condenser, which is what I alluded to earlier: lightweight buildpacks for Lattice. Bringing that CF experience, cf push, finding a minimal subset that's actually useful and fun to play with, and bringing it to Lattice. We're excited to do that.

All right, that's the future. I have an open house today at 1:30. Come and ask me questions then, or right now. And that is all. Thanks.

Show us X-Ray. No, no live demos, rule number one. I don't have an environment set up. But I didn't get to talk about X-Ray; I can talk about X-Ray. So we have this receptor API, and it gives you full visibility into what the cluster is running. So we built this really cool UI on top of it that lets you see, at a glance, what the cluster is running and understand whether there are any problems. Oh, we have a mic. Thank you. Any other questions? How much time do we have? Five minutes? No? Oh, hey.

With the DEAs, we had quite a lot of problems with rebalancing clusters when recovering from failures, things like that. It was said that Diego would be the project looking at fixing that. Is that a plan for Diego: looking at those kinds of distribution and redistribution algorithms, and improving them? So the question is rebalancing. We have a story for rebalancing in our backlog. We want to do it; we're not going to do it before we ship. And it's something we actually want to do in such a way that it's just always happening. We don't want there to be a button that you press to magically change the entire system. So Diego will just naturally, because you're running a twelve-factor application, identify applications that it can move to improve the distribution on the cluster. That's planned, but it's not there yet.

You mentioned Apache Mesos. Could you compare and contrast what Diego provides versus what Apache Mesos provides? Sure, I get this question a lot: why didn't you build Diego on top of Mesos? It's a good question, and in some ways we could have, but there were just a couple of key things we needed that we didn't think we could get out of Mesos. Windows support was one of them. We can actually do Windows; that's working today, and it's just not really a thing Mesos, certainly at the time, could support. I don't think it does yet, either. The other thing was that Mesos was really giving us the scheduler piece, and there's just a lot more to Diego than that, so we would have had to build a lot of stuff in addition to the scheduler anyway. The nice thing with Mesos is that your scheduler can then live alongside other schedulers, so I could imagine a plugin for Mesos that allows Diego's scheduler to piggyback on it. The only thing stopping us from doing that is priority and time, frankly. These things can all overlay and intermix pretty easily.

Sure, so that's interesting. The question is: what about auto-scaling? You said DEAs; I'll say cells, right, the worker pool that runs the containers. That's interesting; that would be the first time we have an arrow pointing from the runtime into BOSH, or whatever is orchestrating your cluster.
It's definitely something that is very doable and that we would consider doing. But again, I just go back to priorities and time, right? Is it in the future? I imagine so. I imagine a full-blown solution where an operator just says, hey, you can have at most 100 cells; grow as you need to, but don't use my resources until you need them. I think that's probably going to come, but there are no concrete plans at this time.

You might have answered this question before on the mailing list, but I figured I'd ask it. It's very cool what you have, especially with Lattice, a simpler version for developers. But as we all know, anything that starts very simple tends to get complicated. So, in other words, is there a guarantee that Lattice doesn't become the new CF? How does Lattice not become the new CF? Discipline? I'm not sure. It's a good question. Let's see where Lattice goes; I think it's still early days for Lattice. You'll always have CF.

So how does the Windows isolation work? It seems like a lot of the isolation is provided by IIS. Does that mean people can't write worker-type apps yet? Is it more web apps? I'm going to ask Mark to come and answer that. We're a microservice architecture; Mark is the Windows microservice. There you go. Sounded like Jesse.

So, yeah, isolation on Windows is very different. In Windows, when you have a web workload, it's assumed that the web workload is integrated into the operating system itself, into IIS. So it's not like a Linux container, where you can just start a process and it's just a process; in Windows, you have many flavors of processes. There are kernel primitives that allow you to go and isolate different types of workloads in Windows, so what we expose to Diego is going to be slightly different from what you'd see on Linux. Today, on Windows Server 2012, you're going to have HWC and a lot of other mechanisms to isolate Windows web workloads, and then we're working on background tasks next.

All right, looks like we're out of time. Thanks, Saul. Like I said.