I'm on the Concourse team at Pivotal, and I'm going to talk about Concourse and the container runtime. If you came to the keynotes this morning and you didn't sleep in, you would have seen me demo Concourse using Kubernetes: instead of scheduling containers in Garden, running pods in Kubernetes. I want to go over a little bit of how I got to this. I threw the slide up this morning like, oh yeah, it's cool, we're taking this great path in the product direction of Concourse and really following the best path. But honestly it's more like we're seeing this fuzzy path and figuring out the best way to follow it, and figuring out how to coexist, much like Cloud Foundry. The talk before this one was asking, why do I need Kubernetes when I already have Cloud Foundry? So much like Cloud Foundry, now that everyone's starting to use Kubernetes, we're on this path of figuring out what's going on with Kubernetes, how developers are using it, and how people are using it with Concourse.

You might be asking, why are you looking at this now? (Slides are very slow to switch.) We're changing a lot of things. We're always changing things, always trying to find the best way to interact with the community and serve its needs. We're changing a lot of things internally in Concourse as well as how we interface with people in the community. We're changing how we document and support different deployments of Concourse. Previously our docs were very focused on "here is how you BOSH-deploy Concourse," and, oh, there are also binaries and Docker images, but you can figure out how to use those yourself. So we're changing the way we document and support all of that, and reworking the wording in our docs to better support those different deployments. We're also changing how workers register with a Concourse cluster, and how volumes and containers on workers are garbage collected. I'll go a little more into each of these.

Alex wrote a really good blog post, which I'm referencing here when I talk about this renewed focus, about the new website and how we're changing the way the docs talk about deploying. We're really focused on simplifying the deployment story: the main docs just cover the binaries and how you run them, and all of the deployment-specific documentation moves to separate GitHub repos. If you're interested in a BOSH deployment, you go to the Concourse BOSH deployment repo and look there, instead of digging through the docs wondering where the instructions are. It's become a really common pattern for BOSH releases to have a hand-in-hand deployment repo that covers all the different scenarios for how you might want to deploy that thing. We're also focusing on the concourse-docker repo and improving that as well.

And the big thing we're focusing on, related to Kubernetes, is the Concourse Helm chart. There's been a lot of great work in the community on this chart, with a lot of great people working away on improving it.
It's out there in the stable directory of the Kubernetes charts repo. A former Pivot has been working away on it, along with a lot of other great contributors who don't show up on the first page, so I don't remember their names; I'm sorry. We're taking all the great work on this Helm chart, where people have really focused on deploying Concourse on Kubernetes in a reproducible way, and bringing it into our main pipeline, the big testing pipeline we use to test Concourse releases. We're putting the Concourse stamp of approval on this Helm chart. (This really doesn't show up well on the slide; both logos are blue. Blue logos, man, they're everywhere.) We're saying: we've tested this chart with this release of Concourse. So we're really going to focus on that.

My next points are related to the theme of the conference: running at scale. We've been running Concourse at a big scale. Referencing more blog posts: I wrote one on our Wings environment, which is a huge Concourse deployment that we run for teams within Pivotal. We have around 25 workers and three ATCs, and a whole lot of teams using it. I can't even remember the number of teams. 75 teams; thanks, James. And we built SLI dashboards for it. In the process of running this giant Concourse, we've seen a lot of scalability issues with the way garbage collection happens. So we're changing the way garbage collection works. This seems totally tangential to the Kubernetes theme, but I'll get there.

Currently, GC is centralized on an individual ATC, and that ATC goes out to all of the workers and removes anything it doesn't need. A build has finished, so I don't need this container anymore; all of the stuff that was running in there can just go away. This build went green a long time ago; you don't need to worry about it. We're looking at distributing this work across the cluster instead of having it all happen on one ATC that reaches out to everything.

Let me dig a little more into how this works. Currently, the ATC looks in the database at all of the builds that have happened and all of the volumes and containers related to them. It goes out to each worker and starts telling it to delete things, and eventually that worker removes those containers and volumes. (These are abstract representations; don't take the diagram literally.) With centralized GC and a large pool of workers, the ATC talks to the database and then has to reach out to all of these workers and delete all of these resources. On a very large Concourse cluster, this gets really bad really quickly. Imagine this with 30 workers on the screen and the amount of crazy red lines that I didn't want to draw in Google Slides, because it was hard to even draw this many.

So we're changing this, distributing the GC to the workers themselves. Each worker will have a Reaper component baked into the binaries that goes out and talks to the ATC; the ATC looks at the database, and then the Reaper deletes those things locally on that worker.
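To make that concrete, here's a minimal sketch of what that worker-side loop could look like. The interfaces and names here (ATCClient, ContainerHandlesToDestroy) are mine for illustration, not the actual Concourse code:

```go
package reaper

import (
	"log"
	"time"
)

// ATCClient and GardenClient are hypothetical stand-ins for the real ATC
// API and Garden clients; the shape is illustrative only.
type ATCClient interface {
	// ContainerHandlesToDestroy asks the ATC (which consults the database)
	// which of this worker's containers are no longer needed.
	ContainerHandlesToDestroy(workerName string) ([]string, error)
}

type GardenClient interface {
	Destroy(handle string) error
}

// Reap runs on the worker itself: it periodically phones home to the ATC
// and destroys unneeded containers locally, so the ATC never has to fan
// out to the whole pool of workers.
func Reap(atc ATCClient, garden GardenClient, workerName string) {
	for range time.Tick(30 * time.Second) {
		handles, err := atc.ContainerHandlesToDestroy(workerName)
		if err != nil {
			log.Printf("phone home failed, will retry: %v", err)
			continue
		}
		for _, handle := range handles {
			if err := garden.Destroy(handle); err != nil {
				// destruction is idempotent; a failed handle comes back next tick
				log.Printf("destroy %s failed: %v", handle, err)
			}
		}
	}
}
```

The key design point is the direction of the arrows: the worker pulls its own to-do list instead of the ATC pushing deletes at every worker.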
So we're reducing the amount of communication that has to happen from the ATC to the individual workers. Each worker now has just that one line of communication, which already exists because workers register with the cluster to keep their state up to date. The ATC really only needs to care about what's in the database, and the workers are responsible for cleaning up their own work. This fits really well with the thinking around Kubernetes and implementing more support for it, because now we don't need to worry about distributing this work across a giant cluster. The workers phone home, which they already do when they register with the Concourse cluster, and deal with GC locally. There's less chatter going around, and no more of this big, basically, DDoS of all the workers.

We're also changing worker registration. As I said earlier, Concourse really likes BOSH; it was the main way our documentation talked about deploying Concourse, and as such we built a lot of really good logic into the BOSH release. There's this groundcrew job that deals with registering BOSH-deployed Concourse workers. When a BOSH deploy scales down and removes one of your workers, the worker being removed stops registering and then starts draining: a drain script runs that looks at Garden and baggageclaim and waits for any work it's still running to go away, so it knows it can safely take itself out of the cluster and retire. All of this was baked into the BOSH release, and none of it was previously supported by the plain binaries, on Kubernetes, or anything that packages the binaries into a container, so Docker as well.

So we're looking at all of this logic as one big component now and shrinking it down into a Go library. I keep wanting to call it worker.go; it's actually just "worker," a compiled Go binary. Anyway, the binaries are more powerful now. We're baking all of this logic into a shared Go library that's used by the binaries. (I didn't finish the rest of the slide; it should say "and Docker and Kubernetes.") So anything that packages the binaries gets this logic for free now. We've taken all of the logic that was previously baked into the BOSH release and moved it into a shared binary, so now we can leverage it within Kubernetes, within Docker, and within any way you want to deploy that's not BOSH. So that's great.
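The drain behavior that library carries over is simple to sketch. Again, these names are mine, not the real library's; the real drain also consults baggageclaim, which I've elided here:

```go
package worker

import "time"

// ContainerLister is a hypothetical stand-in for the Garden client's
// ability to list the containers still alive on this worker.
type ContainerLister interface {
	Containers() ([]string, error)
}

// Drain blocks until the worker has nothing left running, at which point
// it's safe to retire. A retiring worker stops registering first, so no
// new work lands on it while we wait.
func Drain(garden ContainerLister, poll time.Duration) {
	for {
		handles, err := garden.Containers()
		if err == nil && len(handles) == 0 {
			return // nothing left running; safe to leave the cluster
		}
		time.Sleep(poll)
	}
}
```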
Now to the Freedom Friday experimentation that I showcased this morning. I had this idea of an orchestrator for Concourse: supporting different components that could be used to schedule Concourse's containerized workloads. I've got my Concourse build running, and I want to use the container runtime to create containers and actually run my tests inside them. And I want to move towards this new world of the CF Container Runtime. It's so cool right now; everyone's talking about it here, and I want to use it. Currently we use Garden, and a lot of people come to us and say: Kubernetes is cool now, and Cloud Foundry has a container runtime, why don't you use it? So I was like, all right, I'm going to look into this and see what that would look like.

So I looked at Kubernetes, and Kubernetes already supports jobs. I've got my Concourse job that I want to get into Kubernetes, and Kubernetes already has this thing called a Job. So what's a Job in Kubernetes? There's a giant quote on the slide, but basically, a Job runs something to completion. That's very similar to what you want from Concourse as a tester: you run through all your tests, they exit at some point, you get an exit status, and you're good to go. It's a workload you can schedule in Kubernetes that will go out and create pods and containers for you, figure out which node to schedule them on, and all of that fun stuff. You just tell it what container image you want, what volumes you want bound into it (say, your Git repo with all of your source code), and what command you want to run: your test command.

So I got to work. Here's my task definition, my YAML (here's more enterprise YAML for you), that I would define for a Concourse job that runs the unit tests for Booklit, an open source documentation tool. The idea is to take all of those pieces and convert them into what a Job spec would be. The slide shows one giant representation of a Job spec, so let me slice it up and talk about the individual parts. I'm using rootfs_uri here instead of image_resource, basically for ease of hacking this in. I know I want a container that uses the golang Docker image, so I know my Job spec needs to define a container with that image. I also know that I'm mounting in the source code for Booklit. Concourse inputs basically give you a directory from some resource, a "get" of that resource: download all the bits of that resource and store them at this path inside the test container. So I know I want to mount some volume in there. I give it some GUID and track it inside of my pod, and I mount that volume inside of my Job spec as well, inside the pod that's in my Job. And the run path is just the command I want to run inside that container. So I'm building up this spec for a Kubernetes Job just by picking off pieces that are already expressed in the Concourse YAML.

This is where it gets a little interesting. To get those inputs from somewhere, the way it works now is that Concourse goes out to this thing called baggageclaim, which is basically our big server for directories. It's a bit more complicated than that, since it uses special filesystems underneath to make copy-on-write volumes, but you can think of it as a server for directories: I make an API call with some GUID, I get a directory, and I can download the bits into a container. Garden lets us mount those volumes into containers directly. But with Kubernetes, moving these baggageclaim volumes around became a little weird, because I had to worry about which node things run on. I'm not as close to scheduling this thing on a specific Garden host or a specific Kubernetes host; I need to leave that up to the scheduler. But Kubernetes has these cool things called init containers. So I was able to create this special image called conveyor, which basically just talks to a baggageclaim that's deployed inside of Kubernetes and fetches the inputs before the task container starts.
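To give a flavor of the conversion, here's a rough client-go sketch of a Job spec like the one on the slide, with a conveyor-style init container pulling the Booklit input into a shared volume. The details are my assumptions for illustration (the conveyor image name, the flags it takes, the mount paths, the test command, and the emptyDir volume), not the demo's exact code:

```go
package backend

import (
	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// BuildJob sketches mapping a Concourse task onto a Kubernetes Job: the
// init container plays the role of conveyor, fetching the input volume
// from baggageclaim before the task container runs.
func BuildJob() *batchv1.Job {
	inputMount := corev1.VolumeMount{Name: "booklit", MountPath: "/tmp/build/booklit"}

	return &batchv1.Job{
		ObjectMeta: metav1.ObjectMeta{Name: "booklit-unit"},
		Spec: batchv1.JobSpec{
			Template: corev1.PodTemplateSpec{
				Spec: corev1.PodSpec{
					RestartPolicy: corev1.RestartPolicyNever, // run to completion, like a build
					InitContainers: []corev1.Container{{
						Name:  "conveyor",
						Image: "concourse/conveyor", // hypothetical image name
						// fetch the input bits from baggageclaim into the shared volume
						Args:         []string{"--volume-handle", "some-guid", "--dest", inputMount.MountPath},
						VolumeMounts: []corev1.VolumeMount{inputMount},
					}},
					Containers: []corev1.Container{{
						Name:         "task",
						Image:        "golang", // from the task's rootfs_uri
						Command:      []string{"booklit/ci/test"}, // the task's run path
						VolumeMounts: []corev1.VolumeMount{inputMount},
					}},
					Volumes: []corev1.Volume{{
						Name: "booklit",
						// an emptyDir shared between the init and task containers;
						// persistent volumes are one of the open questions
						VolumeSource: corev1.VolumeSource{EmptyDir: &corev1.EmptyDirVolumeSource{}},
					}},
				},
			},
		},
	}
}
```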
That baggageclaim deployment gives me a volume server inside of Kubernetes. It's a total hack job, but it worked for the demo. Kubernetes also has volumes, and one of the big open questions for me is how to better leverage volumes inside of Kubernetes, to avoid having this weird init container that talks out to baggageclaim, and to make everything more centralized inside my Kubernetes cluster. On the slide there's this weird baggageclaim off to the right side, hidden behind the curtain; it's really effectively hidden, because the arrow just points into the curtain. So it's great. That's the hidden part of the demo this morning: there's a StatefulSet that just runs a baggageclaim server, and volumes get shifted around between it and the task pods, stored in a Kubernetes volume attached to that baggageclaim server.

So the idea with the Concourse orchestrator is to have the ATC implement against this very honed interface that expresses all the things the ATC might eventually want a backend to create a container for, and then generalize that. The first step was to generalize it for the existing cluster of Garden hosts: move all of that logic out of the ATC and into a thing that implements the orchestrator interface for Garden. Then I took the stubbed-out empty interface and just started throwing all of this Kubernetes stuff into it. But as I did that, I was like, where are we going with this? This was a very hack-day-project approach to getting this in, and I came out with more questions than answers.

There are a lot of questions about whether Kubernetes Jobs are really the thing I want to be using for this. Everyone keeps saying "custom resource definitions" and that I should make my own controller for scheduling all of this stuff, but there are some very special Concourse-y things that the ATC knows which that controller would also need to know, so there are more questions there. There's the volume management question: should I be using persistent volumes inside of Kubernetes to store all this stuff in the cluster itself, instead of this weird baggageclaim deployed inside of Kubernetes? And there's image_resource support. I glossed over this a little, but I was using the plain rootfs_uri, which our docs actually tell people not to use; you should use image_resource, because it's versioned in the same way as all of your pipeline resources. But there are a lot of tools emerging to support building images inside of Kubernetes.

So that's where the RFC comes in. It's a big request for comments, for everyone to collect together and tell me all the things I did wrong. It goes more in depth than I could possibly fit into this talk: the details of specific terminology (there are Kubernetes volumes and there are baggageclaim volumes, and we need to clarify how to talk about them when implementing this inside of Concourse), a summary of the proposed changes, and a giant list of open questions and caveats about how we actually implement this. How do we do it in a very Concourse-y way, so that we do it right and leave it open to support other container backends in the future? There might be something other than Kubernetes and Garden that people want to use, or people might even implement their own schedulers: maybe they want to use Garden, but they want to change the way things are scheduled on it. They're managing all of these Garden hosts themselves and they need a bit more out of the scheduler, but it's not necessarily something that needs to live inside of Concourse proper.
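To make that "honed interface" idea concrete, here's a sketch of the shape such an interface might take. The names and fields are mine, not the actual Concourse code or the RFC's final design:

```go
// Package orchestrator sketches the narrow backend interface described in
// the talk; illustrative only.
package orchestrator

import "io"

// InputSpec names a resource volume and where to mount it in the task.
type InputSpec struct {
	VolumeHandle string // baggageclaim handle of the fetched resource
	Path         string // mount path inside the task container, e.g. "booklit"
}

// TaskSpec is everything the ATC knows about a task that a backend needs
// in order to run it somewhere.
type TaskSpec struct {
	ImageURI string   // e.g. "golang", from rootfs_uri or a resolved image_resource
	Command  []string // the task's run path and args
	Inputs   []InputSpec
}

// Orchestrator would be implemented once for Garden, once for Kubernetes,
// and potentially by custom schedulers: run a task to completion, report
// the exit status, and leave placement decisions to the backend.
type Orchestrator interface {
	RunTask(spec TaskSpec, stdout, stderr io.Writer) (exitStatus int, err error)
}
```

The point of an interface this narrow is exactly the last paragraph above: Garden, Kubernetes, or someone's in-house scheduler can all sit behind it without the ATC caring which one it's talking to.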
So yeah, just a lot of open questions. And then I have this shameless plug for SpringOne Platform; there's an attendee discount if you want to go, and I'll be there talking about Concourse stuff. That's really all I got. Questions?

[Audience question.] Basically, the number of people who are using Kubernetes. Being a project sponsored by Pivotal, we're really close with Pivotal customers who are using PKS, our distribution of the container runtime, but there are also just random people in the open source community saying: I'm using the Helm chart and it's great, are you going to officially support it? Have you thought about running on Kubernetes and having it schedule pods instead of Garden containers? Enough people asked about it that I was like, all right, I'm going to go do it. Any other questions?

[Audience question.] The hope is that once the RFC gets enough traction, we stabilize the discussion around it and come to a consensus within the Concourse community, and then eventually we can move towards supporting this. And one big thing I didn't talk about much is the amount of stuff I learned just building the top half of this orchestrator branch. Refactoring Concourse itself to support this interface drew out a lot of places where we could improve the way we schedule things on Garden itself. Cleaning up the interface within Concourse would also help us get more contributors, because right now a lot of things are really entwined in the code: there are objects that deal with the database and with Garden, mashed together into "do all the things." So cleaning up this interface is a really good exercise for us, and I think it's going to be the path to getting more people contributing to Concourse itself.

[Audience question.] I think the big thing is having a scheduler that's more mature than Concourse's own. We've really tried to make the selection of a specific Garden host efficient for specific scenarios, and we also have a random option that just picks whichever host will support the workload. Having a more mature scheduler is going to be really useful, because otherwise we're going to have to start managing the resources on those Garden hosts ourselves, and at that point we're basically re-implementing Kubernetes inside of Concourse. It moves us to look at this as: lots of people are using Kubernetes, so can we leverage this more mature container scheduler to improve the way Concourse runs on it?
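For context on what "Concourse's own scheduler" means today, here's a rough sketch of the two placement behaviors just described: one that prefers hosts already holding the container's input volumes, and the random option. The names are mine, not the actual Concourse code:

```go
// Package placement sketches Concourse-style worker selection; illustrative only.
package placement

import "math/rand"

// Worker is a stand-in for a registered Garden host.
type Worker struct {
	Name            string
	ResidentVolumes map[string]bool // volume handles already present locally
}

// Strategy picks a worker for a container needing the given input volumes.
// Callers are assumed to pass a non-empty worker list.
type Strategy interface {
	Choose(workers []Worker, inputHandles []string) Worker
}

// VolumeLocality prefers the worker that already holds the most of the
// container's inputs, minimizing the bits streamed between hosts.
type VolumeLocality struct{}

func (VolumeLocality) Choose(workers []Worker, inputHandles []string) Worker {
	best, bestCount := workers[0], -1
	for _, w := range workers {
		count := 0
		for _, h := range inputHandles {
			if w.ResidentVolumes[h] {
				count++
			}
		}
		if count > bestCount {
			best, bestCount = w, count
		}
	}
	return best
}

// Random just picks any worker that can run the container.
type Random struct{}

func (Random) Choose(workers []Worker, _ []string) Worker {
	return workers[rand.Intn(len(workers))]
}
```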
And containers-in-containers also trips people up a lot; people run into a lot of problems with nested overlay filesystems. Actually, the Helm chart by default now uses the naive driver for baggageclaim. I glossed over baggageclaim a little. Basically, its underlying filesystem driver can be overlayfs, or btrfs, or naive, which just uses whatever the filesystem is: throw stuff in directories, don't worry about it. That removes the ability to make real copy-on-write volumes, which are, well, I don't know how to describe them without just saying copy-on-write. Basically, they let us mount volumes that are a copy of the original resource cache. So when you download your Git repo, we cache that volume, and when you input it into a task and do stuff with it, we don't mutate the original cache. Right now the Helm chart for Concourse defaults to naive, so it just copies the directory over. It's literally just a copy, instead of this fancy copy-on-write with minimal diffing of files. So yeah, lots of considerations there.
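To make the difference concrete, here's roughly what the naive driver does for a "copy-on-write" volume: a full recursive copy, where overlayfs or btrfs would share the underlying bits. This is a sketch of the idea, not baggageclaim's actual code:

```go
// Package naive sketches the idea behind baggageclaim's naive driver.
package naive

import (
	"io"
	"os"
	"path/filepath"
)

// CreateCOW makes a "copy-on-write" child of a parent volume the naive way:
// by recursively copying every file, so writes to the child never touch the
// parent cache. overlayfs/btrfs get the same isolation by sharing blocks.
func CreateCOW(parent, child string) error {
	return filepath.Walk(parent, func(path string, info os.FileInfo, err error) error {
		if err != nil {
			return err
		}
		rel, err := filepath.Rel(parent, path)
		if err != nil {
			return err
		}
		dest := filepath.Join(child, rel)
		if info.IsDir() {
			return os.MkdirAll(dest, info.Mode())
		}
		src, err := os.Open(path)
		if err != nil {
			return err
		}
		defer src.Close()
		dst, err := os.OpenFile(dest, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, info.Mode())
		if err != nil {
			return err
		}
		defer dst.Close()
		_, err = io.Copy(dst, src)
		return err
	})
}
```

Cool, I'll be around if anyone else has questions.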