So originally this was titled "Crucible: towards a safer and more consistent BOSH job interface." The lawyers told us we couldn't use that name, though, so it's now called BPM. That's the release you can find on GitHub. I'll be presenting with James. We're both software engineers at Pivotal. I've been there for about four years and have done stints on Runtime, on BOSH, and many other teams. James worked on Diego most recently, and we've both been working on security for the past year or so.

Overview: we're going to talk about where this came from and why we're doing it. James will talk about the implementation and some of the benefits, and then we'll talk about the future and hopefully wrap up with enough time for some questions.

Cool. So let's first talk about where BPM came from. I think it was late last year that Christopher Brown put a research proposal to CF leadership, asking whether we could explore what it means to run BOSH jobs inside containers, and also how we could standardize the BOSH job interface away from just Monit and bash. Then in March, I believe, we were given a two-week exploration where we spiked out a project known as Crucible at the time. We were running BOSH jobs inside runc, using a declarative configuration to do so, and trying to test the feasibility of the idea. The fact that we're still talking about it today suggests it went pretty okay; we converted a large majority of the open source releases, and there were some pretty cool benefits. If you're interested, I can link you to that previous exploration.

So, motivation. You probably all have some idea why we'd want to run containers instead of just having jobs running on the VM. The first reason is that the defaults are unsafe. Every job starts as root.
Every job has maximum privileges, and the best practice of least privilege is left as an exercise to the developer, with varying degrees of success. Bash is not a great abstraction for running processes. pid_utils — you might have seen this; some of you might have copied it into your release. It starts out kind of simple, gets a little more complex, and kind of makes people sad. Log redirection seems simple, but you have to create all your directories yourself, which is also problematic, and you copy and paste that into all of your releases.

There's also poor isolation. Jobs can read everything: other jobs' credentials, other jobs' configuration, the BOSH agent's credentials and config. Jobs can overwrite anything on the stemcell — try replacing bash, see what happens. There are no resource constraints. And there are no tests for these bash scripts. There was a vulnerability earlier this year, maybe last, where we had been passing user input from log lines through an exec in bash, which was less safe than we would have liked.

All right, I'll turn it over to James to talk about the implementation.

Cool. So now that we know why we're building BPM, we can talk about how we're building it. One of the first things is that BPM actually uses runc internally. I'm sure everyone's familiar, but runc is an implementation of the open standard for containers known as OCI, the Open Container Initiative. Technically it's a Linux implementation using cgroups in the Linux kernel. One of the great things about runc is that it's lightweight: there is no daemon, and that was a big win for BPM. We want BPM to be light and to not take up very many resources, because it's going to be side-loaded onto every single VM where it's used. Another thing we found that was pretty unique was that we were able to get away with not using an image. What we do is kind of weird, but it works.
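To make the pid_utils point above concrete before moving on: the boilerplate every release carried is essentially a pidfile wait loop. This is a minimal runnable sketch of the pattern, not the actual pid_utils code — the function name and the timeout handling are made up for illustration:

```shell
#!/bin/bash
# Sketch of the pidfile wait-loop boilerplate each release used to carry.
# Illustrative only -- not the real pid_utils.

wait_pid_death() {
  local pid="$1"
  local countdown="$2"
  while kill -0 "$pid" 2>/dev/null; do   # is the process still alive?
    if [ "$countdown" -le 0 ]; then
      return 1                           # process outlived the timeout
    fi
    countdown=$(( countdown - 1 ))
    sleep 1
  done
  return 0                               # process has exited
}

# Example: a short-lived background process is observed to die.
sleep 1 &
child=$!
wait_pid_death "$child" 5 && echo "process exited"
```

Multiply that by signal escalation, stale-pidfile cleanup, and log directory setup, and you get the sadness described above.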
Our image is actually an empty directory, and then we bind-mount the root file system from the stemcell into that directory, with permissions protected appropriately. It's a really cool feature, because it gives you the execution environment of the stemcell while you're running inside a container. This was an important design decision for BPM: we always wanted the stemcell to remain the source of truth for your file system. We wanted security updates to come with the stemcell, not with a separate image side-loaded onto the stemcell. So that was a big win.

Another thing that's great about runc is that it mirrors the BOSH lifecycle pretty well. `runc run` is a BOSH job start. `runc kill`, you can imagine, is like a drain. And `runc delete -f`, a forced delete, is a big win because it cleans up all the resources associated with the container. That was always a struggle in the past — we were sometimes leaking processes with pid_utils and other things like that.

Some other cool things that come with it are the built-in security primitives. It gives you an easy way to do AppArmor and SELinux. We're not using them currently, but it's an idea. Another huge thing is that runc is the backing technology of Diego, through Garden-runC, and of various other platforms at the moment, so we know it's battle-tested, production-ready technology that works.

Now you might ask yourselves: why don't we just expose runc to BOSH jobs, rather than building another tool around it? The real reason is that the JSON configuration for runc is kind of crazy. I've seen specs that take up over 1,000 lines, and they have way too many specifics that most people don't care about when they're creating releases.
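For a flavor of both points — the size and specificity of a runc spec, and the imageless trick of binding the stemcell into an empty rootfs — here is a heavily trimmed, illustrative `config.json` fragment. This is a sketch, not BPM's actual generated spec; the paths, mount options, and field values here are assumptions:

```json
{
  "root": { "path": "empty-rootfs" },
  "process": {
    "user": { "uid": 1000, "gid": 1000 },
    "args": ["/var/vcap/packages/server/bin/server"],
    "capabilities": { "bounding": [] }
  },
  "mounts": [
    { "destination": "/usr", "type": "bind", "source": "/usr",
      "options": ["rbind", "ro", "nosuid", "nodev"] },
    { "destination": "/var/vcap/jobs/server", "type": "bind",
      "source": "/var/vcap/jobs/server",
      "options": ["rbind", "ro", "nosuid", "nodev"] },
    { "destination": "/var/vcap/data/server", "type": "bind",
      "source": "/var/vcap/data/server",
      "options": ["rbind", "rw", "nosuid", "nodev", "noexec"] }
  ]
}
```

A real spec needs dozens more of these mounts plus namespace, cgroup, and seccomp stanzas — which is how you end up at a thousand lines of JSON that release authors shouldn't have to write.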
So one of the big takeaways from BPM is that we really want to give a simple interface to secure containers — that's our goal. Looking at some of the BPM configuration: this is what a Monit file looks like today. You see your bash scripts with start arguments and stop arguments and a PID file. We're still in a Monit world because BOSH is so attached to it, but maybe one day we'll move away from it; we'll never know. In a BPM world, the PID file is just in a standard location now, and the other thing that's unique is that you're now using the `bpm` executable to start and stop your process. Not much else changes. You're still allowed to use other Monit directives with it, so you can do health checks and things like that.

What happens when you call `bpm start` is that it goes and finds a BPM configuration file that lives in your job's directory. It's just a declarative config, as you can see here. One of our goals was to keep it really simple and not give the developer much leeway: they should just specify what they need in order to execute their process. So you have your executable, your arguments, and your environment variables. We also have some other configuration here that we needed to add to support other BOSH releases, but ideally you can get away with just the first three. It's a simple YAML configuration. We chose YAML over JSON because of comments, first of all, and second because it seems to be the Cloud Foundry way of doing things. And it takes hundreds of lines of bash, or thousands of lines of JSON, and turns it into 20 lines of YAML. So that's how we implemented BPM and how you're expected to use it.

Now, some of the benefits we see coming out of BPM, and what we're really hoping to get out of it. The first is security: job isolation is a huge win that we get from BPM.
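The slide itself isn't captured in the transcript, but a minimal bpm.yml along the lines just described — executable, arguments, environment, plus one of the extra fields added for other releases — might look like this. All names and paths are placeholders, so treat it as a sketch rather than a canonical example:

```yaml
# /var/vcap/jobs/server/config/bpm.yml (illustrative names and paths)
processes:
- name: server
  executable: /var/vcap/packages/server/bin/server
  args:
  - --config
  - /var/vcap/jobs/server/config/server.conf
  env:
    LANG: en_US.UTF-8
  # one of the extra fields, for a release that needs another writable mount:
  additional_volumes:
  - path: /var/vcap/data/shared
    writable: true
```

Ideally a job only needs the first three keys; the rest exist as escape hatches.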
We use namespaces pretty extensively, one of them being the mount namespace. One of the awesome things about the way we do imageless containers is that we have really fine-grained control over how we mount all of the directories into the container. Most of the file system is noexec, nosuid, and read-only, so it's really locked down. Another big thing: in the previous world, BOSH jobs had access to literally everything on the host, but in a BPM world they can only see their own resources. We only let them see /var/vcap/jobs/&lt;job-name&gt;, /var/vcap/data/&lt;job-name&gt;, and so on.

Another big win is that we have control over where jobs can write their data. The only directories mounted read-write are under /var/vcap/data and /var/vcap/store. This just standardizes where BOSH jobs should put their data — I've seen jobs put it in their packages directory; I've seen other things. And one really big win we get from that is that it protects the host, the stemcell, from tampering. If a job is compromised or something like that, we have confidence that the integrity of the stemcell is still intact, and we don't have to panic that other jobs might be compromised as well.

Another big piece of isolation is PID namespacing. Each main process in a BPM container is the init process: they all start at PID 1, and they think they're the only thing running. A compromised process can't see other processes, signal other processes, or cause interference. That's just another big isolation win.

We've also thought about resource isolation. This is kind of a weird one: right now we don't actually have any default resource constraints in BPM, and the reason is that resource constraints in BOSH are kind of confusing, because VMs are arbitrarily sized.
What does it mean to put a memory limit on a job when you can give it a massive VM or a small VM or anything in between? But one of the cool features you get out of these resource limits is that they give a release author a way to really document the resource profile their job intends to use. One example is the Metron agent: it's always meant to be colocated on all the VMs, and it's supposed to be non-intrusive, so it can set some resource limits to ensure it doesn't become a burden to the VM in question. We see this being useful for Cloud Controller and other things as well.

Some of the other security wins: we get least privilege. There are absolutely no capabilities allowed by default for the executing process. You have to actually request capabilities, and even then we only let you request a certain subset of them. Right now the only one we allow is CAP_NET_BIND_SERVICE, which allows you to listen on low ports. We've found that most BOSH processes are totally okay with this — they don't need these capabilities. That's just another huge win. In a similar vein, we always execute as a non-privileged user, currently vcap. There's no setuid, so you can't escalate your privileges; you're locked down.

Another big thing I think we get out of this is what I like to call enablement. Right now we have N releases, and they're all implementing pid_utils and log utilities in different ways, in their own flavor. With BPM we get a single release that we can apply across the board to all releases: you do one update and it goes everywhere. You really do get a consistent place where you can make these changes, and rolling them out is as simple as a `bosh deploy`. You update your BPM release, and anything using it gets all the updates. You can implement these changes in a shared location.
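To tie the capabilities point above back to the config format: requesting the one allowed capability might look like this. The process name and path are placeholders, and the field layout reflects bpm's schema as I understand it, so read it as a sketch:

```yaml
processes:
- name: web
  executable: /var/vcap/packages/web/bin/web
  capabilities:
  - NET_BIND_SERVICE   # lets the non-root vcap user bind ports below 1024
```

Anything not on the allowed subset is rejected, which is the point: the default is no capabilities at all.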
An example of fixing something in one shared place is the security vulnerability Erin mentioned earlier: it would be tackled in one location, and there wouldn't be the struggle of managing the fix across multiple teams or multiple products. We can also make simple performance improvements. We found that by setting certain time zone environment variables you actually get performance improvements, so you can do simple things like that and get a consistent execution environment for all your BOSH jobs. The same goes for best practices: we always talk about CIS security standards, and this might be a really good place to implement them, because you can enforce them consistently across all BOSH jobs.

Another thing I think is huge is simplicity. In the releases we've converted so far, we've seen a large reduction in bash. In most of them there's almost no bash left, and where there is, it's around 10 lines, so you get a much more auditable bash experience. It's easier to reason about, and you don't find yourself in ambiguous situations where you don't know the execution path. Then there's the declarative configuration: you know exactly what you're running — you can see it, it's declared right there. There's no more boilerplate; pid_utils is gone, and everything is handled by BPM. I like to think that developers can really focus on writing the code they want to write, rather than writing code to operate their code.

A huge thing too is opinionated standards. We're encouraging users to follow standards that both Dmitriy and the team think are good experiences. And a really big part of that is a consistent experience for users: if I'm working on release A and I go to work on release B, the execution model doesn't change.
I know that right now, when I jump projects, the way bash is invoked just changes; this would standardize that.

The last thing that's really important to focus on is the operator experience. I think BPM gives us a nice tool to improve that. Because all jobs will hopefully one day be using BPM, log file locations will be standardized, and ephemeral and persistent data locations will be standardized. So when you're an operator troubleshooting an issue on a stemcell, you have a much clearer idea of what things are doing, and what they should be doing, if they're using BPM. Another cool thing we found in the Crucible research project is that we're able to put the `bpm` executable on the path, so we can add helper functionality to it for operators. For example, `bpm logs` is a current one that tails the logs for your process. `bpm trace` is an opinionated strace, which is a really cool tool that has helped me find some bugs in the past. And `bpm shell` launches you into a bash shell if you need to poke around in your container.

Cool. I'm going to talk a little bit about the future, and then we'll stop for questions. One possibility is improving cross-OS consistency. Currently we've got Monit, and the Monit JSON that comes with BOSH Windows. We see a future where maybe these use the same config file under the hood, or at least people use the same sort of primitives. This may also involve us looking at winc on Windows, the corresponding Windows runc executable. Another possibility is being able to translate from BOSH jobs to other container schedulers: Diego, Kubernetes, and BOSH-with-BPM begin to look very similar, so it's not inconceivable that you could have a mechanical translation between them. And then further security enhancements: BPM provides a shim that might allow us to run every job as a different user on the VM.
User namespaces may allow us to do that without the job knowing about it, so the job would appear to be running as vcap within the container. It improves isolation and leverages user isolation on the host; there might be some issues with shared volumes — it's unclear. Network namespacing is also a possibility, but that seems somewhat fraught. We don't really know what's going to happen there, but it's definitely something we might consider.

Flexibility: we're introducing a layer between Monit and BOSH jobs and the BOSH developers. This might pave the way for replacing Monit. Again, it's a five-year effort, so I'm not sure when that's going to conclude. We're also providing an abstraction between the job and the stemcell. This allows fine-grained control of the job's execution environment, but it also opens up the possibility of evolving the stemcell, and what's laid out on the stemcell OS, independently of what the jobs see. So you could have either a more rapidly evolving stemcell or a more rapidly evolving job execution environment; at this point they don't have to be bound together, because they're not seeing exactly the same thing.

So lastly, try it out. We're not at 1.0 yet. Our intention is for 1.0 to have a stable config file format. Right now we're making a lot of changes, so it's not very stable in that respect, but the containerization has been. Try converting a release. We'd like to hear about your experiences; we'd like to hear about issues and edge cases; we'd like to help you adopt BPM. There's a Slack channel at the end of this. One caveat is that BPM is more constrained than the BOSH execution environment, and we hope to keep it that way. Sometimes the answer to "this doesn't work" might be "you should do it a different way" — but we think that's a good move for security. With that: questions? Thank you very much.

Speaking of "this doesn't work this way anymore" and "how can I adopt it":
I'm maintaining a CPI, which is basically an executable placed by one release being called from a long-running job placed by another release. How does, or could, that work out?

We've had some conversations internally — this is the execution pattern between the BOSH director and the CPIs — and we think the correct way forward is probably for that to be a network boundary. One of the really interesting things is that if you think of it as a network boundary, you can imagine that the person who owns, controls, and provides credentials to the CPI job could be a totally different person from the one who owns the BOSH director, and those two silos of information wouldn't need to be shared. The director operator would never have the credentials to the IaaS.

Which you could potentially already do now with CPI configs and CredHub. — True. Interesting.

Yeah, you already mentioned that you could potentially extend the isolation to the network domain as well, so I'm wondering what you've thought about in that direction. You could have something like the security groups you know from Cloud Foundry extended to this concept, or only allow linked job instances network visibility.

We haven't done a lot of thinking about it, and network namespacing was actually one of the things we explicitly did not want to do to begin with, because there's a huge amount of complexity there. But we are having conversations with the people doing container-to-container networking, and it might be possible to leverage some of that. Right now it's very unclear, but what's nice is that this is a place where we could implement it — it's a layer that gives us that possibility, whereas before it wasn't even on the table.
What if you have a BOSH job that maybe needs some of that root-type privilege, or needs to do some extra things — say, for example, Dmitriy gets busy stripping all the extra stuff out of the stemcell, and your software requires some of it to be installed back in. Are you envisioning that the permissions afforded to the container would be configurable, so that while you're secure by default, you could be insecure by choice?

Yeah, I think that's definitely going to be a possibility. There are two escape hatches. Right now the BPM containers are pretty locked down. I see us adding more capabilities in the future, because I think the benefit of having every job be declarative, as opposed to running a bunch of bash, is more advantageous than forcing every job to be completely locked down in terms of privileges. Right now there is one escape hatch: the BOSH pre-start runs outside the container. BPM also provides a pre-start to do some setup, and we're still working to figure out what the right boundaries are there. Ideally the capabilities we can grant would be limited — I would prefer that we didn't make it possible to add every Linux capability. But again, the value of having everybody become declarative, and of providing that shim layer, might outweigh having it be permanently locked down. One of the things that's most important in terms of isolation is walling jobs off from one another, so that there's no possibility of crosstalk or cross-pollution — so that one job being compromised doesn't allow every other job's credentials to be taken off that VM. That's one of the things we're really focused on.

It looks super awesome and I'm excited to use it. Are there any BOSH jobs or BOSH releases that you've looked at and explicitly said: this is out of scope, this isn't going to work with this kind of approach?
There's one that we thought about converting and didn't, and that's Garden, because it's doing the same thing as BPM. Rootless Garden might open the possibility, but it's not clear — that doesn't seem to have landed completely yet. There may be some cases like that, where we go ahead and add a very big, dangerous-sounding config flag that lets somebody do a sort of useless container around the job, but still gives us the opportunity to make those jobs declarative as opposed to bash-based.

Does that mean the nested container thing is a problem? Like the BOSH Lite sort of experience that we take for granted?

No, the nested containers should be fine. It's more that we make certain decisions, like hiding the cgroup file system right now, that would make Garden incredibly hard to implement, so we would need to get rid of some of that and poke some holes.

So if I run docker-bosh inside a Concourse container, which itself is containerized, with BPM running on BOSH Lite, we should all be good?

Yeah, you'll be fine. You might have a headache. We currently have a CI build that does a Concourse build, a docker-bosh, and then a BOSH deployment inside of it, so yes.

From the BOSH UX point of view, when you do something like `bosh instances` with the process listing: right now Monit is responsible for giving the process status — what are the jobs that are running? How does BPM integrate with that process status?

From that perspective, the process just looks like a process that Monit is watching. It just happens that the PID Monit is seeing is a PID which is namespaced underneath. From the outside, everything looks the same; `monit status` should be the same. So essentially, runc is managing the process on the host, and Monit is still watching that PID file.
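Concretely, the tie-in being described is Monit's ordinary pidfile mechanism. A stanza along these lines — the job name, paths, and port are placeholders, and the extra check is just an example of what Monit's syntax allows — shows the `with pidfile` directive plus an optional host-side health check, which still works because Monit runs on the host:

```
check process server
  with pidfile /var/vcap/sys/run/bpm/server/server.pid
  start program "/var/vcap/jobs/bpm/bin/bpm start server"
  stop program "/var/vcap/jobs/bpm/bin/bpm stop server"
  if failed port 8080 protocol http for 3 cycles then restart
  group vcap
```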
So realistically, what we get rid of is the user's need to manage that PID file, but the interaction between Monit and the process stays the same from the viewpoint of BOSH itself.

Monit itself is not running the process, right? So how can it check whether the process is running or not?

It uses the PID file for that information. Basically, if the PID is running, it reports healthy — that's how Monit currently works. Can I throw that Monit example back up? Nice animation. The `pidfile` directive on the second line is what ties the runc state to the Monit state: if the PID written in that file exists on the host, Monit will report healthy. That's essentially how it was done previously with BOSH, in the world of CTL scripts: the scripts would maintain that PID file, and depending on its contents, Monit would report healthy or not. And for anyone that's done more with Monit than just copy and paste: we can add more health checks here, and that fits straight into BPM. Monit is running on the host, and from its perspective, the job that BPM starts is just like any other process. It's only from the perspective of the job itself that the world is containerized. So you can still do network health checks and all that — it still works.

We have another one, awesome. If you could bring it closer to me, that would be good. I'll come to you, if that's all right.

You mentioned that a long-term goal might be to run BOSH processes no longer only on VMs created by BOSH but also on other schedulers. But at the same time you mentioned in the beginning that you don't actually create images — you have empty images and mount file systems into them. How does that fit together? I guess if you want to run on a different scheduler, you would need to create an image at some point in time.
Probably during the BOSH compilation phase or something similar?

Probably, yeah — it's an open question. I think in that world it's possible there might be some sort of OCI layer that looks just like the stemcell, or the stemcell is an OCI layer, or something something — I'm waving my hands. I'm sure Dmitriy will figure it out. The point of the homomorphism of the configs is that, first of all, the world is moving to a declarative setup for how to run jobs, and the world is moving to running everything in containers. Once BOSH jobs catch up with the rest of the ecosystem — once they're declarative and are essentially a specification for running something in a contained manner — it becomes possible, with obviously some caveats and some work, to do that kind of translation or migration. Whereas right now, going from 250 lines of bash to a pod spec is kind of a non-starter.

Yeah, I truly view this as a transitional period. There are jobs that arguably shouldn't be run with BOSH — Cloud Controller, for example; it's just a simple web server — and for those you might have to build an image at some point, but that would be a good problem to have. The point of keeping the stemcell as the image is to preserve the interface that BOSH is providing: the stemcell is BOSH's contract with you. And it's also about rolling updates, right? You would want to solve the problem of constant security updates differently if you moved to an image-based system.

Right, and I guess the opportunity would be to reduce even further the actual image you'd need to maintain. You could use, say, Alpine instead of Ubuntu, and then releases bring their own stuff, so you could further reduce the attack surface.

Yeah. We have a bit of a challenge in the BOSH universe, where this contract is, obviously, unspecified.
Over time, we've tried to rip things out, and then you'd find out: oh, you were using that, were you? Perhaps you shouldn't have been. The classic being GCC — I don't know a BOSH release that brings its own GCC. So yeah, that's the contract; it's a bit icky to move on from.

Gentlemen, thank you very much, everyone. Thank you. Awesome. See you.