All right, I was just informed that I'm between you and lunch, so we'll start now. My name's Brandon Philips. I'm the CTO and co-founder of CoreOS, and I am here to talk to you a little bit about a project that we started called Rocket. I provided the URL right there, so you can look at the project right now, because I know half of the audience will just be wanting the URL. So there you go, there's a README. Please check it out. If you don't want to read the README right now, I'll talk about what we're actually doing with this project.

So Rocket, we shortened it to rkt, because many good Unix commands are three letters: cat, git, vim, and rkt. And this is a single static binary that brings all the things necessary to run Rocket together into a single thing that you can run on your machine. It runs on every platform with a modern Linux kernel. We've tested it on, of course, CoreOS, and it also runs on Ubuntu and Fedora. There were a bunch of rumors going around when we started this project that it only runs on systemd systems, which is incorrect. In fact, I'm going to be doing all of my demos from a Vagrant Ubuntu VM, just to prove my point that this is not a CoreOS-specific thing.

All right, so it all begins with rkt fetch. rkt fetch does a few different things. It downloads and discovers images over the internet, and the goal is to make this a non-privileged operation, so that you can run rkt fetch as a non-root user. It can use either discovery names, which I'll talk about in a moment, or direct HTTP URLs. So you can host the images that you run under Rocket on a regular HTTP server like nginx, or S3, or anything like that. There's no registry or special software required to do that. In fact, we do adorable little tricks so that even image discovery doesn't require a special registry. Again, I'll talk about that in the next section of the talk, which is about the specification behind Rocket.

And then the next piece is rkt run. rkt run takes whatever the HTTP URL is and actually runs it as a process on your machine. One of the major design decisions we made with Rocket is that we wanted the container to be a direct descendant of the process that executed Rocket. So in the case of running it from the command line, it's bash, then Rocket, then your application. In the case of running it under your init system, whatever it might be, it directly executes the container. This is important because we found that a lot of people are having trouble with Docker and integrating it with their existing init systems.

We also have kind of a bootstrapping problem within CoreOS, where we want to run certain things in a container, but we need to configure the networking, for example to set up something like an overlay network, and then we need to configure the Docker daemon to use it. And we want the overlay network software to be running in a container itself. But then we have this bootstrapping problem: we need to configure the Docker daemon using a container, which is external to the Docker daemon. Otherwise we end up with multiple Docker daemons just to configure the Docker daemon, which is not good. So Rocket, as I said, executes directly under the calling process.
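To make the fetch side concrete, here is a rough sketch; the image name, version tag syntax, and URL are illustrative rather than taken from the talk:

```sh
# Fetching is meant to be a non-privileged operation: discover and download an
# image by name, or pull it straight off any plain HTTP(S) server.
rkt fetch coreos.com/etcd:v2.0.0
rkt fetch https://example.com/images/etcd.aci

# Running needs root (namespaces, cgroups), and the container ends up as a
# direct descendant of whatever invoked rkt: your shell, or your init system.
sudo rkt run coreos.com/etcd:v2.0.0
```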
And in practice, the rkt run command lines look something like this: it's rkt run, and then say coreos.com/etcd, and then a version number. And again, this magic where we take that name and actually transform it and discover the HTTPS URL of the image is all done statically.

Rocket is divided up into a few stages, actually. Let me skip ahead; I think I've gotten myself a little disorganized here. So in the case of Rocket and the specification that we're talking about, what we're trying to do is take a number of processes and execute them inside of a container. How many of you are familiar with Kubernetes or the concept of pods? OK. So what we wanted to do with the containers that run under Rocket is be able to do things like run multiple processes inside of a single container that's launched. You could imagine, say, you're running a database server, and you want to have the actual database process, say Postgres, and then you also want a process in that container that's backing up the database, say to S3 periodically, or replicating that WAL file off to a hot follower that may take over in case of hardware failure. So the idea is that within a single container, you can be executing multiple processes. You can imagine if you're running etcd again: you're running etcd, and then you have something that's backing up etcd, and then maybe also an HTTP health check that runs on a loop and sends you an email in case etcd is misbehaving or has lost quorum. So this is the idea of Rocket: you run a container, but that container has multiple applications, multiple processes inside of it.

So the Rocket binary itself is divided up into various stages. The first stage is what we call stage 0, which is the actual binary that you download from GitHub. This downloads the image, sets up the root file system of the image that you've downloaded over the internet, verifies the image if you have GPG keys that you trust for certain software, and does all of that, essentially talking to the internet. And hopefully, if we design this whole thing right, we can use Unix permissions so this all happens as non-root.

At some point, it needs to execute as root in order to do the kernel setup of namespaces and cgroups, and this is the next stage, which we call stage 1. Currently in Rocket, stage 1 executes a systemd init system, so it has a PID 1 that's actively monitoring all the processes within the container and setting up those namespaces. We use systemd, but other implementations have been prototyped and people are working on them: you could imagine enclosing the container in a QEMU/KVM process instead of a systemd process, or using another init system like runit, if you're interested in doing that. But the idea is that within the container, you have an actively monitoring process that's watching all the applications running inside of it.

And then stage 2 is the actual interesting bit, which is where it's executing your Postgres application, your backup-Postgres application, all these things inside. And you can do things like attach restart policies. So say your Postgres backup application crashes: you can say attempt to restart it every 100 milliseconds, and if it fails to restart after five attempts, then kill it.
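As a hedged sketch of that multi-process idea, a run might look something like the following; the backup and health-check image names are made up for illustration:

```sh
# One container, several cooperating apps: etcd itself, something backing it
# up, and a looping HTTP health check. The last two image names are
# hypothetical; only coreos.com/etcd is an image mentioned in the talk.
sudo rkt run \
  coreos.com/etcd:v2.0.0 \
  example.com/etcd-backup:v1.0.0 \
  example.com/etcd-healthcheck:v0.1.0
```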
And so you can essentially say what things are required to be running within the container and set policies for tearing down the container if things happen that are unexpected.

And then after the processes have all exited, you're given the ability to garbage collect. Since Rocket doesn't have a daemon, we have a command-line subcommand called rkt gc, which you could imagine running on a periodic timer or a cron job or something, that goes through and garbage collects the file systems for containers that are no longer needed. And you can set whatever policy you'd like on that sort of garbage collection. But it's not a daemon that's actively monitoring these container file systems.

All right. So Rocket is an implementation of a specification that we've worked on with a number of people. We've had contributions from folks at places like SoundCloud, and people helping us out on the spec from various companies. The idea is that we wanted to define how containers should be put together, particularly these containers that have multiple processes inside of them, and then enable people to have multiple implementations of this spec. Because we believe that the image and the runtime should be something that can be implemented by multiple folks. For example, the Mesos project is really interested in executing containers as fast as possible, and they're willing to put in the work to have optimized C++ code all over the place to do that. They don't want to have a separate init system for every process; they want their own special tricks to make container execution really fast. So they don't want a general-purpose container runtime, they want something built directly into Mesos. You could imagine that they would want to take this spec and write a library to do that directly.

So this spec is definitely a work in progress, we're actively taking contributions from a number of folks, and I encourage all of you to get involved. The other thing is that it's not just a spec, but actual live code, and it generates containers that can verify a runtime. These containers run various tests to ensure that the environment is set up properly, the file system is set up properly, it's able to communicate between two processes using the file system, et cetera, et cetera. So when people update the spec, they have to update the code at the same time, just to ensure that we actually are developing something that can be transformed from human thought into machine code, which is important.

So there are a few chunks of the spec. The first is the actual image format. The image format essentially just contains all the files for the root file system, and then a JSON manifest that describes the constraints, the isolation that should be put on the application, and where to find the process to execute. This is all enclosed in a tar and optionally compressed with gzip, bzip2, or xz. And it's all addressable via a name, so coreos.com/etcd would be an example of that.

And then images can have dependencies on other images. So you can imagine having an image that has all of the certificate authorities that you trust inside of your environment, and then including that into all of your downstream images. That way, you've detached where your certificates are managed away from the applications.
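A from-memory sketch of how such a dependency might be expressed in an image manifest; the key names follow the appc spec as best I recall, and the CA-bundle image name is invented for illustration:

```json
{
  "acKind": "ImageManifest",
  "acVersion": "0.1.1",
  "name": "example.com/webapp",
  "dependencies": [
    { "imageName": "example.com/ca-certificates" }
  ]
}
```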
So you can update that certificate outside of your applications. You can also use it for things such as configuration, where you want to generate a configuration container that, say, has passwords or API keys for your environment.

And then the runtime has a number of things that it's in charge of. I mentioned that it monitors and restarts processes as they die. It can also execute one or more processes, one or more apps. And it can have a number of hooks that respond to runtime events. Say you're doing a migration, and you want to run some sort of process inside the container before that migration happens; it may, say, deregister this container from a load balancer before spinning it back up, or notify the build server that there's going to be an additional delay for this build to happen, those sorts of things. And this is the diagram that I showed you earlier, but essentially the idea is that you can have a container with multiple processes executing inside of it.

The other piece that we thought was really important while we were thinking through this spec is that we wanted to provide a metadata server specification. The primary thing is that we wanted a way of extending trust and identity from the host down into the container. If you think about it, all of our Linux hosts have an identity that we trust, and that's generally the SSH server and the SSH key pair. But we haven't defined that for processes. Individual processes on our machines don't have identity. And this is a problem because a lot of our processes are network services, and they need to gather resources from other network services. So we wanted a very minimal signing and verification metadata server, so that a process could make a network call to an IP address and port that is trusted, that it's given via an environment variable, and then be able to verify, using cryptography, the identity of other processes that may be making network requests. It's essentially a bare-minimum HMAC service, where the process doesn't have access to the private key that's being used. So the identity server and the metadata server, I think, are rather fundamental things that exist in other large organizations, but that we haven't really defined as a community for processes yet.

The other piece is image discovery. So coreos.com/etcd is the name of the application container image, but it's not actually hosted at coreos.com; it's hosted over at github.com, under the etcd releases. So what we do is we've defined a meta tag in the HTML head section, so that rkt makes a query to coreos.com, looks in the head section of the HTML at coreos.com, and sees, OK, there's a discovery location for this container name, and it says that it's over at GitHub. And so we can redirect you to S3 buckets or anything like that. It's very similar to how Go does it: when you have a Go package, you can host it on your personal domain, example.com, but the actual bits are hosted at github.com, and the canonical name is example.com. That's how we have built the image discovery process. So it looks something like this within the meta tag on your HTML pages. It's templated using one of the RFC URI templating formats, so it's like github.com/.../{version}, whatever.

And then, as I mentioned, it's not just a human-readable spec; there's actual code to back it up.
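Concretely, the discovery meta tag described above looks roughly like this; the tag name and template variables are from the appc spec as best I recall, and the GitHub release path is illustrative:

```html
<!-- served in the <head> of the page at https://coreos.com/etcd (illustrative) -->
<meta name="ac-discovery"
      content="coreos.com/etcd https://github.com/coreos/etcd/releases/download/{version}/etcd-{version}-{os}-{arch}.{ext}">
```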
And we have a tool called actool that can do things like build, validate, and discover images, so that you can ensure that as we add additional implementations of this app container spec, they all actually play together nicely and conform to the spec as it evolves.

All right, so the two projects are github.com/coreos/rocket and github.com/appc/spec. I'm going to go through and do a few demos really quick, just so that we have a concrete idea of what we're talking about here, and then we'll call it a show. Do I have any quick questions before I dive into demos? OK. Could somebody hold the microphone for me? It's going to be really hard to type. OK, good.

So this is an Ubuntu machine running Rocket; of course it could be CoreOS or anything else. And so what I can do here is say things like sudo rkt run coreos.com/etcd, and then I can give an explicit version tag. One of the things in the specification is that you can have an arbitrary number of tags, so you can imagine saying canary equals true and environment equals NZ or whatever. But the two default tags are name and version, and we have a shorthand for that. And what this does is it'll go off to the internet and actually download the image and discover where the image is hosted, if I have internet. That might be a problem. OK, well, I guess I had an image cached on my machine and DNS timed out, so, awesome.

So what happened is that it went through and found the image and then launched it, and so this etcd process is now running on my host. Similarly, I could have launched this container using an HTTP URL. So this is the image that would have been discovered via the discovery process: github.com/coreos/etcd/releases/download/, et cetera, et cetera. This template was filled out after downloading the homepage of coreos.com and pulling out that head section. And this will work similarly, because we add a bunch of secondary indexes based on the crypto hash of the image, based on the HTTP URL it was downloaded from, and then from the discovery name, coreos.com/etcd. So there are various ways of addressing the image based on, essentially, your security stance. The crypto hash is obviously the most secure: if you want to write down in a configuration file, run this image, you obviously want to run it by the crypto hash, because that's the actual image that you want to be running.

The other nice thing about the specification and how this stuff works is that I'm running the Python SimpleHTTPServer on this host as well, and I'm able to host the ACI over there without the use of any fancy registry software or anything.

And we can kind of cruise through the format of the ACI. So I'll remove these things. The ACI is a regular tarball, so you can extract it using tar, and it has a JSON manifest that has some pretty simple sections to it. The manifest is versioned and has a kind and all that stuff, but then it has an app section, which is an optional section describing how the application is actually to be run. And it's the usual things that you'd expect: environment variables, mount points, that sort of stuff.
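A rough sketch of poking at an ACI by hand; the file name is illustrative, and the manifest shown is abridged and from memory rather than a real etcd manifest:

```sh
# An ACI is an ordinary, optionally compressed tarball: a manifest plus rootfs/.
tar xzf etcd.aci
cat manifest
```

```json
{
  "acKind": "ImageManifest",
  "acVersion": "0.1.1",
  "name": "coreos.com/etcd",
  "labels": [
    { "name": "version", "value": "v2.0.0" },
    { "name": "os", "value": "linux" },
    { "name": "arch", "value": "amd64" }
  ],
  "app": {
    "exec": [ "/etcd" ],
    "user": "0",
    "group": "0",
    "environment": [
      { "name": "ETCD_DATA_DIR", "value": "/var/lib/etcd" }
    ],
    "mountPoints": [
      { "name": "data", "path": "/var/lib/etcd", "readOnly": false }
    ]
  }
}
```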
And then this is where we get into the labels, where you can define what operating system it's targeted at, what the architecture is, what the name of it is, and the version. The reason this app section is optional is because, as I mentioned, you may have container images that are just configuration, or just SSH keys or SSL certificate authorities that you trust. And then the rootfs itself is what you'd expect: it contains the actual binaries that are executable, and then an empty /etc/hosts file. So that's the basic format and layout of the app container images themselves.

And this format is pretty concrete at this point, but we're interested in adding new formats. One of the things that I'm interested in is extending the tarball format to be streaming, so that you could imagine streaming the image using a FUSE file system and launching a container before the container has actually been fully downloaded. That's something that we've started to spec out. And this has got some interesting challenges, because tar is probably the worst format ever made by humanity. It's not really anybody's fault, it's just how it evolved, but it doesn't really have the properties that you need for a streaming format. So you end up having to create a secondary index and then sign that secondary index. And it's a bit of an interesting problem, particularly when you consider that what we do with these tar files is compress them, which scatters the problem even further, because we have no top-level section of the file that we can expect to have all the metadata. So we need to download blocks all over the place and then uncompress those blocks in order to find where the metadata is. So if anyone has some great ideas, or an alternative file format to tar that's been more or less standardized across a bunch of tooling, I'm open to suggestions. And no, zip doesn't work.

Yeah, so that's the specification and Rocket in a nutshell. Thank you for holding the mic, and I have five minutes left for questions.

So when you're running multiple processes, do you just, like in the spec, create multiple things in the JSON for each one?

Oh, yes, that's a great question. There are two ways to do it. Essentially, well, I can't type, anyways, it's essentially rkt run, and then coreos.com/etcd, and then coreos.com/etcd-backer-upper-tool, and those would just both be specified on the command line. There's also a JSON format. If you have more complex requirements, say you have three volumes that you need to be mounting, you have some more complex network setup, and you need the containers to be isolated in a certain way from each other, like one container gets a gig of RAM and the other container gets two megs, if you have that sort of level of complexity, doing it from the command line is really hard. So we punt: you have to provide a JSON document that describes the container that you actually want to be running.

So to generate the ACI image, do you have any equivalent of a Dockerfile, where you give it just one file and it generates the ACI from a tree?

So we haven't done that, and part of the reason we haven't done that yet is because we wanted to enable multiple ways of building things.
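For instance, one very simple path is the actool mentioned earlier; a hedged sketch, with the directory layout and arguments from memory (they may well differ between versions):

```sh
# Building an ACI is, at its simplest, handing actool a directory that
# contains the manifest and a rootfs/ tree. Paths here are hypothetical.
mkdir -p etcd-layout/rootfs
cp manifest etcd-layout/manifest
cp bin/etcd etcd-layout/rootfs/etcd
actool build etcd-layout etcd.aci
actool validate etcd.aci
```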
So people have built tools that take a Docker image and convert it to an ACI, which is a fairly trivial task of converting one JSON format to another and then flattening the image. But we wanted the spec to be agnostic about how you build it, because we think there's a lot of interesting innovation that can happen around how images get built. You can imagine language-specific tools, like a Python-specific build tool that takes your requirements.txt file, or whatever it's called, and transforms that into a standalone container image. So I think over time we'll see a proliferation of additional tooling. I think the Docker build stuff's great, but it has limitations. Particularly when you start to think about things like Go static binaries, the build process becomes very, very simple and fast, and can be done from hosts that aren't actually running the container runtime. For example, I can cross-compile and build an ACI for a Go static binary on my OS X machine that runs just fine on my Linux host. So I think there's a lot of interesting stuff that happens there, and that's why we didn't define, say, how it gets built; we defined what the build artifact should look like. And we of course are interested in evolving that to make sure that it meets everyone's requirements too. The spec is definitely still under construction.

Oh, sorry, sorry. Hey, Matthew. So the spec describes that you have support for PGP signing of images in order to verify provenance. Is there any kind of support for chains of trust, rather than merely a static list of keys?

So yeah, like X.509 chains of trust. There's not right now. I'd be happy to add something like that. I think that people in general have an extremely hard time managing certificate authorities, and I'd like to see if we can do something better than saying, yeah, just use the openssl x509 tool, and then good luck. So I'm totally open to having a hierarchical thing. It's just that I didn't feel like anything was the right answer right off the bat besides GPG, and there's kind of a history of using GPG for signing software in the open source community, so I thought it was an OK thing to do. Totally willing to have the discussion, though, about what a better tool would be.

Yeah, so the Docker approach to managing containers is to have something like a Docker engine. It appears to me that you have the opportunity with the spec to have a container image running potentially more than one process; the Postgres instance was a good example of that. Now, in Postgres you have a number of child processes that are running logging and replication and what have you. So do you have, or do you see, separate engines or container runtimes for each image, running the container in their own process space?

Yeah, so how it's implemented today, essentially every container has a PID 1 which is a full init system. So that PID 1 is actively managing that container. There's no API, though, so once the process gets launched, the process is launched. There's no API to say stop this container; we rely on your existing init system to do that. So in a way, there is an engine within the spec, because we say that you have to actively monitor processes.
Now, whether that means it's implemented as Rocket is today, where the container is a direct descendant of the calling process, or whether it's implemented the way the Docker engine or LXD is implemented, where there's a daemon that you talk to and the processes exist under that, I think is an implementation detail.

So you don't see multiple Rocket instances running on a single host?

There will always be multiple Rocket instances running on a single host, because of how Rocket works. Rocket is like bash or Python: it executes as a child of wherever it was executed from. But you can imagine, and we'll call it Foobar for creativity, if somebody implemented the app container spec in their Foobar project, that there's a foobard, and there's only one of those per host, and it actively monitors the set of processes that exist in a container. So there's only one process monitor for that entire host, the foobard. It's just a difference in implementation design decisions, whether it's a standalone container runtime or whether it's a container runtime with an engine of some sort.

So you have one monitor? Yeah, one monitor, yes. Do we have time for one more? Sorry, one more.

Are you doing anything to approach the storage problems that Docker has, like detachable and identifiable storage volumes?

Yeah, so there's the concept of volumes and mount points within the spec, and we decouple the two. Within the spec, the runtime sets up a volume. That could be, we'll say for example, a host volume, so a bind mount point. And then the container images define a set of names and destinations for those names. So say you have the backer-upper tool: it just generically has a named mount point called backup, and you could point that at, say, /var/lib/postgres or /var/lib/mysql, and it doesn't really matter. The backer-upper tool just knows that it needs to rsync some directory to some other host or S3 or whatever. So we've decoupled the concept of the volume from the mount point at which that volume is attached to the containers. And the API is that the mount points are stable names that the containers should hold on to across multiple versions, the backup name in this case, but where that backup directory actually lives is implementation-specific inside the container image.

OK, make it quick. OK, that's great. I only wanted to ask a little bit: can I use your discovery system for volumes?

So, yes, for containers. You can use an image as a volume, so yes. Cool. Lunch. All right, thanks.

All right, I think that's all we have time for. Thank you, Brandon. And just a quick FYI, Brandon and a couple of others will be back at 4:25 for the closing panel for this miniconf. Now it's lunchtime, and I think everyone's on their own for lunch. We'll be back here at 1:20, when Andrew Boag (or Boag, my apologies if I'm messing up your name) will be talking about AWS.