Hello everybody, my name is Lindsay Salisbury. I'm from Facebook, on the containers team. I'm going to talk today about building portable service / container images with Buck.

I want to cover a couple of things in this talk. We'll talk a little bit about what Buck is, how it's set up, and how it works; the goals we have for building container images and portable services with Buck; what the basic components are; and how all the pieces work under the covers. I'll do a very risky live demo, and then we'll talk a little bit about the future, where we're going with this project and how it's going to play out over the next few months and years.

So first, Buck. It's got a cool name. This is from the Buck website. The point is reproducibility: Buck will only use the declared inputs, which means everybody gets the same results. It's basically trying to be a reproducible and hermetic build system. You can go read more about Buck on the website; the points I'm calling out are the ones that matter for container images, which is why I'm bringing them up.

We need correct incremental builds. If parts of the build change, we want to reuse the incremental build parts, they should always be correct, and you shouldn't have to rebuild the entire thing from scratch. So you can compose things as subparts of the system change. And, very importantly, we want you to be able to understand your dependencies. With buck query you can walk through all the things your build depends on, and you can do reverse dependencies to find out what depends on you. That's pretty important and useful for debugging where parts of your build or your container image come from.

Buck uses Skylark, from Bazel; I believe that's how you pronounce it. I think the new name is Starlark, they've changed it. We're still using Skylark, a slightly older version of that, but it's pretty much the same thing. Skylark is a non-Turing-complete, domain-specific language, a subset of Python that you use to write build configuration. I'll show a little bit of it in the demos and throughout the talk. Buck supports C++, Go, Rust, Haskell, D, shell, and now container images.

So what are the goals of the Buck image build project? We want to build the primitives necessary to support hermetic builds and make progress towards reproducible builds, not just for binaries and compilation artifacts, but for container artifacts and images themselves. That means being able to reproduce an image when you rebuild from a particular git or hg hash, and having fully hermetic builds, meaning your build environment doesn't affect the output of the build itself. The filesystem construction language is declarative, which means the compiler will check filesystem actions for compatibility, sort them automatically, and figure out what the dependencies are. You don't have to imperatively spell out how the filesystem is going to get constructed in order to build it in a sane way. And it is strict: to the extent possible we enforce that actions fully succeed, and we don't add features that don't compose predictably with others. That's an important aspect, because there are lots of gotchas when you start constructing filesystems.
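As a quick illustration of the buck query dependency walking I mentioned, these are the kinds of invocations involved (the target paths here are made up):

    buck query "deps(//images/my-service:layer)"
    buck query "rdeps(//images/..., //images/my-service:binary)"

The first walks everything the layer depends on; the second finds reverse dependencies, everything under //images/... that depends on the binary.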
One of the things we don't do is allow arbitrary command execution. There was a talk yesterday about Bazel and how it avoids that as well. We don't really allow arbitrary command execution because you can't enforce what's going to happen to the filesystem in that case. The nice escape hatch is a thing called a genrule, where you can essentially write bash to generate build steps and build targets that are pseudo-reproducible, and that helps mitigate some of the complications as people start using this stuff.

One important point is that we want a deployable container to feel like a regular build artifact. As we deploy lots of services across all of our infrastructure, what we find is that users and customers don't really care that it's a container image. They just want their service to run, with all their dependencies wrapped up, pushed out into their production environment, and what they want to worry about is managing and operating their service, not all the subcomponents required to build that thing. The more we can treat a container as a build artifact, the more we can leverage the existing understanding, knowledge, and behaviors of the typical build chain: continuous build, CI/CD, testing, that kind of stuff, we can roll into this and get a lot of benefit from it. Ultimately, what you test is what you run: when you run a unit test or an integration test against your container, it's testing the thing you're actually going to run in production. That's a big goal for us.

So what are the components of this build system? We build using a local btrfs subvolume; the whole thing is built around using btrfs locally to construct the filesystem. As part of the code, we actually built a btrfs send-stream verifier: we have a bunch of unit tests that parse and understand the extent structure of the send stream itself. A side benefit is that we found a couple of bugs in the send-stream structure in the kernel by using this mechanism. What it means is that we can build a container filesystem, dump the send stream of that filesystem, run it through the verifier, and verify that the send-stream actions being performed are correct based on the build input we have.

We use systemd-nspawn for isolation. (I couldn't find the systemd logo; I was running out of time for my presentation, so that will have to do.) nspawn is what we use to run everything that builds the image itself. We use private networking; we turn off networking and make sure that nothing can talk to anything else. We use it for all the bind mounting, read-only bind mounts, and protecting certain mounts during the build.

The compiler is written in Python 3. We have a strict test suite with 100% coverage; one of the goals of the project is that we will always have 100% coverage on the code base, because we want to test the compiler itself pretty strictly. It's also dependency free: we have no third-party dependencies outside of the Python standard library.
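To give a flavor of the isolation, the nspawn invocations involved look something like this (paths and the exact flag set are illustrative, not the project's actual command line):

    systemd-nspawn --directory=/path/to/build/subvolume \
        --private-network --register=no \
        --bind-ro=/path/to/inputs:/inputs \
        -- /inputs/run-build-step

--private-network cuts the container off from the network, and --bind-ro makes the build inputs visible read-only so a build step can't modify them.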
Another interesting aspect is that we don't rely on the systemd on the host to do the build, so we can build this in a pretty constrained environment without a working D-Bus and everything else needed to talk to systemd-nspawn.

Skylark is used for the Buck target definitions and, as I mentioned previously, it's highly extensible via genrules and the built-in Buck primitives. The cool part is that as we start playing with this, we can add a whole bunch of additional features on top of the basic image feature primitives, which I'll walk through in a second. It lets us build more complex behavior out of the smaller building blocks and end up with an abstraction, a genrule in Buck, that gives users a pseudo-API for things like enabling a systemd unit inside an image during a build.

Okay. So these are the actual Skylark parts; I'm going to walk through them. We have a thing called image features. All the filesystem operations that can occur inside an image are wrapped up in what are called features, and there are a bunch of different options. I'll walk through a few of them in this talk.

One obvious thing you need is mkdirs, which gives us the ability to make directories inside the filesystem. There are a couple of different structures in the syntax. One is a tuple shorthand: the directory you want to create the new one in, plus the directory you want to create. And there's a more expanded form where you can specify modes, users, owners, and things like that. Referring back to the fact that this is a declarative system that needs to know all of its input components: the interesting thing about mkdirs is that you can't have something dropped into a directory that doesn't exist yet, so the compiler has to be told explicitly which directories you want to create. It doesn't do mkdir -p, essentially, and one of the main reasons is that we can't derive the modes of the subdirectories needed for that whole tree; you have to tell us.

Another one is symlinks. You can create symlinks in the filesystem; this example creates a symlink to enable a service in the multi-user target. There are two kinds of symlinks, to files and to directories. The reason they're different is that semantically on the filesystem they're treated slightly differently, the stat properties differ, so we needed two different top-level attributes.

Another one is install executable. This is the more functional part of how things get rolled up into container images with Buck, because it essentially says: the colon in ":my-service-binary" means go find the Buck target named my-service-binary, grab the output of that, and put it at /usr/bin/my-service. This is where the dependency on the binary that the service owner, or whoever is building it, produces lives, and it tells the compiler how to install it into the filesystem. Again, the paths need to exist: if /usr/libexec, for example, doesn't exist yet in the filesystem, the compiler will fail the build, and you have to make sure /usr/libexec is created.
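Pulling those together, a feature definition looks roughly like this. This is a sketch: the feature names and keyword arguments are illustrative and may not match the exact API.

    image_feature(
        name = "my-service-feature",
        make_dirs = [
            # Tuple shorthand: (directory to create it in, directory to create).
            ("/usr", "libexec"),
            # Expanded form with explicit stat options.
            {
                "into_dir": "/var",
                "path_to_make": "log/my-service",
                "mode": 0o750,
                "user": "root",
                "group": "root",
            },
        ],
        symlinks_to_files = [
            # (existing file, symlink to create): enables the unit
            # in the multi-user target.
            ("/usr/lib/systemd/system/my-service.service",
             "/etc/systemd/system/multi-user.target.wants/my-service.service"),
        ],
        install_executables = {
            # ":my-service-binary" is a Buck target; its output
            # is installed at the given path.
            ":my-service-binary": "/usr/bin/my-service",
        },
    )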
And another one is mounts. Mount points are really interesting, sometimes important, and sometimes unimportant and terrible, but we need to support two basic things. One is that sometimes we have to mount stuff from a host into a container. That's not always a great idea: if we can isolate the host entirely from the container, that's the best model, but unfortunately that's not always possible, so sometimes we need to mount from the host into the container somewhere. This example says: please mount /etc from the host into a directory called /host-mounts/etc, and again, the directory needs to exist as far as the compiler is concerned. The second kind is a little more interesting, because it says: take the image layer target (which I'll get to in a second) and mount it in the container at /service/compose-v1. This gives us a mechanism for building filesystems as different layers that can be composed together inside other layers through this declarative structure.

We also support RPM packages. Right now it's all RPMs, but we do have aspirations to support other packaging models as well; we use RPMs internally, so that's what we've focused on. This lets you specify a set of RPMs that should also be installed inside the image. Now, the interesting thing is that RPMs contain directories; an RPM might put stuff into a directory that doesn't yet exist in the filesystem. What the compiler does is inspect the RPMs, see what the paths and all the stat options are, and add those to the declarative structure of the image.

Okay, so features are the various individual actions to be performed on the filesystem, and a layer is the thing that composes all the features together to create what is ultimately, under the covers, a btrfs subvolume. It's essentially just a list of all the features you want to compose. We put in all the features we just defined, the mkdirs, the symlinks, the mounts, the packages, and then there's one more defined inline rather than as a standalone feature, because it's specific to this layer: it copies a file from the target "default-config", which is just a file that lives in the source tree, and puts it at /etc/config.d/00-default. (This example obviously won't compile as shown, because it didn't create all the directories.)

The other part of this is build_opts, which gives us the option of a build appliance. The concept of a build appliance is that we can have a chroot-like environment, similar to some existing root-image build tools whose names I'm blanking on, that you build in and run your filesystem construction in. It's essentially running nspawn inside of nspawn in order to build the filesystem we're ultimately after. It gives us an extra layer of protection, because the build appliance is a known entity: we know exactly what it looks like, we can reconstruct it, and we can guarantee that the filesystem being constructed is built inside that known context.
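Sketching the mounts, RPMs, and the layer that composes them (again, the names and mount syntax are illustrative rather than the exact API):

    image_feature(
        name = "my-service-mounts",
        mounts = {
            # Mount /etc from the build host into the container (host mount).
            "/host-mounts/etc": "host:/etc",
            # Mount another image layer target at this path (layer mount).
            "/service/compose-v1": ":compose-v1-layer",
        },
        rpms = ["python3", "systemd"],
    )

    image_layer(
        name = "my-service-layer",
        features = [
            ":my-service-feature",
            ":my-service-mounts",
        ],
        build_opts = image_opts(
            # A known, reconstructible image the build runs inside.
            build_appliance = ":stable-build-appliance",
        ),
    )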
You can also have a derived layer, which is a layer built on top of another layer. Here's a child layer built on top of the layer we just defined. It adds an additional image feature, a custom config that adds 10-custom, some special override specific to this layer, and it has a build option saying we want the subvolume name to be "child-layer" instead of the default, which is just "volume".

Then we have a couple of other special layers. One of them is called a sendstream layer, and it reconstructs a layer, a btrfs subvolume, from an existing send stream that was already built. The main purpose is to have images that were built, tested, validated, then packaged and saved somewhere as a send stream, so that later something else can reference that tested, validated version of the image. It's essentially like having a stable version of an artifact to build against, so you don't always have to build against master; sometimes master is broken.

The interesting thing is that the layers are the build targets. The features themselves aren't really build targets; they're just inputs that go into building the actual layer. In the Buck system you actually build the layer itself: you do a buck build and then the path to whatever that layer target is.

Okay, so after we have a layer, it's a thing built inside your working environment; it's not really useful to you yet. You need to export it into a format you can ship around, and that's what image package does. Right now we support send streams, btrfs loopback image files, and squashfs outputs. It takes the subvolume contained inside the build environment and exports it in whichever of those formats. (I'm going to hurry, because I only have five minutes left and I want to show some actual demos.) Packages are basically just a layer in an exported format; this slide is just a command line of what that would look like.

buck run is an interesting aspect of this. Buck gives you the ability to run executables, and what we can do with our image environment is give people an interactive sandbox. With buck run, we add a special "-container" endpoint to a layer target, and it gives us a shell inside that container right in the build environment, where we can play around, inspect, validate, run commands, and see what's going on in there. If you have a special flag called enable_boot_target on your layer, we will actually boot the image with nspawn --boot, or the equivalent thereof. That spins up systemd inside the container, and you get a full systemd container instance that you can play with, test, validate, and mess around with. These interactive sandboxes are done in snapshots, ephemeral subvolumes, so whatever you do in the interactive snapshot gets lost when you leave, which means it doesn't break the hermeticity or reproducibility of the image build.
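A sketch of the derived layer, the sendstream layer, and a package target (illustrative names again):

    image_layer(
        name = "child-layer",
        parent_layer = ":my-service-layer",
        # Adds /etc/config.d/10-custom, an override specific to this layer.
        features = [":custom-config"],
        build_opts = image_opts(subvol_name = "child-layer"),
    )

    image_sendstream_layer(
        name = "stable-base",
        # The send stream of an already built, tested, validated image.
        source = ":fetched-sendstream",
    )

    image_package(
        name = "my-service-layer.sendstream",
        layer = ":my-service-layer",
        # Other output formats: btrfs loopback image files, squashfs.
        format = "sendstream",
    )

The interactive sandbox is then something like buck run //images:my-service-layer-container, or the boot variant when enable_boot_target is set.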
buck test is built on top of buck run. It allows us to write unit tests that actually run against the image itself. We can run tests inside the chroot environment or inside the container environment; we can run them as nobody, as root, or in booted mode. So you can write unit tests and integration tests against systemd containers: make sure that services spin up, that mounts work, that sockets work, all the variations of spin-up, shutdown, and services exiting, all the sorts of tests people need to write for their container environments.

Okay, so let me see if I can make a demo work here. I'm going to walk into this booted environment really quick. It's an image we use called slim OS, which is just a stripped-down version of CentOS. If I go and look inside, systemd is running and I have access to all the services that are there. I can exit out of that, and you can see that the temporary subvolume ends up getting destroyed.

Let me show you some of the actual code. Can you guys read that at all? The purple is great. This is what a genrule looks like for fetching a pre-built image: it's basically just a bash script that copies from somewhere. We have some stuff we're working on that will ensure there's a hash, so if you download from a remote location it will validate and verify the hash, so that you can't just install whatever from wherever. Then you create a sendstream layer from the output of that genrule, and you can build an image layer on top of that that does more stuff. There's a feature here that installs a file into /usr/lib/systemd/system for a service called meow, which is just a Python binary that will meow at you, and an install executable that references another Buck target, the actual Python binary itself. So you can see how we compose the binaries, the outputs of targets, into the filesystem. If I build this, it will build a filesystem: it's going to download the send stream, unpack it, and put it into a subvolume. While that's running, I think I'm pretty close to out of time, so I'll go ahead and start taking questions if there are any. I can show more demos if you want. Sure, any questions?

Okay, I guess I have to end my talk awkwardly; it's the story of my life. Yeah, sure, I can repeat it: the question was that you didn't see where I had put any kind of assertions. I can show you one quick test I have here in this demo. It's a pretty contrived example, but basically I just want to validate that this thing is running as a particular user, so I run whoami inside the container and then check to make sure that it's nobody. How is that hooked up to Buck itself, since this is just a unit test? In the Buck target it is just a unit test, but it's a special Python unit test image target. That's where all the magic is: it compiles the unit test, spins up the layer, puts the unit test inside the layer, and then invokes nspawn to execute the test inside that container.
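That contrived test is roughly the following (a sketch; the special Python unit test image target that wraps it is described above and not shown here):

    import subprocess
    import unittest

    class MyServiceImageTest(unittest.TestCase):
        def test_runs_as_nobody(self):
            # buck test launches this process inside the container layer,
            # so whoami reports the user the test was configured to run as.
            who = subprocess.run(
                ["whoami"], capture_output=True, text=True, check=True
            )
            self.assertEqual(who.stdout.strip(), "nobody")

    if __name__ == "__main__":
        unittest.main()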
Okay, so you mentioned RPMs, and I have two questions about that. First, how do you deal with scriptlets, for example post-installation scriptlets in RPMs? We don't run them. (You don't run them? Then that means some RPMs will just be broken.) Yeah, there are some cases where the RPM installation is not quite what we're expecting, but there's not a lot we can do. In terms of the scriptlets, we have to ensure that when the RPM gets installed it doesn't change stuff. One thing we're going to try in order to address that is the build appliance concept: we can build a layer that we reuse, so we build it once, the scriptlets run once, and then we don't rerun them, because we use a memoized version of the install.

And can you specify versions of the RPMs? So that's a part I actually had slides about; I guess I shouldn't have ended my talk so early. We have a mechanism for snapshotting all the repositories, and the versions you get when you install RPMs are whatever the current versions are in the repo snapshot. The reason for the repo snapshot is that you then have a hermetic build that's reproducible at that particular point in time. We re-snapshot the repo, put those hashes into the version control system, and then if you rebuild from that hash you always get the same versions of the RPMs. To advance, you have to update the metadata about the snapshot. That's part of what's coming in open source as we make this a top-level project and keep pushing more of these changes out.

And I guess this also covers my question on this slide, but how do you handle dynamic libraries and references to libraries that you need while running stuff? Most of what we're doing here is install executable, and especially in our environment we do pretty static compilation, so we don't have a lot of shared libraries to copy in with our particular binaries. One thing we're working on is actual ldd feature support, where we'll trace the shared objects that are linked against a binary and put them in as well. We're also working on doing more than just Buck-target installation: taking source layers that already exist and extracting the things we care about. (If you know how mkinitcpio works for Arch and some other distros, where you give it a binary and it walks the ldd output, yeah, we do that as well.) We can basically build a very similar thing with the genrule machinery on top of the existing target structure. Any other questions? Okay, he's telling me we're done, so cool. Thanks!
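The ldd-walking idea mentioned at the end could be sketched like this; this is purely illustrative, not the project's code:

    # Discover the shared objects a dynamically linked binary needs,
    # so they could be installed into the image alongside it.
    import subprocess

    def shared_object_paths(binary):
        out = subprocess.run(
            ["ldd", binary], capture_output=True, text=True, check=True
        ).stdout
        paths = []
        for line in out.splitlines():
            # Typical line: "libc.so.6 => /lib64/libc.so.6 (0x...)"
            if "=>" in line:
                target = line.split("=>", 1)[1].strip().split(" ")[0]
                if target.startswith("/"):
                    paths.append(target)
        return paths

    if __name__ == "__main__":
        print(shared_object_paths("/usr/bin/env"))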