Hello everyone. Does it work? Yes. My name is Jacob, and this talk is about your build in a data center. I work at Google on a build system called Bazel, and this talk will specifically be about remote caching and remote execution in Bazel.

So first of all, what's Bazel? Bazel is a build system that was developed at Google almost 10 years ago, and to this day it builds all of Google's software. Bazel was open sourced about three years ago. It's a build system similar to CMake, Make, Gradle, and Maven, but with the difference that Bazel doesn't have one favorite language. It's a multi-language build system. With Bazel you can build Java, C++, Python, Go, and Rust; you can do Android and iOS development; you can build Docker containers; and so on. And if your favorite language is not supported, well, Bazel has an extension language. It's a subset of Python, so the syntax is familiar to many people, and it allows you to add your own build rules: to add support for your own language, or to improve the existing language support.

One of the distinguishing features of Bazel is that it's focused on correctness, meaning that Bazel tracks all the inputs to every artifact you build, and it notices if something changes. When you build your project again, it will only rebuild the things that have changed. Some people like to say that having to do a clean build in Bazel is considered a bug. So no more clean builds; you only do incremental builds.

Bazel has recently gotten support for remote caching. So what's remote caching? The idea is pretty simple. Bazel can connect to a remote cache that runs in a data center or in a cloud, and it can upload build outputs to this cache. Then, if a different developer or a continuous integration system like Jenkins or CircleCI wants to build the same source state with the same compilers on the same platform, it doesn't have to.
Because Bazel can download these build artifacts, which have already been built previously, from the remote cache and reuse them. Our users have reported somewhere between a two and ten times speed-up for their CI builds, because you don't always build from a clean state: you get to reuse the 90-plus percent of the build outputs that haven't actually changed.

So why can Bazel do that? I mentioned before that Bazel tracks all your dependencies in a big dependency graph. When you execute a build, it creates an action graph from this dependency graph. You can think of an action graph as a graph of the individual steps that have to be executed in order to complete the build. An action graph consists of actions, and actions can depend on other actions, meaning that action A has to be executed before action B, because action B depends on some outputs of action A.

So what's an action? Most commonly it would just be a compiler invocation. An action consists of a command, like a GCC call. It has declared input files that the command can access, and declared outputs that other actions can then access. And it contains a platform definition, so that you know which platform this action runs on and whether you can share cached outputs.

The way we implement remote caching, in a sense, is that you can think of the remote cache as a big hash map where the action is the key and the build outputs of that action are the value. If someone tries to build the same action, Bazel looks up the action in this remote cache, in this hash map, and if there is an entry for it, the remote cache will just return the build outputs. An action contains enough information to be able to generate the same build outputs.

So how would you use that? One setup that our users use, and that we generally recommend, is that on your continuous integration system you have Bazel running, connected to the remote cache, and it can read from and write to the remote cache.
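The hash-map idea above can be sketched in a few lines of Python. This is purely illustrative: the field names and the JSON-based digest scheme here are made up for the sketch, while Bazel itself serializes actions as protocol buffers and uses a cryptographic digest (SHA-256 by default in recent versions).

```python
# Sketch of a remote cache as a big hash map: action -> build outputs.
# Names and encoding are illustrative, not Bazel's actual wire format.
import hashlib
import json

def action_key(command, input_digests, output_paths, platform):
    """Digest of everything that determines an action's outputs."""
    payload = json.dumps({
        "command": command,
        "inputs": sorted(input_digests),
        "outputs": sorted(output_paths),
        "platform": platform,
    }, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

class RemoteCache:
    """Action key -> build outputs, shared between machines."""
    def __init__(self):
        self._store = {}

    def lookup(self, key):
        # Cache hit returns the stored outputs; miss returns None.
        return self._store.get(key)

    def store(self, key, outputs):
        self._store[key] = outputs
```

Two machines building the same action compute the same key, so the second machine gets a cache hit and skips the compile; any change to the command, inputs, or platform yields a different key and therefore a miss.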
So on your Jenkins or on your CircleCI. And then you have developers who just read from the remote cache. What's the idea behind this? Let's assume that you have a remote cache that contains all the build outputs of the current master branch. A developer then opens a pull request that's based on this master branch, with some changes on top. Opening the pull request triggers your Jenkins build, your CI build, so it will run a Bazel build and test. Since the pull request is synced on top of the master branch, and the remote cache contains all the outputs from the master branch, Bazel will be able to fetch most of the build outputs from the remote cache, build only the changes in the pull request, and then write the build outputs of those changes back to the remote cache. This will typically go back and forth with code review, and then once the pull request is ready, it gets committed. But before that, your CI system runs again, builds the changes, and stores them in the remote cache. So once the change is committed, the remote cache again contains the state of the current master branch, and your following pull requests get a lot faster, because they really only need to build what has changed in this pull request and not the other 95 percent.
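The CI-writes, developers-read setup can be expressed with a couple of Bazel flags. The flags below exist in current Bazel releases (names have varied across versions), while the cache endpoint is a placeholder:

```shell
# .bazelrc on CI machines: read from and write to the remote cache.
# cache.example.com is a placeholder for your own cache endpoint.
build --remote_cache=https://cache.example.com
build --remote_upload_local_results=true

# .bazelrc on developer machines: read-only, so local results
# never pollute the shared cache.
build --remote_cache=https://cache.example.com
build --remote_upload_local_results=false
```

With this split, only artifacts built by trusted CI machines ever enter the cache, which is also a reasonable default from a security standpoint.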
Developers then typically just read from this cache. The classic scenario here would be: a developer comes in in the morning, does a git pull, and syncs their master branch to the latest commit. The CI system has already built everything, so they don't really need to rebuild it; they can just fetch the outputs of the master branch. So that's remote caching: you build locally, you share remotely.

The next thing that Bazel can do is remote execution. Remember that an action contains all the information needed to create a build output. So Bazel can also send this action to a remote execution system running in a data center, in the cloud, and this remote execution system can execute the action in the data center and send the results back to Bazel.

Why would you want to do that? First of all, you also get the benefit of remote caching: Bazel sends an action to the remote execution system, and the remote execution system can check, hey, did I already build this? If so, it just serves the outputs and doesn't do the work again. But secondly, data centers have a lot more cores, and Bazel is really good at understanding your build and figuring out what can be parallelized. Bazel can typically run 200, 300, 400, even 500 actions in parallel, and the data center gives you enough cores to do that, while your local machine wouldn't. So you can also speed up the things that have actually changed and that you really need to build.

These two things are about performance, but there's a third reason, and that reason is cross-compilation. A remote execution system can be connected to more than just an environment that resembles your development environment: it can also have, say, a pool of Windows workers, a pool of Linux workers, or, I don't know, Android phones connected to it. Then you can sit at your Linux desktop and run tests on Windows using remote execution, which is a big boost in productivity if you happen to need multi-platform development,
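Enabling remote execution is again a matter of a couple of flags. `--remote_executor` and `--jobs` are real Bazel flags; the endpoint, scheme, and port below are placeholders that depend on your deployment:

```shell
# .bazelrc: send actions to a remote execution service instead of
# running them locally. remotebuild.example.com is a placeholder.
build --remote_executor=grpc://remotebuild.example.com:8980

# Allow far more concurrent actions than local cores would permit,
# since the actions actually run in the data center.
build --jobs=200
```

Because remote execution implies remote caching, actions that the service has already executed are served straight from its cache.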
so you don't need to switch workstations or VMs and so on.

Remote execution is a bit more complex than remote caching: it does not just use HTTP, it uses a gRPC-based API that we developed and open sourced. We also built an open source remote execution system that you can take today, run, try out, and give us feedback on. We are developing it in collaboration with Uber and Twitter, and it's still a work in progress; for example, cross-compilation support is not there yet, but we are working on it.

So generally, what are we working on? The API is also still evolving: the remote execution API and the caching API are set, and we are trying to add cross-compilation support. A big focus for us is sandboxed execution in Docker. A feature we are working on is that Bazel can locally run your actions in a Docker container that you specify, so you will be able to strictly define the environment and the tools that your compiler invocations will run in. This will allow you to get bitwise-identical outputs across machines and across environments, because everything is running in a Docker container. For people who need this kind of reproducibility, this will be a great boost, to be used in combination with remote caching.

Additionally, right now remote caching does require a good network connection; it's downloading a lot of build outputs. So we thought: if builds are incremental in Bazel, why shouldn't downloads be incremental too? We are currently working on an rsync-like mechanism for remote caching and execution in Bazel, and so far our tests have been pretty promising, in that they showed up to a 90 percent reduction in downloads.

That's it. We have recently launched a documentation section on remote caching on our website; please check it out, and if you have any questions, please ask them now. Thank you.

Question: Are the remotely built objects signed or authenticated, to make sure that you do not, as a user, receive bad stuff as opposed to what you would compile locally?

So all the build outputs are stored in content-addressable storage, so they are named by their hash, and typically you would want to run a remote execution system that Bazel has to authenticate against. Also, one idea of Bazel is to have reproducible builds: you should be able to run the same action locally and remotely and get identical outputs, so that would be a way of verifying that an output hasn't been tampered with.

Anyone else?
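The content-addressable storage mentioned in the answer can be sketched in Python. The class and method names here are illustrative, not Bazel's actual CAS implementation, but the core property is the same: a blob is named by its hash, so a tampered blob no longer matches its own name.

```python
# Sketch of content-addressable storage (CAS): blobs are keyed by
# their SHA-256 digest, and reads re-verify the digest, so silent
# tampering with stored bytes is detectable.
import hashlib

class ContentAddressableStore:
    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        """Store a blob and return its digest, which is its name."""
        digest = hashlib.sha256(data).hexdigest()
        self._blobs[digest] = data
        return digest

    def get(self, digest: str) -> bytes:
        """Fetch a blob and verify it still matches its name."""
        data = self._blobs[digest]
        if hashlib.sha256(data).hexdigest() != digest:
            raise ValueError("blob does not match its digest")
        return data
```

This is why authentication matters at write time rather than read time: anyone can verify that fetched bytes match the digest they asked for, but only trusted writers should be able to associate a digest with an action's result in the first place.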