Hello everyone, welcome to EnvoyCon. Today I'm going to give a talk about supporting the ARM64 platform in Envoy. My name is Lizan Zhou, I work at Tetrate, and I'm an Envoy maintainer as well.

First, let's go over the story of ARM support in Envoy. The first issue about ARM support was opened back in 2017. At that moment we didn't have enough resources, either engineering or compute, to support ARM officially, but several people in the community tried to build Envoy on Raspberry Pi or other ARM boards. Then the OpenLab folks added a third-party CI in Q2 2019. This year we got some support from Arm, so we were able to add experimental CI in Q2 and then got it officially, fully tested, in Q3. The first official release with ARM64 support came out last week: version 1.16.0 is the first version with an official ARM64 image build. I'm going to talk about how we got to that first official release, covering the CI and the build system.

So let's go over the Envoy build system. Envoy uses Bazel as its build tool, which gives us hermetic builds and a remote build cache. We use remote build execution from Google, and for ARM we use a cache from an open source project called bazel-remote with an S3 backend. The build is really large today: just counting binary test targets, we have 744 tests, and each test target has tens or hundreds of test cases. That is really time-consuming to run on small machines, so getting it to finish in a reasonable time was also a challenge for proper ARM support in Envoy CI. Envoy also has a lot of build dependencies; the big ones are gperftools, our malloc implementation, nghttp2, our HTTP/2 codec, yaml-cpp, Protocol Buffers, and so on. Luckily we didn't have any major issues with those dependencies. We had to make some small patches so they work with the ARM build, but overall that was a small part.

So let's talk about the CI. The Envoy CI runs on Azure Pipelines. Before we had ARM support, we ran a format check, did the release, took the binary from the release build into a Docker image, and ran sanitizers, coverage, GCC builds, and so on; I'm omitting the macOS and Windows support here. With ARM we added a new ARM64 release job, and we feed the binaries from the x64 release and the ARM64 release into a Docker multi-arch job. That job builds a multi-arch image covering both x64 and ARM64 and pushes it to Docker Hub.

There were some challenges in supporting ARM. The first one was Bazel: Bazel didn't have an official ARM64 release before 3.4, and we worked with the team at Google to make that release happen. Azure Pipelines started supporting ARM64 in Q2, which is actually when we looked at it, so the timing was right. Because Azure Pipelines doesn't provide managed instances for its ARM64 CI workers, we built our own self-hosted agent infrastructure on AWS. Around the same time, AWS released its new ARM64 instances, Graviton2, which are powerful machines and give us a lot of flexibility to run the CI on large instances. The CI infrastructure basically keeps idle instances in AWS waiting for a job, and each one then works on a CI job from Azure Pipelines. The code is in the CI infra repo and it's very simple.

The next piece is the Docker image build. Docker now has multi-arch support with BuildKit, so we can use the same Dockerfile to build both ARM64 and AMD64 (that is, x86-64) images. We changed the debug image from an Alpine base to an Ubuntu base to better support ARM, because the Alpine glibc base image that we used doesn't have an ARM64 version.
So next, I'm going to talk about porting Envoy to ARM. Envoy is a modern codebase, and we didn't have any major issues building the Envoy code for ARM64, but there are some caveats to pay attention to. One is endianness and signed char versus unsigned char, where the compiler defaults differ between the platforms; there is a small sketch of the char-signedness caveat at the end of this transcript. A memory-size-dependent test also failed initially because the pthread types have different sizes, and that affects the hot restart version.

The biggest thing we had to handle was exception handling. When we first started the ARM build, it produced a binary, but something like 100 out of 600 tests failed. This is because C++ exceptions are not propagated through C code on the ARM platform, at least with the default compiler settings, and we needed to pass -fexceptions when compiling the C code. This was very important because our HTTP codecs depend on this behavior. We're on the path to removing exceptions from the HTTP/2 codec, but at that moment it was still an issue; there is a sketch of this exception-propagation problem at the end as well. We also saw some test flakes on the ARM64 platform, mostly caused by different timing on the test machines. Surprisingly, some tests run faster on ARM64, and that caused integration test failures.

So let's talk about build performance. We use the AWS r6gd.8xlarge instance, which has 32 cores and 256 gigabytes of memory. We cache with bazel-remote, which helps a lot with build performance: without the cache, a full CI job costs around 40 minutes; with the cache, it normally finishes within 15 minutes, including pulling the build Docker image and producing test results.

For future development, we have some items left. One is WebAssembly support, which is not merged into upstream master yet but currently excludes ARM64. WebAssembly is a really important feature, so we will need to add WebAssembly support on ARM as well. Also, some downstream builds don't have ARM64 support yet, like Istio Proxy or GetEnvoy; we will work on those soon.

Thank you for listening to this talk. If you have any questions, I'm on the platform to answer them, and you can also ask me on Twitter or Slack.
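Here is the minimal sketch of the char-signedness caveat mentioned above. It is an illustration rather than Envoy code: on x86-64 Linux, plain char defaults to signed, while on ARM64 Linux it defaults to unsigned, so code that relies on the default can behave differently on the two platforms. Spelling the signedness out, with signed char, unsigned char, or the fixed-width integer types, keeps the behavior the same everywhere.

```cpp
#include <cstdio>

int main() {
  // Plain `char` is signed by default on x86-64 Linux but unsigned by default
  // on ARM64 Linux, so this comparison answers differently on the two targets.
  char c = -1; // stores -1 when char is signed, 255 when char is unsigned
  if (c < 0) {
    std::printf("plain char is signed here\n");
  } else {
    std::printf("plain char is unsigned here\n");
  }

  // Portable code states the signedness explicitly instead of relying on the
  // platform default.
  signed char sc = -1;
  unsigned char uc = 255;
  std::printf("sc=%d uc=%d\n", sc, uc); // both promote to int for printf
  return 0;
}
```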
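And here is the sketch of the exception-propagation issue, again with made-up file and function names rather than Envoy's or nghttp2's real API. The pattern is a C library calling back into C++ code that throws: for the exception to unwind back through the C frames, the C translation unit needs unwind tables, which is what compiling it with -fexceptions provides. On x86-64 the toolchains we use emit unwind tables for C code by default, which appears to be why the problem only showed up on ARM64.

```cpp
// Build sketch (hypothetical file names):
//   cc  -fexceptions -c run_parser.c        # unwind tables for the C frames
//   c++ main.cc run_parser.o -o demo
//
// ---- run_parser.c : a C library, in the spirit of an HTTP codec ----
// The C code knows nothing about C++ exceptions; it can only act as a
// pass-through for stack unwinding if it was built with unwind tables.
typedef int (*on_frame_cb)(void* user_data);

int run_parser(on_frame_cb cb, void* user_data) {
  return cb(user_data); // a C++ exception may unwind through this frame
}

// ---- main.cc : the C++ side that registers a throwing callback ----
#include <cstdio>
#include <stdexcept>

extern "C" int run_parser(int (*cb)(void*), void* user_data);

static int throwing_callback(void*) {
  // Codec callbacks rely on exceptions propagating back out to C++.
  throw std::runtime_error("protocol error");
}

int main() {
  try {
    run_parser(&throwing_callback, nullptr);
  } catch (const std::exception& e) {
    // We only reliably get here if the C frame in between was compiled with
    // -fexceptions; otherwise the unwind can terminate the process.
    std::printf("caught: %s\n", e.what());
  }
  return 0;
}
```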