Hi everybody, welcome to our talk about developing and debugging WebAssembly filters. I'm Yuval Kohavi, Chief Architect at Solo.io. And I'm Shane O'Donnell, a Software Engineer at Solo.io.

Let's talk a little bit about the agenda for today. We'll start with Istio and Envoy and do a little bit of an overview. Then we'll talk about how to build and deploy WebAssembly filters onto your service mesh. And then we'll talk about how we troubleshoot and debug these WebAssembly filters.

Let's talk a little bit about the Istio adoption that we see in the industry. We like to divide it into steps. The first step we see is kind of a crawl step, where people need support for upstream Istio. They're just testing it out. They need long-term support. They're beginning to learn what service mesh means technically and operationally. Then in the next step, people add on more features. We call it the walk phase. They're adding maybe a developer portal, maybe an ingress API gateway. They start to use mTLS for zero-trust security and start to use the observability features. Once they get more confidence, they usually go to the stage we call the run stage, which involves delegating more responsibilities to different teams, with various objects that allow delegation, and WebAssembly filters to enrich the data plane. Once organizations feel comfortable in this step, the next step is to really take full advantage of the service mesh: multi-cluster meshes, federated services that can fail over to each other, et cetera. With that, I'll hand it off to Shane to give us some of the background on Istio and Envoy filters.

So let's do a quick overview of Istio's architecture. Istio deploys a sidecar proxy to each microservice running in your service mesh. These sidecars are each instances of Envoy proxy, which we collectively refer to as the data plane. This data plane of Envoy proxies is controlled by what we call the control plane. In Istio's case, this is istiod.
And istiod will communicate configuration updates from the control plane into the data plane.

Once our customers start looking at Istio, one of the first questions they ask us is, how do we extend it? We want to build something like DLP, data loss prevention, or WAF, web application firewall, or any number of other transformations and custom business logic that you're not going to get with Envoy filters right out of the box. Now, luckily, Envoy is incredibly powerful, and it provides us with a really powerful extension layer inside of its filter chain. The filter chain out of the box will give us some really helpful filters like external auth and rate limiting, but it also gives us the ability to build our own custom native Envoy filter. Now, this does have some drawbacks. First of all, you have to write the filter in C++. And second of all, you have to actually compile the filter into Envoy, which means you need to be familiar with the Bazel build system, which is non-trivial, and you have to be familiar with compiling the entire Envoy toolchain, which takes a lot of time and a lot of compute power, and it's not that easy to do.

Luckily, this is where WebAssembly comes to the rescue. So what is WebAssembly? WebAssembly is a binary format that was originally meant to run on the web. It was meant to run non-JavaScript languages inside of a browser context. Because it was designed to run inside of a browser context, it's built with simplicity, security, and performance in mind. But these attributes are also incredibly desirable when running in other contexts. The context we're especially interested in here is running inside of the Envoy data plane, or more specifically, inside of the filter chain. So what does that look like? Rather than having a native custom Envoy filter, now we're running Envoy's built-in Wasm filter and passing it our Wasm filter as code.
This means that it has all of the advantages of the other Envoy configuration, such that if you want to add or remove a Wasm filter or update its configuration, you don't need to restart the proxy. There's zero downtime. It's also secure and reliable: Wasm is going to run inside of an isolated VM. In addition to that, you can run not only C++ filters, but filters in any other language that has implemented the Wasm ABI. Right now, these languages include TinyGo, C++, AssemblyScript, and Rust. It's got near-native performance, which means you can run this on your data path and handle real requests at production scale. And it's sustainable, because you're just maintaining Wasm filters. You don't need to rebuild all of Envoy and worry about what happens when Envoy pushes new security patches. So I'm going to pass it back to you, Yuval, to talk a little bit about user experience.

Let's talk a little bit about the user experience. We like to start with this tweet from Solomon Hykes just to demonstrate the power of WebAssembly. Here Solomon, the creator of Docker, mentions that if Wasm and WASI had existed in 2008, he wouldn't have needed to create Docker. This demonstrates the power of WebAssembly and that it can unlock many use cases in the data plane.

We like to separate the technology from the user experience. With Docker, the technology wasn't new. Linux containers were already there. Docker made it really easy to build and distribute those containers. The same goes for WebAssembly. WebAssembly exists today in Envoy, but to use it today, there's a lot of stuff you need to do on your own. With Gloo Mesh and WebAssembly Hub, we aim to simplify that.

So let's talk a little bit about the lifecycle, from the developer writing the code until the filter ends up in your Envoy data plane serving requests. The first step is to build a filter. Now, Envoy WebAssembly filters need to adhere to the Envoy ABI, application binary interface.
That's the interface between Envoy and the filter. That's how Envoy knows how to interact with the filter. We have pre-packaged filters in several languages that allow you to quickly and easily get started. To meshctl, the Gloo Mesh command line tool, we've added a wasm subcommand that allows you to easily initialize a filter: meshctl wasm init creates a base filter so you can start writing code. In addition, each language has its own build tools for building a filter. So we provide meshctl wasm build in order to build a filter from code to WebAssembly and package it as an OCI image. I'll get to why we use an OCI image in just a second.

The power of OCI images is that they can be pushed to and pulled from a registry, much like Docker images. We provide WebAssembly Hub as a community resource where you can sign up for free and push and pull your images. So the next step in the workflow would be to push this image to a registry, in this example to WebAssembly Hub, to make it available later to be pulled into the cluster, much like you're doing today with your Docker images. In order to facilitate that, we created the Wasm Artifact image specification, which specifies how to package a Wasm binary into an OCI image so that Gloo Mesh and other tools know how to retrieve it and send it to Envoy.

The way Gloo Mesh sends the extension to Envoy is using what's called in Envoy the extension config discovery service, ECDS. This allows us to configure Envoy to grab additional extensions like WebAssembly filters from a separate control plane, separating the lifecycle of regular mesh configuration from that of your WebAssembly filter. In this example, we use meshctl wasm deploy to create a filter deployment CRD that Gloo Mesh will then use to create an Istio EnvoyFilter CRD to inject this ECDS extension onto Envoy. Envoy subsequently will contact Gloo Mesh, get the Wasm binary, load it, and activate it on the request path.
This request injects the filter that we just built and pushed into the ratings app workload in a cluster called management cluster. The last step as far as development goes is that we may want to do source-level debugging. That's not something that we currently have, but it's something we're working on: a meshctl debug command that will allow you to source-level debug your filter as you are developing it. And with that, let's see a quick demo that shows all the commands we've talked about up until now used in practice.

All right, let's see a quick demo of the Wasm developer experience with Gloo Mesh. First of all, on the left side of the screen you can see the Gloo Mesh console. It gives you a nice overview of your deployment status. If we drill down to the meshes tab, you can see that we have two clusters, the management cluster and the remote cluster, and they're connected with a single virtual mesh. You can see the various traffic targets, workloads, and policies, a kind of high-level overview of what's going on. You can also drill down to the policy tab to see specific policies.

Now, we have added a Wasm plugin to the meshctl command line. meshctl is the command line that comes along with Gloo Mesh, and we've added the Wasm capabilities to meshctl using the Wasm plugin. Let's see how the developer workflow for deploying a Wasm extension to the mesh works. The first thing we'll do is start with meshctl wasm init. We'll give it the name of the filter we want to create and the language, in this case Rust. This extracts a template for an easy getting-started experience. So, let's open up the code and see what we're working with. As you can see in this example, we implement on_http_response_headers. So every time an HTTP request is made and the response is returned, this function here will be called as the response headers come back.
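For readers following along without the screen, the shape of such a filter can be sketched in plain Rust. This is only an illustration under stated assumptions: the HttpFilter trait below is a hypothetical stand-in for the callback surface that a proxy-wasm SDK provides, not the real SDK API, and the header name and value follow what the demo goes on to describe (a hello header whose value is world followed by the number one). The exact concatenation "world1" is an assumption.

```rust
/// Hypothetical stand-in for the response-header callback that a
/// proxy-wasm SDK exposes. This is NOT the real SDK trait.
pub trait HttpFilter {
    /// Called once per response, when the response headers are available.
    fn on_http_response_headers(&mut self, headers: &mut Vec<(String, String)>);
}

/// A filter that appends one header to every response.
pub struct AddHeader {
    /// Default value for the header, "world" in the demo.
    pub value: String,
}

impl HttpFilter for AddHeader {
    fn on_http_response_headers(&mut self, headers: &mut Vec<(String, String)>) {
        // The demo template appends a small number to the default value;
        // the exact formatting here is an assumption.
        let suffix = 1;
        headers.push(("hello".to_string(), format!("{}{}", self.value, suffix)));
    }
}
```

The header name is written in lowercase here because, as the demo notes later, Envoy normalizes header names to lowercase anyway.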
And in this case, what we can see is that we have a few lines just to make it a bit more interesting, and it sets a response header, hello, with the default value world plus this number, which comes out to be one.

All right, so now that we have the code, the next step would be to build this filter into a Wasm binary. We have a command for that as well: meshctl wasm build. We tell it the language so it can select the right toolchain to build with. We tell it how to tag the resulting Wasm filter. In our case, this Wasm filter will be packaged as an OCI image that will be pushed in the next step to WebAssembly Hub, so we give it an image name that contains WebAssembly Hub and my username in WebAssembly Hub. This is very similar to how Docker images work. And lastly, we give it the folder that contains the filter. Building pulls in a container with build tools. In this case, it has Bazel, and it's running and building the filter. It's happening live, so it'll take a minute. In the meantime, I'll mention that after the filter is built, the resulting Wasm file will get packaged in an OCI image that will be stored on your local machine. And then in the next step, what we'll do is actually push it to WebAssembly Hub. All right, everything built successfully. The image was tagged, everything is looking good.

So, let's move on to the next step: meshctl push. What this does is very similar to how docker push works. It'll push this image from the local cache into webassemblyhub.io, so it's available wherever it's needed next. All right, so far, so good, the image is pushed.

So, let's move on to the next step, which is to deploy the image, right? So far, we have the image on our local machine. We have it in WebAssembly Hub. We need to get it to the service mesh, right? In this case, we'll use the meshctl wasm deploy command. So, you can see here, meshctl wasm deploy to Istio, to our current management cluster.
The deployment that we want to create, that's the filter deployment, is going to be the head header. This is the name of the CRD that will be created to express the extension, right? Everything is managed with CRDs, and the result of this command will be a CRD that Gloo Mesh will process to inject the filter, the Wasm extension, into the workload. So, this is the name of the filter deployment CRD that is about to be created. We obviously give it the namespace and the image. This is the image we just pushed to WebAssembly Hub. We give it the cluster, because remember, Gloo Mesh can deploy this into multiple clusters. So, this is the cluster where we want the filter to apply, and we give it the label of the workload that we want the filter to apply to. In this case, that's the ratings app in the cluster that is the management cluster. All right, let's deploy that. So far, so good. And now that it's deployed, if we go to the Wasm tab in Gloo Mesh, you can see the Wasm filter here as well. With the UI, we really try to give you a full 360 view of what's happening in your service mesh.

All right, next step, we're going to test that it's actually working. Here is the step where I cross my fingers. What we're doing here is essentially exec-ing into the product page pod, and from the product page pod, we're going to curl the ratings pod, right? If everything works well, we will see exactly the hello world one header. Now, you'll notice that the header is lowercase here, because with Envoy, all headers are normalized to lowercase, and the value is exactly world and one, which is exactly what we expect it to be. So far, everything is looking good.

So, this is kind of the basic demo of what we have today: a full developer workflow, starting from deploying a source code template that we can start working with, making it easy, building it into a Wasm filter with the correct build tools, pushing it to a binary registry, right?
An OCI registry where the filter can be hosted, and then pulling that to the cluster and injecting it into our workload. These are all things that we have today and that you can experiment with.

Now, I would like to give a brief look into the future, and this is not something we have just yet. This is more of an illustration of our plans ahead, and that's debugging. One of the challenges with WebAssembly is source-level debugging. It's not something that's easy to do today, especially not in Envoy. So, for this demo, we prepared an example of how we think this is going to look. The way it's going to work, we're going to add a meshctl debug wasm command, which will actually attach LLDB to Envoy and, using a special runtime, can also source-level debug the Wasm filter itself.

So, if we go here to my history of commands, we can set a breakpoint on the HTTP response headers function, and, all right, this is obviously unresolved because the filter is not loaded yet. We hit run. We let Envoy load it. Now, because this Wasm runtime needs to generate the debug symbols, it will take a minute to run. You can see it's still loading the listeners. Let's clear this screen. Here we go, never mind. All right. So, you can see that two locations were added to breakpoint one. Now, in order to trigger the breakpoint, I obviously need some sort of a response. So, let me just curl localhost so I can get a request and a response. This demo is for an Envoy I'm running locally. As we mentioned, it's not yet ready. So, let me kill that curl. All right. Now, we can see the first breakpoint just happens to be a function with the same name that we don't care about. So, let's continue. We can see that we break exactly in this function with this code, and we can step through it as we would any other program, and, obviously, resume execution. That will allow us to use the tools that we're familiar with to also debug WebAssembly filters. And that's all for the demo. With that, I'll pass it on to Shane.
Thanks, Yuval. So, attaching a debugger is incredibly powerful, and it's one of the features most requested by our users who are working with Wasm filters. We're really happy to be working on that. But most of the time when you're working in an environment, it's not a single-mesh, single-cluster, single-service environment that's easy to attach a single debugger to. It usually looks a little bit more like this. You've got multiple clusters. Maybe you've got some load balancers and databases and all kinds of different infrastructure going on in there. So, let's look at a few tools that we have to help you debug and troubleshoot in production that are a little bit more suited to use at scale. And with that, this is a good time to start with a demo.

So, this is what our demo environment looks like. We've got two clusters. They both have an Istio mesh installed on them, and each of them has about half of the Istio bookinfo example installed. The cluster on the left, which we're calling the management cluster, has reviews v1 and v2. The cluster on the right has reviews v3, for example. Both clusters are managed by Gloo Mesh in what we call a virtual mesh. So, it acts as one logical mesh, which just makes it a little easier to manage.

Let's jump right into the terminal here just to see what this looks like. I'm running k9s here. You can see on the left, we've got our management cluster, and on the right, we've got our remote cluster. Note that we've got some things running just in one cluster, like product page, and some things running in the second cluster, like details. One thing I'd like to call attention to before we jump into our code editor is that we've got a Wasm deployment here for our ratings service where we're exposing some metrics. I'm just taking a quick look at what this looks like. You can see that we've defined our WebAssembly filter.
I've actually built this and uploaded it to WebAssembly Hub ahead of time, just so we can save a little bit of time on the demo. You've already seen how all that works in our previous demo. And then you can see here we've got a workload selector. We specify which clusters this is going to, and then we've got a selector by app name. You can see this is going to all of the ratings apps in the bookinfo namespaces across both clusters.

Okay, so jumping into the actual code here, let's start with the question: what does this filter do? This is a WebAssembly filter written in Rust. It's pretty basic. We're using it as an example to show the different ways you can debug things. Essentially, it will accumulate the body, buffering it as we're receiving the stream. Then, once we've got it all, it'll parse it as JSON. If you're familiar with the Istio bookinfo app: because we have this on the ratings service, we're going to aggregate all of the ratings and determine what the average review out of five stars is.

We're going to be discussing three different ways we can debug here. Some of them are suitable for production. Some of them are more suitable for a development environment. First, let's talk about debug logs. This is one of those things that's a little bit more appropriate to use in a development environment, maybe your local environment. It's a very heavy tool and it has a performance impact, but it can be really useful, especially in the early stages of development, where you're trying to pinpoint exactly where a bug is in a really advanced multi-cluster deployment. How this works is we just use the Envoy ABI, which exposes a function called log, and we're just passing it the log level of debug.
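The filter logic described above (buffer the body, parse it, average the ratings) can be sketched in plain Rust with only the standard library. This is a sketch under assumptions, not the filter from the demo: the names BodyBuffer, extract_stars, and average_rating are hypothetical, the response shape (a "ratings" object mapping reviewer names to integer stars) is assumed, and the parser is a deliberately naive stand-in for a real JSON library.

```rust
/// Accumulates body chunks until the stream ends, mirroring how a filter
/// would buffer a response body before parsing it.
#[derive(Default)]
pub struct BodyBuffer {
    bytes: Vec<u8>,
}

impl BodyBuffer {
    /// Appends one chunk; returns the whole body once `end_of_stream` is true.
    pub fn push(&mut self, chunk: &[u8], end_of_stream: bool) -> Option<Vec<u8>> {
        self.bytes.extend_from_slice(chunk);
        if end_of_stream {
            Some(std::mem::take(&mut self.bytes))
        } else {
            None
        }
    }
}

/// Naive extraction of the integer values inside the "ratings" object.
/// Assumed input shape: {"id":1,"ratings":{"Reviewer1":5,"Reviewer2":4}}
pub fn extract_stars(body: &str) -> Vec<u32> {
    // Find the text that follows the "ratings" key.
    let rest = match body.find("\"ratings\"") {
        Some(pos) => &body[pos..],
        None => return Vec::new(),
    };
    // Take the first {...} after the key as the ratings object.
    let (open, close) = match (rest.find('{'), rest.find('}')) {
        (Some(o), Some(c)) if o < c => (o, c),
        _ => return Vec::new(),
    };
    rest[open + 1..close]
        .split(',')
        .filter_map(|pair| pair.rsplit(':').next()?.trim().parse::<u32>().ok())
        .collect()
}

/// Averages star ratings; returns None when there are no ratings.
pub fn average_rating(stars: &[u32]) -> Option<f64> {
    if stars.is_empty() {
        return None;
    }
    let sum: u32 = stars.iter().sum();
    Some(f64::from(sum) / stars.len() as f64)
}
```

With ratings of 5 and 4, the average comes out to 4.5 stars. In a real filter, a proper JSON parser would replace extract_stars, and a parse failure is where an error metric would be recorded.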
In this particular example, we log once the request is received, we log the body once we have it, and then finally we log the average reviews. So we should see all of these in the debug logs. What does this actually look like? Let's jump back to our terminal here.

Okay, so we can clear out the remote cluster to get a little bit more space, because we're focusing on the cluster that has the Gloo Mesh management plane installed on it. Just be aware that this is a multi-cluster environment. The first thing we need to do to enable logs on any Envoy is crank up the log level. In order to do that, we will be interfacing with the admin API, so we have to expose port 15000 on the ratings service. With that port exposed, we're going to crank up the log levels. Just to show you how that works, if you're not familiar with the Envoy logging API: we're doing a simple POST to /logging, where we're specifically cranking up the wasm component to debug level. You can see here that all of the other components are still left at warning, and then here at the bottom you can see wasm is up to debug.

Now that wasm is at the debug level, we can make a few requests. How this is going to work is we're going to execute a curl request from the product page going into the ratings service. This ratings service is the one that already has our filter installed. Once we do a few curls here, you can see we get the response back, and then if we look at the logs, we should see our debug logs. Because they are debug logs, we can just get the logs directly with kubectl. I'm getting them specifically from the Istio proxy, and then I'm just going to grep for wasm log, because that's what all of our wasm logs start with.
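Why the filter's debug lines only show up after the wasm component is raised to debug can be illustrated with a small sketch of level gating and of the grep step. This is a conceptual model only, not Envoy's implementation: the Level enum follows the level names Envoy uses, while is_emitted, wasm_lines, and the sample log lines in the usage note are hypothetical.

```rust
/// Log levels in increasing severity, following the names Envoy uses.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Level {
    Trace,
    Debug,
    Info,
    Warning,
    Error,
    Critical,
    Off,
}

/// A message is emitted only if it is at least as severe as the level
/// configured for its component, and the component is not switched off.
pub fn is_emitted(component_level: Level, message_level: Level) -> bool {
    component_level != Level::Off && message_level >= component_level
}

/// Keeps only the lines containing `marker`, like piping logs through grep.
pub fn wasm_lines<'a>(log: &'a str, marker: &str) -> Vec<&'a str> {
    log.lines().filter(|line| line.contains(marker)).collect()
}
```

With the wasm component at warning, is_emitted(Level::Warning, Level::Debug) is false, so the filter's debug messages are dropped; once the component is set to debug they pass, and wasm_lines(log, "wasm log") then narrows the verbose output to the filter's own lines.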
The debug logs are obviously going to be very, very verbose, so we want to make sure that we are only looking at the wasm ones for this example. So here we can see our wasm logs. Let's give this a little bit more space. You can see on_http_response_body, which is what we saw previously in our code, and then you can see the body gets printed out here. This is the body of the response. And finally, you can see the review, which we've calculated as part of our filter. Just jumping back to our filter real quick to tie that together: you can see on_http_response_body, which we just saw in our logs, you can see the word body is printed out, then the actual body itself, and then finally down here you can see the log level.

So that's the first way for us to debug things in a complex environment. Again, like I said, debug logging is definitely something that you want to do in a local or dev environment. It's not really well suited to use in production, because there is a performance impact.

If you want to do more logging at the production level, at high scale, without impacting your performance, the way we do that is something called access logging. How access logging works in Envoy is a little interesting. We set a property, in our case we're calling it average reviews, and the average reviews property is a filter state object that we're setting for Envoy. By itself it doesn't really do much, but in Gloo Mesh we've created a CRD called access log record, which lets you configure at a fine-grained level which workloads, and in some cases even which headers, to match, so that you're only getting the logs that you really care about at any given time. We've exposed an API in our Gloo Mesh enterprise networking pod which can return exactly the logs you need to look at at any given time.

So what does this access log YAML look like? How do we configure exactly what gets logged and what we want to look at? What we're going to do is take a quick look back in our terminal. We're going to look at this
access log record. This is the custom resource I was talking about earlier. You can see here we're only applying it to requests on the path /ratings/1. We are applying it to both clusters, and you can scale this up to however many clusters you have inside of your infrastructure. What we need to do for this to work is expose the enterprise networking pod's port 8080, so I'm just going to do that here. That's because our enterprise networking pod exposes a Gloo Mesh API for observability purposes. We're then going to call that API, which is /observability/logs, and we're going to watch it, so this is a live stream that we're going to keep open. And then finally, we're just going to make a few more requests, the same request we made earlier, from the product page to the ratings service.

So we make a few requests here, and then we full-screen this, because we're running a little low on space. You can see we're getting response code 200, and we should have filter state objects in here, so let's take a quick peek. Maybe I shouldn't send so many requests, but here we go: filter state objects. One thing to note is that it is a bytes value here, and that's because our wasm filter didn't actually know the data type when it was saving it. We just saved it as bytes. But it's worth noting that you can see here that NC41 is the value, and that's just the base64-encoded byte value. So if I get out of this, take NC41, pipe it to base64 decode, and echo that result back out again, you can see that's 4.5, which is the 4.5 stars average review that we calculated in our filter.

So this is something that we've piped through from our filter that gets executed on the request path, but it's handled by this access logging architecture, which is architected so that it can be run on the data path in production at scale without crippling your performance. So this is really handy, and this is something you can leave on in a real
production environment, which is great.

Lastly, the last thing we're going to talk about is metrics. I'm going to jump back to the code for a second. We already talked about our logging and we already talked about our access logging, but the other thing that we've done in this filter is we've defined some custom metrics. You can see them up here in lines 17 and 18. We've got a metric for when the JSON that we're parsing in the filter is parsed okay, which we're calling okay, and then we've got one for when the parsing fails. They're defined up here at the top, and then you can see down here where we're recording them: if it's a success, we're doing a record metric on the okay metric, and then right here at the bottom, when it's a failure, we're going to record a failure.

The great thing about these metrics is that they're exposed as your standard Envoy metrics. So if we jump back to our terminal here, let's just exit some of these and get more space. We don't need that port forward anymore. Okay, so we should be able to curl localhost:15000/stats, and then we're just going to grep for debug filter, which is part of our stat names. You can see here we've got two error requests recorded and 10 okay requests, and these are the two metrics that we defined in our filter. If, for example, I make a good request to ratings v1 and then I look at these stats again, you can see it's gone from 10 to 11. Similarly, if I make a request to a bad URL that's not the accepted ratings one, and then we look at the stats again, you can see it's gone from two to three.

The great thing about these metrics is that they're first-class citizens in Envoy stats, so you can export these to Prometheus, and you can get them in a Grafana dashboard where you can set up alerting on them, depending on how important they are. This is just a really powerful tool, especially for something in production that you want to keep an eye on and set alerting on. This is great.

So in summary, we've looked at how you
can attach a debugger to do dev-level debugging. We've looked at how you can use debug logs inside of your filters. We've also looked at more production-appropriate tools like access logging and metrics, which you can use in your clusters at scale. We've looked at the entire lifecycle of a Wasm filter: how you can go about building it, how you can publish it to a registry, how you can discover filters that other people have written and published to the registry, and how you can deploy it across your clusters. We've highlighted the ease of use of all of this throughout the ecosystem, as well as the various languages you can use to write these filters. And we've shown you the tools for doing this in a multi-cluster, multi-mesh environment.

Thanks very much for listening to our talk. If you want to learn more, please check out solo.io or webassemblyhub.io. Thanks.
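As a footnote to the access logging demo: the filter state value appeared as base64-encoded bytes and was decoded on the command line. The same decoding step can be sketched in plain Rust with only the standard library. The function name base64_decode is hypothetical, and the decoder is a minimal one for the standard alphabet that simply skips padding characters rather than validating them.

```rust
/// Decodes standard-alphabet base64. Padding characters (`=`) are skipped
/// rather than validated. Returns None if any other character is outside
/// the base64 alphabet.
pub fn base64_decode(input: &str) -> Option<Vec<u8>> {
    const ALPHABET: &[u8; 64] =
        b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    let mut out = Vec::new();
    let mut acc: u32 = 0; // bit accumulator
    let mut bits: u32 = 0; // number of valid bits currently held in `acc`
    for byte in input.bytes().filter(|b| *b != b'=') {
        // Each base64 character contributes six bits.
        let value = ALPHABET.iter().position(|c| *c == byte)? as u32;
        acc = (acc << 6) | value;
        bits += 6;
        if bits >= 8 {
            // Emit the top eight bits and keep the remainder.
            bits -= 8;
            out.push((acc >> bits) as u8);
            acc &= (1 << bits) - 1;
        }
    }
    Some(out)
}
```

Decoding the value from the demo, base64_decode("NC41"), yields the bytes of the string 4.5, matching the average review the filter computed.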