Hi, everyone. My name is Yan, I'm a software engineer at Google and an Envoy senior maintainer. Today's talk is a little bit on the technical side, maybe even a little bit boring, but I wanted to give an update on the changes happening in one of Envoy's key components, specifically the component that validates the HTTP protocol.

Before we dive deeper, I wanted to describe the present state, what's happening today within Envoy. This was mentioned in the previous talk: Envoy uses codecs to take the bits and bytes it gets from network connections and turn them into protocol elements. In the case of HTTP, that's a header map, body, and trailers, the typical things an HTTP application expects. We have three different HTTP versions, and we source the codecs for those from three different vendors. Validation today is part of the codec, so we have three different versions of HTTP validation, or really three different visions of how HTTP validation should be performed.

So what are the problems with this situation? First of all, validation is not consistent across HTTP versions. This can be merely annoying at times, when a request that is accepted over HTTP/1 is rejected over HTTP/2, or it can lead to fairly serious consequences, such as when a codec allows a carriage return in a header value. That can be abused to smuggle requests when Envoy is configured to proxy across protocol versions, which is a very common scenario. We have a total of seven CVEs that relate, one way or another, to inconsistencies in HTTP validation. The validation code is also difficult to understand and comprehend; it's a high cognitive load on developers to go spelunking in vendor code to find out what's wrong or how things work. And validation cannot be easily modified. It's possible, but very impractical. We consistently get requests such as: can we support UTF-8 in path values?
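To make the carriage-return problem concrete, here is a small illustrative sketch. This is not Envoy code: the serializer and header names are invented for the example, and it only shows why an unvalidated CR/LF in a header value becomes an injected protocol element when a request is re-serialized for HTTP/1.

```python
# Illustrative sketch only -- not Envoy code. The serializer and header
# names are invented to show why a carriage return in a header value is
# dangerous when a proxy re-serializes requests across protocol versions.

def serialize_http1(method, path, headers):
    """A naive HTTP/1.1 serializer that performs no header-value validation."""
    lines = [f"{method} {path} HTTP/1.1"]
    for name, value in headers:
        lines.append(f"{name}: {value}")
    return "\r\n".join(lines) + "\r\n\r\n"

# Suppose an HTTP/2 codec accepted this header value without rejecting the
# embedded CR/LF; on the HTTP/1 upstream it turns into a separate header.
headers = [
    ("host", "example.com"),
    ("x-note", "hello\r\nx-injected: smuggled"),
]

wire = serialize_http1("GET", "/", headers)
# The injected text now parses as its own header line on the wire.
print("x-injected: smuggled" in wire.split("\r\n"))  # -> True
```

Consistent validation at a single choke point, rather than per-codec rules, is what closes this class of cross-protocol smuggling bug.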
It's maybe not strictly compliant with the standard, but it's a common feature in a lot of implementations, so people want something like this in Envoy, and we can't easily deliver it. The observability of protocol errors is very poor today, partly because codecs surface very generic errors: "invalid content length". But why is it invalid? Is it negative? Is it zero? Is it too high, too low? Very often you have to debug using something like TCP dumps, which is very difficult and time consuming. Also, in the current situation it's very difficult to apply Envoy's policy of protecting all user-visible protocol validation changes with runtime flags. We often get those changes from vendors. Sometimes we might not even notice them; they might not even be documented in the release notes, and this has caused disruption to existing traffic in the past when we updated a codec.

Our answer to this is the Universal Header Validator. In simple terms, we are taking validation out of the codecs and moving it into one centralized component that is part of the Envoy codebase. Envoy now owns protocol validation. So what specifically is changing? The first thing is that we get consistent validation across all protocol versions. As I mentioned, it's now an Envoy component, so the changes there don't depend on codec vendors: we can make them runtime-flag protected, we can audit them, and the code is much easier to understand. It's also an extension, so anyone can go and modify the default behavior. For instance, one possible extension is to make Envoy fully compatible, in terms of protocol validation, with Nginx or HAProxy, to allow a more seamless migration from one proxy to another.
And we also get much improved observability, because the elements that caused protocol errors can be emitted into the access log. So you can actually see: what's wrong with my request? Why am I getting protocol errors? There are some limitations. We weren't able to cleanly extract validation from the existing codecs, which we already call legacy codecs. So it only works with the newer codecs, which were sourced from the open source QUICHE library, where we worked with the developers to let us cleanly disable validation in the codecs and move it to the Universal Header Validator.

For deployment, one of the key requirements was to prevent any disruption to existing traffic. What we want to happen is that when we update Envoy to the Universal Header Validator, there will be no changes: it will first run in a full compatibility mode with the existing, or legacy, behavior. It will be turned on and ideally nothing will change. To ensure there are no changes, and to fully test this compatibility mode, we built a differential fuzzer, which we found to be very efficient at comparing codecs side by side to see how their behavior differs. So we're fairly certain there will be no unexpected behavioral changes. We don't want to stay in that compatibility mode forever; our goal is to move into a better, more consistent, more compliant world. So we'll start to turn legacy features off gradually, and the way we're going to do it is to first evaluate the effects on traffic at some of the major Envoy operators. Some of those changes will probably be freebies, because nobody really relies on those legacy behaviors. Some of them may be more complicated, but this is the path ahead for us to bring Envoy into a new, more consistent, more compliant world. The full list of behavioral changes is right now in a somewhat inconvenient GitHub issue.
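The differential-fuzzing idea can be sketched in a few lines. This is a toy model, not Envoy's actual fuzzer: the two validators below are hypothetical stand-ins for a legacy codec's validation and for UHV, and the harness simply hunts for inputs on which they disagree.

```python
# Toy differential-fuzzing harness -- not Envoy's real fuzzer.
# Both validators are hypothetical rule sets used only for illustration.
import random
import string

def legacy_validate(value: str) -> bool:
    """Hypothetical legacy codec rule: reject only NUL, CR, and LF."""
    return all(c not in "\x00\r\n" for c in value)

def uhv_validate(value: str) -> bool:
    """Hypothetical stricter rule: visible ASCII, space, and tab only."""
    return all(c == "\t" or 32 <= ord(c) <= 126 for c in value)

def differential_fuzz(trials: int = 1000, seed: int = 0):
    """Feed the same random header values to both validators and
    collect every input on which their verdicts differ."""
    rng = random.Random(seed)
    disagreements = []
    for _ in range(trials):
        value = "".join(rng.choice(string.printable) for _ in range(8))
        if legacy_validate(value) != uhv_validate(value):
            disagreements.append(value)
    return disagreements

found = differential_fuzz()
print(f"{len(found)} disagreements in 1000 trials")
```

Here the disagreements are exactly the strings containing control characters like vertical tab or form feed, which the loose rule accepts and the strict rule rejects; finding such gaps before shipping is the point of comparing the codecs side by side.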
We will surface them in the Envoy documentation, where it will be easy to see which things we're planning to change, which are turned on right now, and which are turned off. For configuration, we don't expect any configuration changes to be needed: Envoy will take your existing protocol validation options and move them to UHV. But if you do have to change something, there is a new configuration option in the HTTP connection manager for downstream connections and in the cluster protocol options for upstream connections. As for availability: when I submitted this talk, I fully expected that it would be available, but then life happened, and so now the availability is at the end of 2023. And that's the end of my talk.

Thanks, Yan. We'll do questions for all three talks at the end if there's time, which there may not be. So, I'm Josh Marantz. I work as an Envoy maintainer, and I also manage the load balancing and Envoy platform teams at Google. Here's a brief rundown of what the admin console is: if you already know it, I have a couple of new things to show you; if you don't, this is a good basic overview. It's an interface that lets you query the state of a running Envoy server. You can also modify some aspects of the server. It has basic interfaces: you can access it with curl or wget, or there's also a web interface. So we'll show some basic admin features and how to configure them. The basic configuration that most people would do is to configure an HTTP port by which you can access the admin console. This is actually a little bit risky, because it gives write access into the server, so you probably don't want to do this on a port that's exposed to the internet, for example. Be careful. Another, more secure option is to use the C++ API: if you're modifying Envoy's binary, you can hook into the C++ API, and I have pointers on how to do that.
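As a sketch, the HTTP-port setup described above is a few lines in the bootstrap config. The port number here is an arbitrary example; binding to 127.0.0.1 keeps the console off any publicly reachable interface.

```yaml
# Bootstrap fragment: expose the admin console on a local-only port.
admin:
  address:
    socket_address:
      address: 127.0.0.1   # do NOT bind a publicly reachable address
      port_value: 9901     # arbitrary example port
```

With that in place, `curl http://127.0.0.1:9901/help` lists the available endpoints from the local machine.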
And that way you can go through your own secure mechanism that is authenticated, ACLed, and so on. You can also remove the admin interface entirely at compilation time when you build Envoy, or you can disable the HTML interface if you just want the curl interface. Curl is pretty simple: you just curl the port you configured, and if you curl /help it shows you what the commands are. You can get, for example, some stats, and you can filter them. Or you can modify the server, which you would do with a POST. The browser UI is similar: you bring up the root page and it gives you a fairly primitive user interface showing all the commands you can run. The endpoints that only observe server state are HTTP GETs, and they look like links. The endpoints that can modify server state are drawn as buttons, and they issue POSTs.

I'll go into stats; it's probably the most interesting one, and certainly the one I've spent the most time on. You can toggle a checkbox to include only the stats that have been written since the server started. That's usually what you want, because there are frequently a lot of stats that never get touched. There's a regex filter you can use. You can also specify what format you want: HTML, JSON, text, and Prometheus are the options here. We're going to focus on HTML, which is mostly like text except that it lets you incrementally modify the features. You can also filter by stat type: all the types, or just one of them. And then histograms actually have four different display modes. I'm going to show you the one that's not the default; it's called Detailed. So this is what you get when you hit the stats button I showed earlier: all of the stats that have been modified recently and that match the regex. The HTML mode shows you a detailed view of the histogram. To dive a little into what you're seeing: this only happens in Detailed mode.
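As a mental model of how the stats-page controls compose, here is a simplified Python sketch. The stat names are made up and this is not Envoy's implementation; it only shows the used-only checkbox and the regex filter acting together.

```python
# Simplified model of the stats page's used-only checkbox and regex
# filter. Stat names and values are invented for illustration.
import re

stats = {
    "http.ingress.downstream_rq_2xx": 120,
    "http.ingress.downstream_rq_5xx": 3,
    "cluster.backend.upstream_cx_active": 7,
    "server.stats_recent_lookups": 0,   # never written since start
}

def query(stats, used_only=False, filter_re=None):
    """Return stats surviving both filters, mimicking the page's query params."""
    out = {}
    for name, value in stats.items():
        if used_only and value == 0:      # model: unwritten stats stay at 0
            continue
        if filter_re and not re.search(filter_re, name):
            continue
        out[name] = value
    return out

print(query(stats, used_only=True, filter_re=r"downstream_rq"))
```

Running the query with both filters leaves only the two request-counter stats, which is the same narrowing you get by ticking the checkbox and typing a regex in the UI.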
If you use any of the other modes, you get a textual rendering of the histogram, just showing you numbers on a line, and I find that a little hard to interpret. For all of Envoy's life it has actually kept a ton of detail in the histograms but not really exposed it to users; the Detailed view lets you see it. So this shows that this particular bucket, highlighted in yellow, is an interval that starts at 66 milliseconds and ends just before 67 milliseconds. It shows that 10,000 of those have been counted since the server started, and that 976 of them have been counted in the last five-second polling interval. And this happens to be the P50 mark, so this is the 50th percentile.

Another feature that was added recently is Active HTML. Say you're running an Envoy server and something starts behaving strangely, but you don't know where to focus your logs yet because you don't know what's acting up. You can turn on Active HTML mode and it will give you a view of stats sorted by how many times they've changed since you started this mode. If some subsystem is behaving interestingly, usually there will be a stat that bumps up, you'll see it at the top, and then you can start focusing your logging on that. Another view that's interesting is the config dump. Config dumps come out in JSON, and you can see the raw JSON on the right. One thing I've noted is that if you want a more hierarchical view of the configuration dump, Firefox actually shows a hierarchical JSON viewer by default, so that's something you might not try if you don't think of it, but I find it pretty useful. So those are all features in Envoy where you can observe aspects of the server. You can also change the server's behavior; this is the logging control.
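To connect the numbers in the Detailed view: percentile markers fall out of the cumulative bucket counts. This is a hedged sketch with made-up bucket data shaped like the talk's example (a 66-to-67 ms bucket with 10,000 lifetime samples and 976 in the last interval), not Envoy's histogram code.

```python
# Hypothetical bucket data resembling the "Detailed" histogram display:
# (lower_bound_ms, width_ms, count_since_start, count_in_last_interval)
buckets = [
    (60, 1, 2000, 150),
    (63, 1, 4000, 400),
    (66, 1, 10000, 976),   # the highlighted bucket from the talk
    (70, 1, 4000, 300),
]

def percentile_bucket(buckets, p):
    """Return the lower bound of the bucket containing percentile p,
    by walking cumulative counts until p of the total is covered."""
    total = sum(count for _, _, count, _ in buckets)
    target = total * p
    running = 0
    for lower, _width, count, _recent in buckets:
        running += count
        if running >= target:
            return lower
    return buckets[-1][0]

print(percentile_bucket(buckets, 0.50))  # -> 66: P50 lands in that bucket
```

With 20,000 samples total, the 10,000th sample lands in the 66 ms bucket, which is why that bucket carries the P50 marker in the display.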
So here I've changed the default logging level to trace, and you can do this while the server is running, at which point you get a lot more output. In a minute, Bocheng will show you in much more detail how you can zoom in more carefully than at the component level. I want to circle back on security: the admin port must be protected at the OS, network, or firewall layer. If you configure an HTTP port, it's your responsibility to keep that port protected; the secure alternative is to use the C++ server API. I wanted to leave you with some notes, and the slides are uploaded so you can grab these there. The console documentation has all the information I just shared. If you're interested in contributing, there's a lot of room for improvement in the console, and there are links there to the source code and to the API access I showed you earlier. There's also an example Matt Klein did of adding a new admin endpoint as an extension; that's useful for the tap filter, and there's more detail about the stats design as well. I will now hand it over to Bocheng, who will show you a lot more about detailed file-level logging.

Thanks, Josh, for the comprehensive intro to the admin interface. Hello, everyone. I'm Bocheng Yao, a software engineer on Google's Envoy team and an active contributor to Envoy. Today I'm going to talk about the logging system in Envoy. I will focus on the fine-grained logger, which is different from the component logger. So what is the fine-grained logger? The short answer: it is a log management system that controls the log level per C++ source file, rather than per subsystem, which is what the component logger does. I'll also cover some basic usage of the fine-grained logger through the admin API and provide some performance insights. Right now Envoy has two log management systems: one is the component logger and the other is the fine-grained logger. Let's take a look at what the component logger is like.
Right now Envoy has roughly 60 component loggers, managed by logger ID; for example, we have admin, client, connection, and so on. One issue is that this is not extensible for extensions. With the fine-grained logger, we provide granularity down to the source-file level, which means there is a logger for every active file, where "active" means a log statement in that file has executed at least once. Let's look at how to use it. It's pretty simple: there's a command line option, `--enable-fine-grain-logging`. One thing I need to mention is that it's an either-or relationship: you can only enable one logging system at a time.

Now let's talk about the admin logging usage, for updating log levels at runtime. The first use case: you can list all active loggers, and note that this is a POST, not a GET. So we send a curl POST to the admin port's /logging endpoint, and it lists all the active loggers with their log levels, where zero means trace and, for example, critical is five. The second use case is a file basename match to adjust the log level: the basename in this example is the TCP listener implementation file or the UDP listener implementation file, and it's changed to trace. The third use case is a glob star match: `*` matches any run of characters in the file name. For example, /network/* will match that file tree, and all the matched files will be set to debug level. The fourth use case is the question mark: `?` matches exactly one character. We also support multiple settings, so a single curl POST request can carry multiple updates. And I need to mention that this is a state-of-the-world update: each request updates only the matched files to the specified level, and all other, unmatched files revert to the default level, for example info.
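The basename, `*`, and `?` matching rules behave like ordinary shell globs. Here is a small sketch where the file paths are hypothetical stand-ins for Envoy source files and Python's `fnmatch` stands in for the fine-grained logger's matcher.

```python
# Shell-style glob matching, as a stand-in for the fine-grained logger's
# file matching. File paths here are hypothetical examples.
from fnmatch import fnmatch

files = [
    "source/common/network/tcp_listener_impl.cc",
    "source/common/network/udp_listener_impl.cc",
    "source/common/http/conn_manager_impl.cc",
]

# '*' matches any run of characters: selects the whole network/ subtree.
star_pattern = "source/common/network/*"
print([f for f in files if fnmatch(f, star_pattern)])

# '?' matches exactly one character: selects only the UDP listener file.
qmark_pattern = "source/common/network/?dp_listener_impl.cc"
print([f for f in files if fnmatch(f, qmark_pattern)])
```

In a state-of-the-world update, the files selected by a pattern like these get the requested level and every other active file falls back to the default.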
So here's a more realistic example. Typically Envoy defaults to info logging in a production environment. What happens if you have an actual incident in your HTTP filter and you only care about specific files? With the component logger, most likely all we can do is set the http component to trace, and then all the files under that logger ID are elevated. So we get a lot of unrelated files at trace, when we only care about, for example, the last two lines, which affect your client IP. With the fine-grained logger system (and note it is either-or; we can only use one of them), we can instead send a request that matches only the files affecting your client IP and sets them to trace. After this request, only the log entries from those specific files are displayed. Pretty useful, right?

Then some performance insights. The TL;DR on performance: the fine-grained logging management system is slightly better than the component system in CPU terms. There are three setups: the left two bars are where log entries are not printed, the middle two bars are with 200 stress, and the right two bars are trace-level logs. We measure the CPU process time and the real time for each. The vertical axis is time in nanoseconds, so the lower the bar, the better; for lower-severity lines, all three bars are the same. You can find more speed tests at the link; it's in the Envoy repo. This is similar to the vlog-style logging system inside Google, and it is open sourced. Right now the fine-grained logger is running in Google's global load balancer, handling both internal and external traffic. Thanks also for the discussion and contributions from Kevin and Princess17. Thank you. Any questions? Scan the QR code, and I'll take questions together with Yan and Josh.