Hello everyone, welcome to our talk. Today we want to describe our efforts to enable service mesh scenarios on the Windows platform using the Envoy proxy. Hi, my name is Praveen Balasubramanian. I currently work as an engineering manager at Microsoft, where I lead a development team that focuses on the core networking platform and the network stack for Windows. My recent work has focused on high-performance networking, internet protocol evolution, and container networking. With me here today is Nick.
Hey, how are you doing? I'm Nick Jackson. I'm a developer advocate at HashiCorp and a maintainer of SMI, the Service Mesh Interface. I'm also writing a book on service mesh patterns with my good buddies Lee and Paul, and I am incredibly excited about seeing service mesh absolutely everywhere, including on Windows.
Hi everyone, I'm Sotiris, and I currently work as a software engineer at Microsoft. I am a maintainer of the Envoy proxy and a core member of the Envoy Windows Development Group. On Envoy, my work focuses on low-level networking and supporting the Windows platform abstraction layer. I'm also a contributor to Open Service Mesh.
Hi everyone, I'm Kalya Subramanian, a software engineer at Microsoft. Previously at Microsoft, I worked on Windows container networking and contributed to kube-proxy, Flannel, and kubeadm for Windows. Now I'm on the Open Service Mesh team at Microsoft, where I have been helping bring Windows support to OSM and also working on securing clusters with OSM.
So now let's get into our agenda. We will start by recapping what a service mesh is and making a case for Windows platform support. We will then describe the four major components that are required for a service mesh deployment: the platform, the proxy, the container orchestration system, and the service mesh control plane. We will then demo a service mesh powering both Windows and Linux containers.
We will conclude with a summary and our future roadmap.
So what is a service mesh? In a nutshell, it's a solution that enhances your applications' service-to-service communication. In the context of Kubernetes, a service mesh injects a proxy such as Envoy, commonly called a sidecar, into selected workload pods. The sidecar proxy intercepts incoming and outgoing traffic based on configured policies. As you can see in this diagram, Envoy is intercepting all the inbound and outbound traffic for two containers running a Node.js application. A service mesh lets users apply the features that the proxy offers easily, consistently, and at scale. Prominent features of a service mesh include traffic management, for example load balancing, traffic splitting, and policies such as rate limiting or retries. A service mesh also enables observability of metrics like service-to-service latency, requests per second, and the number of HTTP errors. Finally, a service mesh improves security by bringing encryption and authentication, for example by enforcing mutual TLS and encrypting all the network traffic in the cluster without any application code changes.
So you may ask, why do we care about service mesh on the Windows platform? As you can see here, a lot of users are asking for a service mesh that supports Windows. In the industry, many organizations have a mixture of OS platforms in their environments. Many customers tell us that the lack of service mesh support for Windows is one of the biggest blockers to containerizing their applications, or even to bringing their enterprise workloads to the cloud. Without a cross-platform service mesh, they face really hard choices: either rewrite their applications to run on Linux, which is very expensive, or continue maintaining legacy applications and infrastructure. There is also strong interest from many Microsoft first-party applications that are being containerized.
They are adopting a microservices architecture and want a service mesh that supports the Windows platform for all the benefits we talked about previously: observability, security, and traffic management.
Now let's look at the four components that need to come together for a service mesh solution. First up is the operating system platform, like Linux or Windows, that supports containers and container networking. Next is the sidecar proxy, which is used to redirect traffic and has the rich feature set for observability, security, and traffic management. Examples here include Envoy and Linkerd. Next is the container orchestration solution, for example Kubernetes. And finally, there is the service mesh control plane, which makes the solution scalable; examples here are Istio and Open Service Mesh, or OSM. The focus of our talk today is the Windows Server operating system, with Kubernetes used for container orchestration. Our service mesh solution is completed by Open Service Mesh, which uses Envoy as the sidecar proxy.
Let's begin with the first component, the operating system platform. Windows Server 2022 is the most recent release of Windows Server and is now generally available. In this release, we have added platform capabilities to Windows to support traffic redirection for containers. The redirection works for both outbound and inbound traffic and uses the Windows Filtering Platform (WFP). We have new Host Networking Service (HNS) APIs, which support new policies for endpoints. These policies are applied on the container host and require admin privileges. Currently, these APIs only support TCP over IPv4, but we do have future plans to add support for UDP and IPv6. We will take a deeper look at these policies in a later slide. Windows Server 2022 also includes new Winsock APIs that allow a sidecar proxy to query the original destination of a redirected connection.
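To make the redirection semantics above concrete, here is a minimal Python model. It is not the HNS or Winsock API: `RedirectTable` is a hypothetical class that only redirects TCP over IPv4, sends inbound and outbound connections to different proxy ports, and records each connection's original destination so the proxy can look it up later, which is the role the new Winsock APIs play. The port numbers and the `127.0.0.1` proxy address are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

Addr = Tuple[str, int]  # (IP address, port)


@dataclass
class RedirectTable:
    """Hypothetical model of per-endpoint redirection; not the HNS or Winsock API."""
    inbound_proxy_port: int
    outbound_proxy_port: int
    proxy_ip: str = "127.0.0.1"  # illustrative address where the sidecar listens
    originals: Dict[Addr, Addr] = field(default_factory=dict)

    def redirect(self, protocol: str, ip_version: int, direction: str,
                 client: Addr, dest: Addr) -> Optional[Addr]:
        """Return the proxy address a new connection is sent to, or None if untouched."""
        if direction not in ("inbound", "outbound"):
            raise ValueError("direction must be 'inbound' or 'outbound'")
        # Per the talk, Windows Server 2022 redirection covers TCP over IPv4 only.
        if protocol != "tcp" or ip_version != 4:
            return None
        port = self.inbound_proxy_port if direction == "inbound" else self.outbound_proxy_port
        # Remember where the connection was really headed, keyed by the client endpoint.
        self.originals[client] = dest
        return (self.proxy_ip, port)

    def original_destination(self, client: Addr) -> Optional[Addr]:
        """Models what the proxy learns by querying the OS about a redirected connection."""
        return self.originals.get(client)
```

UDP and IPv6 connections fall through untouched in this model, mirroring the current platform limitation rather than any design choice.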
These APIs are available in the latest SDK and have already been integrated with the Envoy proxy. This release also features smaller container images and extended support for Nano Server-based images.
Now let's compare and contrast the redirection policies on the Linux and Windows platforms. On the left-hand side, on Linux, iptables rules let you configure a proxy redirect rule with specific port numbers for both the outbound and inbound directions. The rules also allow skipping redirection based on user ID, IP prefixes, or port numbers. This lets Envoy's own traffic be exempted from redirection and prevents redirection loops. On the right-hand side, you can see the new HNS policies in the form of a PowerShell script, and you'll notice a lot of similarities. The HNS endpoint cmdlet takes in the redirection policy in JSON format. Both outbound and inbound redirection are supported, with specific proxy ports for the sidecar. This policy also supports exceptions, which are again based on IP addresses and port numbers. In the UserSID field, you can specify either a user or a user group that is exempt from the redirection policy. For Windows containers, one challenge is that user SIDs are generated randomly at runtime, so it is impossible to know in advance the user SID that Envoy is going to run as in every pod. This is where specifying a group SID is extremely useful: you can then run the proxy as a member of a known user group.
Next up, Sotiris will describe the next component of the service mesh solution, the Envoy proxy.
Thank you, Praveen, for the summary of all the changes in Windows Server. Over the past year and a half, the Envoy Windows Development Group has made substantial progress on porting Envoy to Windows. We announced general availability of Envoy on Windows in May, and for the past 10 months, the Envoy CI pipeline has been compiling and running tests for almost every feature on every pull request.
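The group-based exemption Praveen described can be sketched as follows. This is a hypothetical Python model, not the real HNS schema or cmdlet: the settings keys are loosely modeled on the fields mentioned in the talk (proxy ports, IP and port exceptions, UserSID), and the exempt group is assumed to be BUILTIN\Network Configuration Operators, whose well-known SID is S-1-5-32-556. It shows why a group SID works where a per-pod user SID cannot: the proxy's user SID is random, but its group membership is fixed.

```python
import json
from typing import Dict, List, Set

# Well-known SID of BUILTIN\Network Configuration Operators; assumed here to be
# the group the proxy's user belongs to.
PROXY_GROUP_SID = "S-1-5-32-556"


def build_redirect_policy(inbound_port: int, outbound_port: int, exempt_sid: str,
                          ip_exceptions: List[str], port_exceptions: List[int]) -> Dict:
    """Build a hypothetical redirect policy; key names are illustrative only."""
    return {
        "InboundProxyPort": inbound_port,
        "OutboundProxyPort": outbound_port,
        "UserSID": exempt_sid,  # may name a single user or a whole group
        "OutboundExceptions": {
            "IpAddressExceptions": ip_exceptions,
            "PortExceptions": port_exceptions,
        },
    }


def should_redirect_outbound(policy: Dict, dest_ip: str, dest_port: int,
                             caller_sids: Set[str]) -> bool:
    """Decide whether an outbound connection is sent to the proxy.

    caller_sids models the calling process's user SID plus its group SIDs.
    """
    if policy["UserSID"] in caller_sids:
        return False  # the proxy's own traffic is exempt, so there is no redirect loop
    exceptions = policy["OutboundExceptions"]
    if dest_ip in exceptions["IpAddressExceptions"]:
        return False
    if dest_port in exceptions["PortExceptions"]:
        return False
    return True


if __name__ == "__main__":
    # Illustrative exceptions: a metadata endpoint IP and a metrics port.
    policy = build_redirect_policy(15003, 15001, PROXY_GROUP_SID,
                                   ["169.254.169.254"], [15010])
    print(json.dumps(policy, indent=2))
```

The exception IP and port in the usage block are placeholders; a real deployment would list whatever the proxy's control-plane and metrics traffic needs.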
We strive to make Envoy configurations cross-platform. We believe this will help control plane developers who want to apply the same configuration across their entire cluster; a service mesh is an example of such a control plane. With that goal in mind, we have a cross-platform implementation of the original destination listener that underpins service mesh Envoy configurations. The only notable difference is that on Windows you must also specify the traffic direction property of the listener. We use this information to ensure that the connections the proxy makes have the same security properties as the original connection. This is important because the proxy and the application might have different firewall rules. We have also confirmed that the traffic direction property is already set by most service mesh implementations. In addition, we have implemented cross-platform access loggers to standard output, so you can enjoy the observability that Envoy offers on all platforms. Finally, we have been working on the Envoy network data path. We improved the asynchronous event loop on Windows to scale across multiple threads, and we introduced synthetic events that allow us to emulate epoll behavior on Windows.
The last piece of the puzzle is configuring Kubernetes to apply the HNS traffic redirection policy for you. Traditionally, on Linux, the redirection policy is configured by an init container. The init container is scheduled by the service mesh to run before the application, and it applies the iptables rules for the pod. This is not possible on Windows, and the reason stems from a core difference between HNS policies and iptables: HNS policies are a property of the host, not of the pod. As a result, these HNS policies need to be configured by a CNI plugin, and CNI plugins are already supported by Kubernetes. For experimentation purposes, you can add the redirection policy manually to the static configuration of your existing CNI plugin.
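For reference, a cross-platform outbound listener along the lines Sotiris described might look like the following Python sketch, which emits an Envoy v3 static configuration as JSON. It combines the original destination listener filter, the `traffic_direction` field that Windows requires, a TCP proxy forwarding to an `ORIGINAL_DST` cluster, and a standard-output access logger. The listener name, port 15001, and the `passthrough` cluster name are illustrative assumptions; a real service mesh control plane generates much richer configuration.

```python
import json


def outbound_listener(port: int, cluster: str) -> dict:
    """Envoy v3 listener that recovers each redirected connection's original destination."""
    return {
        "name": "outbound_listener",
        "address": {"socket_address": {"address": "0.0.0.0", "port_value": port}},
        # Required on Windows per the talk; Linux accepts it too, so one config fits both.
        "traffic_direction": "OUTBOUND",
        "listener_filters": [{
            "name": "envoy.filters.listener.original_dst",
            "typed_config": {"@type": "type.googleapis.com/envoy.extensions.filters.listener.original_dst.v3.OriginalDst"},
        }],
        "filter_chains": [{
            "filters": [{
                "name": "envoy.filters.network.tcp_proxy",
                "typed_config": {
                    "@type": "type.googleapis.com/envoy.extensions.filters.network.tcp_proxy.v3.TcpProxy",
                    "stat_prefix": "outbound_tcp",
                    "cluster": cluster,
                    # Cross-platform access logging to standard output.
                    "access_log": [{
                        "name": "envoy.access_loggers.stdout",
                        "typed_config": {"@type": "type.googleapis.com/envoy.extensions.access_loggers.stream.v3.StdoutAccessLog"},
                    }],
                },
            }],
        }],
    }


def original_dst_cluster(name: str) -> dict:
    """Cluster that connects to whatever address the original_dst filter recovered."""
    return {
        "name": name,
        "type": "ORIGINAL_DST",
        "lb_policy": "CLUSTER_PROVIDED",
        "connect_timeout": "5s",
    }


def bootstrap() -> dict:
    return {"static_resources": {
        "listeners": [outbound_listener(15001, "passthrough")],
        "clusters": [original_dst_cluster("passthrough")],
    }}


if __name__ == "__main__":
    print(json.dumps(bootstrap(), indent=2))
```

Before relying on it, the printed JSON could be checked against your Envoy build with `envoy --mode validate -c envoy.json`.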
However, we recognize that production environments will need a CNI plugin that reads these policies from pod annotations and allows different policies on different pods. Finally, you need to run Envoy as the Envoy user. The Envoy user is already set up for you in the Envoy container image and is already configured to be part of the network operators group. You can leverage the fact that the Envoy user is part of a well-known group to configure traffic redirection exemptions for Envoy. With this in place, Envoy's traffic will not be redirected, and you will not have to deal with nasty infinite redirection bugs.
Open Service Mesh, or OSM for short, is a lightweight and extensible cloud native service mesh. OSM takes a simple approach that lets users uniformly manage, secure, and get out-of-the-box observability features for highly dynamic microservices environments. It leverages an architecture based on Envoy reverse proxy sidecars and works by injecting an Envoy proxy as a sidecar container next to each instance of your application. OSM is built to be simple to understand and contribute to. Users should be able to install, maintain, and operate OSM in their cluster with minimal effort. It is easy to troubleshoot, and it even comes with tooling that makes debugging as easy as running a command in the terminal. You can easily configure it with the Service Mesh Interface (SMI), a standard for service meshes on Kubernetes that provides a basic feature set for the most common use cases.
Now for the exciting part: let's see a demo where Windows Server 2022, Envoy, Kubernetes, and Open Service Mesh work together in the first cross-platform service mesh demo.
Hey, I'm here with Kalya, who is a contributor to OSM, to give you a demo of the first cross-platform service mesh implementation. Before we start with the demo, do you want to introduce yourself and tell us more about the OSM project?
Thanks, Sotiris.
Hi, my name is Kalya, and I'm a developer on Open Service Mesh, or OSM, which is an open-source service mesh. People can use OSM to secure and manage traffic between the services that they're running. For those who aren't familiar with service meshes, generally the way it works is that there's a proxy running next to each application that manages incoming and outgoing traffic. We call this the data plane, and OSM uses the Envoy proxy to implement the data plane. The data plane is controlled by a control plane running in the mesh. The control plane makes sure that those proxies are set up properly and reflect the right traffic-related rules for the application.
So as a user, how do I configure the control plane?
Users can configure the OSM control plane using the Service Mesh Interface, or SMI. SMI provides a set of APIs for the most common features that people use service meshes for. For example, there's a traffic split API, which allows users to define how they want to split traffic between different versions of an application. Several meshes have already been built against SMI, so you might even be able to use your existing SMI configurations with OSM.
Awesome, I think we're also going to see more of SMI in the demo. But before we get to that, do you want to walk us through the demo that you have prepared from the OSM repository?
Yeah, so the OSM demo consists of the bookstore service, which is a top-level service that load balances across various bookstore application pods. We have two versions of our bookstore, bookstore V1 and bookstore V2, and each of those versions gets its own service as well. So the V1 and V2 services act as backends in our bookstore demo application. We also have two clients, Book Buyer and Book Thief. Those clients make requests to the top-level bookstore service, and then we use a traffic split to route those requests to the corresponding backend services.
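As a rough illustration of what a weighted traffic split means for individual requests, here is a Python sketch of weighted backend selection using cumulative weights. It is a hypothetical model for intuition only, not how OSM or Envoy implement splitting internally (Envoy has its own weighted-cluster routing); the `bookstore-v1` and `bookstore-v2` names follow the demo.

```python
import bisect
import itertools
import random
from typing import List, Tuple

Backend = Tuple[str, int]  # (backend service name, weight)


def pick_backend(backends: List[Backend], r: int) -> str:
    """Return the backend whose cumulative-weight bucket contains r (0 <= r < total)."""
    cumulative = list(itertools.accumulate(w for _, w in backends))
    if not 0 <= r < cumulative[-1]:
        raise ValueError("r must lie in [0, total weight)")
    # bisect_right skips zero-weight backends, whose buckets are empty.
    return backends[bisect.bisect_right(cumulative, r)][0]


def route(backends: List[Backend], rng: random.Random) -> str:
    """Pick a backend for one request, with probability proportional to its weight."""
    total = sum(w for _, w in backends)
    return pick_backend(backends, rng.randrange(total))
```

With weights of 50 and 50, each backend owns half of the buckets; with weights of 100 and 0, the zero-weight backend owns no bucket, so every request goes to `bookstore-v1`.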
We also have Book Warehouse, and the bookstore makes requests to Book Warehouse to restock on books.
And I assume this is not the standard OSM demo: some services are running on Windows and some are running on Linux, right?
Yep, exactly. If we take a look at our cluster and run kubectl get nodes -o wide, we can see that we have two worker nodes running Windows Server 2022, as well as a Linux worker node. And if I list all the pods, I can see that some of them are scheduled on Windows and some of them are scheduled on Linux.
Exactly. If you take a look, you can see that we have our bookstore V1 pod running on Linux and our bookstore V2 pod running on Windows. We also have Book Buyer, which is running on Windows as well. And I see that the OSM control plane is running on a Linux node.
Exactly. The OSM control plane runs fully on Linux, but it's still able to program the data plane, regardless of whether the Envoy proxies are running for Windows applications or Linux applications.
And how can we see the interaction between the Book Buyer and the bookstore now?
For this demo, we have installed an ingress gateway, so if you get the public IP, you'll be able to access our Book Buyer UI. Here you can see the Book Buyer UI that we're accessing via ingress. You can see that the Book Buyer is buying books from both bookstore V1 and bookstore V2, and it's buying pretty evenly between the two.
So there is an SMI policy that configures this, right?
Yes, we're using an SMI traffic split. If you go to our cluster, we can actually view the traffic split that we're using. Here you can see our traffic split spec: we have two backend services, bookstore V1 and bookstore V2, and they're both assigned equal weights of 50. We also see a service field, sometimes referred to as the root service, which refers to bookstore, the top-level service that represents the two backends.
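The traffic split shown in the demo could also be generated programmatically. The Python sketch below builds an SMI TrafficSplit object and prints it as JSON, which `kubectl apply -f` accepts. It assumes the `split.smi-spec.io/v1alpha2` API version and a `bookstore` namespace; check which SMI version your OSM release supports, and treat the object and namespace names as illustrative.

```python
import json
from typing import Dict


def traffic_split(name: str, namespace: str, root_service: str,
                  weights: Dict[str, int]) -> dict:
    """Build an SMI TrafficSplit; weights maps each backend service name to its weight."""
    if not weights or any(w < 0 for w in weights.values()):
        raise ValueError("need at least one backend and non-negative weights")
    if sum(weights.values()) == 0:
        raise ValueError("at least one backend must have a positive weight")
    return {
        "apiVersion": "split.smi-spec.io/v1alpha2",  # assumed SMI version
        "kind": "TrafficSplit",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Root service: the top-level service that clients call.
            "service": root_service,
            "backends": [{"service": svc, "weight": w} for svc, w in weights.items()],
        },
    }


if __name__ == "__main__":
    split = traffic_split("bookstore-split", "bookstore", "bookstore",
                          {"bookstore-v1": 50, "bookstore-v2": 50})
    print(json.dumps(split, indent=2))
```

Changing the weights to 100 and 0 and re-applying the output (for example `python split.py > split.json && kubectl apply -f split.json`) is the kind of update that shifts all traffic to bookstore V1.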
And I assume this is highly dynamic. So if I want to change the weight between the two services, I can just update this policy, right?
Yep, that's really easy. All you have to do is update your traffic split and apply it, and the OSM control plane will take that new traffic split and use it to configure the data plane appropriately.
Okay, I think I have the new policy ready here. I just printed it in the terminal with cat so people can see it, and let me apply it right now.
Here you can see that with this new traffic split, we're going to send all of the traffic to bookstore V1, since it has a weight of 100, and no traffic will go to bookstore V2. And if you look at our Book Buyer UI again, you can see that we are no longer buying books from bookstore V2, but we should see the bookstore V1 counter increment.
In this presentation, we revisited the four verticals that underpin any service mesh. We saw the HNS policies and Windows socket options that allow you to configure the operating system to redirect traffic to a sidecar proxy. We went through the progress that the Envoy Windows Development Group has made to make the Envoy proxy run seamlessly on Windows. Then we went through the configuration needed to ensure that Kubernetes applies the redirection policy for you. Finally, we showed the first demo of a cross-platform service mesh running on OSM. With that, I will hand it back to Praveen to tell you what is on our roadmap and how you can stay updated.
Thanks, Sotiris. That was a great demo. Our future roadmap spans the OS platform, OSM, and Envoy. On the Windows platform side, we would like to enhance HNS policies to support IPv6 and UDP traffic redirection. On the service mesh side, we would like to upstream production-grade support for the Windows platform in OSM. And finally, we'd like to improve Envoy extensibility by adding WebAssembly support on Windows, and also publish Windows Server 2022 container images with Envoy.
We encourage you to join us and contribute to this effort, and also to give us feedback on future improvements and our roadmap. All the work on Envoy and OSM is being done upstream on GitHub, where you can look at known issues and report new issues or feature requests. PRs are also very welcome. In addition, we have Slack channels where you can get support, get your questions answered, and help coordinate work. This work would not have been possible without tireless efforts across the industry. We would like to thank all contributors across Windows networking and the OSM and Envoy communities for helping make this a reality. And with that, thank you everyone for attending our presentation. We'd like to open it up for any questions you may have.