All right. We're going to talk about the current and future state of Istio. How many of you have used Istio before, or heard of Istio? Most of you. How many of you heard that Istio had a big announcement last week about ambient service mesh? All right, a few of you, very good.
A quick introduction about me. I've been working on the Istio project for probably five-plus years now. I was at IBM, where I worked for 19 years. I've been one of the founding contributors and also a founding member of the Istio Technical Oversight Committee and Steering Committee. I wrote a book about Istio, Istio Explained, which helps you get started with Istio. I'm also one of the new CNCF ambassadors. One of my fun jobs at IBM was writing patent disclosures, and apparently I was pretty successful at it. Before I left IBM about a year ago, I counted 207 patents that IBM filed for me and that have been issued, either by the US Patent Office or by patent offices elsewhere in the world. So that's a little background about me.
I now work for a little company called Solo. How many of you have heard of our company? All right, a few of you, very good. It's a startup. When I joined Solo I was something like employee number thirty-something, and I now lead open source contribution at Solo. We are very well funded. We have hundreds of customers, including some big brand names like American Express and BMW, and we were recently valued at $1 billion. We are a leader in application networking. At Solo, we're very passionate about educating our users, particularly on the things we care about, which are Istio, Envoy, eBPF, and Cilium. If you scan that QR code, it gives you free access to our Solo Academy, where we provide a Kubernetes environment in the cloud so you can go through the training, which includes the newest Istio ambient training.
I actually wrote some of these trainings when I first joined Solo.
Now I want to turn your attention to Istio, which is the industry's leading service mesh according to the past two CNCF surveys. The history of Istio started in 2017. I was part of the launch while I was at IBM. Around 2019 to 2020, Istio had a major architecture change. If you remember, there was Istio Mixer, there was Istio Pilot, there was Istio Citadel. We simplified all of that into istiod, where the d stands for daemon, and we got rid of the Mixer component, whose function became part of the data plane. That was a major innovation in the Istio community. Now fast forward. This year in April we had IstioCon, where I was the conference co-chair, and Eric Brewer from Google announced that Istio is going to the CNCF. I believe we just had approval from the CNCF TOC for that, so I'm really excited. And we had another big announcement last week about ambient. So today I'm excited to take you through why we created ambient as a new data plane mode in Istio, and the story behind it.
Let's start with why ambient. Istio has been very, very successful. It's the most successful service mesh out there and the most deployed in production. We have customers running Istio in thousands of clusters. So why did we create ambient? If I ask you what you want from a service mesh, what would you say? What would be the top three things that come to mind? Anyone want to say something? No? You don't have an opinion? That can't be. All right, I'll share my opinion if you're not going to share yours. There are a lot of things. I think the number one thing is transparency. How many of you agree with that? You want the service mesh to be transparent to your application, so that your application doesn't need to know the service mesh exists.
You want to adopt a service mesh incrementally, so you don't have to pay for everything. You don't want to pay for layer 7 header-based processing, or for retries and resilience, when you don't need them. That's also very important. You want it to be very easy to upgrade and very easy to operate. You want it to be secure. The service mesh is a security product; most of our customers use a service mesh to achieve zero trust, so it has to be secure itself. And you want it to perform really well, so it doesn't add much latency on top of your application.
So let's talk about the challenges with sidecars today. The biggest challenge, I would say, is transparency. As you all know, when you run Istio today with sidecars, it's not entirely transparent to your application, because you have to carry that sidecar with you all the time. You rely on the sidecar to solve the challenges a service mesh addresses, which are connect, secure, and observe. That requires you to run either automatic or manual sidecar injection, so you rely on the control plane to inject the sidecar for you, and the resulting YAML is not as predictable as you had hoped. The second issue, as you'll know if you've ever used sidecars, is that Kubernetes doesn't natively support a sidecar model. There was a KEP in Kubernetes about sidecars, but it didn't go well. So essentially there is a sequencing problem between your sidecar container and your application container. What if your sidecar reaches running after your application reaches running? What if your sidecar shuts down before your application shuts down? You need the sidecar to be there to secure your traffic, so what if the order is wrong? That has unfortunately caused tons of problems for our customers.
The other thing is that Envoy, as secure as it is, has had a lot of CVEs, in layer 7 processing in particular, because processing headers and processing regexes are very complicated things to do. So whenever Istio or Envoy rolls out a CVE fix, you have to restart your application because the sidecar needs a refresh, even though your application itself may not have any CVE. You can end up generating unnecessary restarts because of the sidecar. And last but not least, certain applications are not compatible with sidecars. For example, Kubernetes Jobs. When the Kubernetes Job finishes, what about the sidecar? Does it automatically finish? Today, Istio doesn't support Jobs. Or what about something like MySQL, which speaks a server-first protocol? Istio doesn't handle those really well, because we can't do protocol detection well when the server sends the first traffic. So these are the transparency issues with sidecars.
The other issue with sidecars is incremental adoption. As I mentioned earlier, you always adopt the whole sidecar even if you need just one single feature of Istio. Maybe you just need mutual TLS. Maybe you just need telemetry at layer 4 or layer 7. Maybe you just need retries or header-based routing. You always pay for the whole sidecar regardless of what you're using. For most users, we actually find there's one reason they adopt a service mesh. When I was at IBM, with every single customer and internal user we talked to, it was always their security team saying: look, in order to run your microservices in the cloud, you have to have mutual TLS, you have to have FIPS compliance. And the only way to do that is either you do it yourself or you delegate it to a service mesh. A lot of people delegate it to the service mesh.
But then you're paying for the entire Envoy sidecar, which can do telemetry, layer 7 processing, and resilience. You're paying for all that even though you just need mutual TLS and a cryptographic identity for your application. So that's the second challenge I would name with sidecars.
There are other challenges with sidecars in a few more areas I want to highlight. For instance, security. You probably think sidecars are very, very secure. In theory, they are. But one challenge is: what if your application has a vulnerability that becomes an attack surface for a hacker? When the hacker gets into your application, they automatically get to the sidecar, which means they automatically get to the rest of the service mesh infrastructure. So that may not be as ideal as you'd want. The other thing is over-provisioning of resources. Because the sidecar has minimum requirements, you tend to provision the resources regardless of whether you need them 100% of the time or only 5% of the time. You still have to have the resources provisioned. Also, the sidecar doesn't scale independently of your application container. If you run 10 replicas of your web service, you have to drag along 10 sidecars, when in theory two replicas of the Envoy proxy could do the work instead of 10. You don't have that control. Then, sidecars are complicated to operate. We talked about upgrades as one example and sidecar injection as another. In general, we've heard a lot of feedback from our Istio customers that upgrades are relatively costly and that they are sometimes afraid to upgrade. And performance: it's on the list, but it's actually a lower concern, because with Istio in general we're talking about milliseconds of delay added to the application.
Now a few fun tweets. This is a tweet from Makaya.
When Istio initially introduced namespace isolation through the Sidecar resource, he was so happy to see the resources used by Istio, particularly memory in this case, go down dramatically for his cluster. That was very cool. The other thing: I found this in a Cloud Foundry repository, where they were saying Istio's resource requirement was excessive for their needs. In Cloud Foundry, their sidecar probably doesn't do a lot of work. Interestingly enough, I also found this message on the discussion board of the Istio community, where they say they want to remove the default resource limits for the Istio proxy sidecar. So you can see users have different requirements. Some say, I don't want your cap on the default resource limit. Some say, your resource request is excessive for my needs. They can control that, but the problem is that it's hard to control for each different workload.
This is where ambient comes in. Istio ambient mesh introduces a new sidecarless mode in Istio, so your application doesn't have to run with a sidecar. You can run your application exactly as it is and just include it as part of the Istio service mesh. You can also run your application with a sidecar if you choose to; we continue to support both. We've got a couple of blogs out there on istio.io if you're interested. We have a launch blog and a getting started guide, and I'm actually one of the co-authors of those two blogs. We heard a lot of concerns from our users: what about sidecars? How does the security compare to sidecars? So we wrote a security deep dive on that. Last week I posted this tweet from the Istio account, and as you can see there was a lot of excitement in the community. We also got Matt Klein, the creator of Envoy, endorsing the approach Istio took with the ambient architecture.
So let's dive into how ambient does this. Ambient primarily started as a contribution from Solo and Google to the community, and we're welcoming everybody else in the community to participate. Right now it's in an experimental branch in the Istio repo, and we're hoping to contribute it to the main branch very soon. Ambient was introduced primarily for simplified operations, the transparency I was talking about: to be able to include applications in your mesh without sidecars, and to be able to upgrade your proxy, the layer 4 security portion, without disrupting your application. That's the primary goal of ambient. The second goal is to reduce cost. You don't have to over-provision sidecar resources; you can provision based on what you need, and if you don't need layer 7 processing, you can just run a light service mesh. The third goal is to improve performance.
So what exactly is ambient service mesh? There are a couple of innovations. The biggest, I would say, is the two-layer approach, where we introduce two layers in the data plane. In this diagram, for instance, we have a layer called the secure overlay layer that handles layer 4 processing. What does that mean? It means this layer, provided by what we call the zero trust tunnel, or ztunnel, can establish a secure HTTP tunnel between your applications on their behalf, for every single pod co-located on that particular node. You can think of the ztunnel like a CNI plugin: it runs as a DaemonSet on your nodes. First it takes care of identity: it figures out the cryptographic identity based on your pod's information. Then it takes care of upgrading the connection to mutual TLS when application A calls application B.
The ztunnel on the node of application A's pod takes care of getting the right private and public key, signed by the Istio control plane, and upgrades the connection to mutual TLS when sending to the target, where it is received by the target's ztunnel. From a business owner's perspective, we're looking at reduced compute costs: you're not paying for every single sidecar, shown in purple on the diagram; you're paying for one ztunnel per node to handle layer 4 zero trust security. From a platform team's perspective, you simplify operations. Instead of injecting a sidecar, all you need to do is label your namespace. You say, I want this namespace to be part of ambient, and then the co-located ztunnels manage the pods in that namespace for you. It's reduced maintenance, and you don't need to worry about restarting your application when a sidecar needs an update. The biggest benefit, in my opinion, is really the transparency. You include workloads just by labeling the namespace, without restarting the application and without the application owner doing anything.
We mentioned two layers. We talked about the layer 4 layer, which is the secure overlay layer. The second layer is the layer 7 processing layer. In the Istio community, we don't believe you should do layer 7 processing in a multi-tenant Envoy. We don't believe that's the right approach, because of noisy neighbors and cost attribution: what if one tenant impacts the other tenants? I think this is also what Matt Klein was tweeting about when he said we're taking the right approach. The waypoint proxy, which is essentially our layer 7 processing proxy, is per service account, so it's per identity. Every single identity in the mesh has its own waypoint proxy if it needs one.
It's optional. You don't have to have it, but if you need layer 7 processing, like resiliency, header-based routing, canary testing, or layer 7 telemetry, that's when you deploy a waypoint proxy for the service account used by your service. So it's optional and highly configurable. This approach continues the benefits we discussed; they're pretty much the same. Reduced compute costs, because it's optional. Simplified operations, because nothing injects a sidecar. You can upgrade the waypoint proxy independently of your application, and it's also transparent to your application, so you don't break anything.
If you take a detailed look at the layer 7 processing layer and the secure overlay layer, you can see that the secure overlay layer provides all the benefits related to layer 4, whether it's traffic management, security, authorization policy, or observability. That's where you need the ztunnel, which is there by default with the ambient profile. All you need to do is label your namespace to enroll your pods in ambient. If you do need layer 7 processing, then you start to pay for a waypoint proxy. That covers traffic management at layer 7, like routing, load balancing, circuit breaking, and resilience features, and security where you need rich authorization policy. For example, you might say, I'm only going to allow the GET method for this particular JWT claim, maybe only for this header. For those scenarios, or for observability where you need HTTP-based metrics, access logging, or distributed tracing, that's when you want to look into the waypoint proxy. But it's completely optional, only if you need it.
The other thing I want to quickly highlight is that the sidecar still has a place and continues to be supported. You can see sidecar and sidecarless coexist. There is one single Istio control plane, and it supports both modes.
It can support sidecars along with sidecarless workloads, which gives you maximum flexibility to choose based on what you need.
Now let's talk about how you actually add applications to ambient. It's really, really simple. How many of you use sidecar injection at the namespace level today? You use kubectl label namespace, right? In exactly the same way you label your namespace for sidecar injection, you put on a different label, dataplane-mode equals ambient, and that enables every single pod in that namespace to be part of ambient. So inside one Kubernetes cluster, you could have foo as part of ambient, bar with sidecars, cheese without anything in the mesh, and beer as part of ambient. It's very flexible. You choose based on your needs.
Now let's talk about the ambient architecture for a second. We talked about the ztunnel, which is part of the ambient profile. The ztunnel starts to manage your pods the moment you add that dataplane-mode ambient label to your namespace. Then layer 7, the waypoint proxy. Sorry, the diagram says PEP; apologies, it should say waypoint proxy. We initially had the name policy enforcement point, PEP, but we decided to rename it to waypoint proxy before the announcement, and I forgot to update that. Essentially the waypoint proxy is the optional component, only if you need layer 7. The Istio control plane pushes configuration to the ztunnels and the waypoint proxies. istiod is aware of all the endpoints, your applications A, B, C, D in the cluster, and is also aware of all the connected proxies, the ztunnels and the waypoint proxies, so it knows to push the right configuration down to them through xDS.
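The namespace enrollment just described (bar with sidecars, cheese outside the mesh, beer in ambient) can be modeled as a small lookup. This is only an illustrative sketch, not Istio code: the `Namespace` class and `data_plane_mode` function are hypothetical, the full label key `istio.io/dataplane-mode` is an assumption based on the "data plane equals ambient" label mentioned in the talk, and checking the ambient label first is an arbitrary choice of this sketch.

```python
from dataclasses import dataclass, field

# Namespace label commonly used for automatic sidecar injection.
SIDECAR_LABEL = ("istio-injection", "enabled")
# Assumed full form of the ambient label the talk calls "data plane equals ambient".
AMBIENT_LABEL = ("istio.io/dataplane-mode", "ambient")


@dataclass
class Namespace:
    """Hypothetical stand-in for a Kubernetes namespace and its labels."""
    name: str
    labels: dict = field(default_factory=dict)


def data_plane_mode(ns: Namespace) -> str:
    """Return the data plane mode the pods in this namespace would get."""
    key, value = AMBIENT_LABEL
    if ns.labels.get(key) == value:
        return "ambient"   # pods are handled by the node-local ztunnel
    key, value = SIDECAR_LABEL
    if ns.labels.get(key) == value:
        return "sidecar"   # pods get an injected sidecar
    return "none"          # pods are not part of the mesh


namespaces = [
    Namespace("bar", {"istio-injection": "enabled"}),
    Namespace("cheese"),
    Namespace("beer", {"istio.io/dataplane-mode": "ambient"}),
]
modes = {ns.name: data_plane_mode(ns) for ns in namespaces}
print(modes)  # {'bar': 'sidecar', 'cheese': 'none', 'beer': 'ambient'}
```

The point of the sketch is that enrollment is decided per namespace by a label alone, so nothing in the pod spec has to change.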
On the secure overlay layer, which is provided by the ztunnel, when application A calls application B, the traffic goes in plain text from application A to the ztunnel on application A's node. Then, when the source ztunnel calls application B, application B's ztunnel captures that traffic. So the traffic is sent first to the target ztunnel, which runs on the same node as application B, and from there it's plain text again from the target ztunnel to application B. This means that from the source ztunnel to the destination ztunnel, it's mutual TLS encrypted traffic over HBONE, the HTTP-based overlay, using HTTP CONNECT. It's encapsulated, and it's on a single port, which we'll explain a little more very soon. On the layer 7 processing layer, as we mentioned, the waypoint proxy is optional. But suppose you need one because you need layer 7 processing; in this case, say application B needs the waypoint proxy. Then the Istio control plane is intelligent enough to tell the source ztunnel, which runs on the same node as application A, that application B has a waypoint proxy. The traffic has to be routed to the waypoint proxy before it continues on to application B, where it is captured by application B's ztunnel before it reaches application B. As you can see, the connection in orange is all mutual TLS, all encapsulated in the HTTP CONNECT tunnel.
Let's take a minute to talk about HBONE. This is a new thing that was contributed upstream in Istio. It's our intention to have it upstream so that sidecars can also support HBONE. Essentially, HBONE tunnels all the traffic through a single mutual TLS connection using HTTP CONNECT.
It also allows that connection to be reused as long as the source and target use the same service account pair. From application A to application B, you can use the same connection no matter how many requests you have, because the source and destination are the same identity pair of service accounts. That helps amortize the cost of the mutual TLS handshake over multiple connections, because the same tunnel carries multiple connections as long as the source and target pair is the same. One nice thing about HBONE is that you don't need to sniff anymore, and it doesn't require any metadata exchange. It's always mutual TLS secured, so there's no permissive mode. It essentially decouples the mutual TLS encryption from the application and moves it to the ztunnels. The ztunnel is solely in charge of that, without a sidecar running next to your application.
If you're using Istio today, the next question you might be wondering about is: what about my existing Istio resources? What if I'm using a Gateway resource? A VirtualService? A DestinationRule? The existing Istio resources are expected to continue to work. Today we're still working through the details for some resources, such as ServiceEntry, but it's our intention that you don't have to change any of your resources as you move freely from sidecar to sidecarless, or from sidecarless back to sidecar if you have such a scenario. And what about my sidecars, you may be wondering. As I mentioned earlier, we added HBONE support for sidecars upstream in Istio, so a sidecar is able to send traffic through HBONE encapsulation to a ztunnel. That enables an application running with a sidecar to talk to an application managed by the ambient ztunnel.
With that, let's go to a live demo. I think I only have 10 minutes, so I'll try to be quick. All right.
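Before the demo, the HBONE connection reuse described above can be pictured as a tunnel pool keyed by the source and destination identity pair. The sketch below is a toy model under stated assumptions, not how the ztunnel is implemented: `Identity` and `TunnelPool` are hypothetical names, the identity string follows the SPIFFE form Istio uses for service accounts, and a "handshake" is just a counter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Identity:
    """Hypothetical workload identity derived from a service account."""
    namespace: str
    service_account: str
    trust_domain: str = "cluster.local"

    def spiffe_id(self) -> str:
        return (f"spiffe://{self.trust_domain}"
                f"/ns/{self.namespace}/sa/{self.service_account}")


class TunnelPool:
    """Toy model: one mutual TLS tunnel per (source, destination) identity pair."""

    def __init__(self) -> None:
        self._tunnels: dict = {}
        self.handshakes = 0  # counts simulated mutual TLS handshakes

    def tunnel_for(self, src: Identity, dst: Identity) -> str:
        key = (src.spiffe_id(), dst.spiffe_id())
        if key not in self._tunnels:
            # First request for this pair pays for the handshake.
            self.handshakes += 1
            self._tunnels[key] = f"hbone-tunnel-{self.handshakes}"
        # Later requests for the same pair reuse the existing tunnel.
        return self._tunnels[key]


sleep = Identity("default", "sleep")
notsleep = Identity("default", "notsleep")
productpage = Identity("default", "bookinfo-productpage")

pool = TunnelPool()
first = [pool.tunnel_for(sleep, productpage) for _ in range(3)]
other = pool.tunnel_for(notsleep, productpage)
print(pool.handshakes)  # 2
```

Three requests from sleep share one tunnel, while notsleep has a different identity and therefore gets its own, which is the amortization the talk refers to.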
Let me clear my screen a little bit. All right, I'm going to show you a live demo. Sorry, I can't type. All right. I actually have Bookinfo installed already, so let me bring up my k9s. I was worried about network bandwidth, so I installed Bookinfo ahead of time. I also installed two clients. One is sleep; if you've used sleep from Istio, it's basically just a curl client that lets you curl a request. notsleep is also a curl client. The only reason I call it notsleep is that it has a different identity. With that, you can see my cluster. I have the famous Bookinfo. How many of you have used the Bookinfo application from Istio before? All right, a few of you, good job. So you can see I have Bookinfo running, and I have sleep and notsleep running. If I call Bookinfo from sleep, I get Bookinfo, a simple bookstore app. That's all as expected, so that's working.
The next thing we're going to do is enable tcpdump on the productpage node. This is the command I'm going to copy over, because I can't remember the command, so I ended up writing a script to print it out. As you can see, productpage runs on ambient-worker2. Sorry, that didn't copy for me, so let me re-copy this. We're going to go to ambient-worker2, where productpage runs, and enable tcpdump. Note that I'm only enabling tcpdump on 9080 and 15008. Why? Because Bookinfo's productpage runs on 9080, and 15008, if you remember, is the HBONE port. We only listen on these two ports because the other ports could generate too much traffic for us to look at. Now we're going to call productpage from sleep. As you can see, the traffic comes in and it's plain text, because it's not encrypted.
This is where your security team may have an issue: hey, you're doing microservices, you're not encrypting your traffic, you're not doing mutual TLS. If they do have an issue, let's go ahead and install ambient. You can see I'm just using istioctl to install ambient. I'm hoping the Wi-Fi is going to be good to me. Istio ambient is an ambient profile, so basically I specify the ambient profile, and let's also turn access logging on so I can actually see some of the logs. All right, let's walk through the ambient components. We talked about the Istio control plane, which is istiod. We talked about the ztunnel. In my cluster I have three nodes: a control plane node, worker1, and worker2. We also install the Istio ingress gateway as part of the ambient profile. All right, so we've got Istio installed. It's just that easy.
The next thing we're going to do is expose productpage on the Istio ingress gateway so people outside the cluster can access it. Because I'm running on a Mac, and MetalLB on a Mac doesn't work very well, I'm going to access it through the sleep client, but you can see the sleep client hits the Istio ingress gateway instead of hitting productpage directly. That's what the Gateway and VirtualService resources did for me: they exposed my productpage on the Istio ingress gateway. If my Istio ingress gateway had a load balancer IP, I would be able to access it from outside. All right, so that works too.
Now the question is, what about adding our services to ambient? We talked about labeling the namespace. All we need to do is add the dataplane-mode ambient label. So we've labeled the namespace. The tcpdump we started earlier is still running here. The next thing we're going to do is call Bookinfo from sleep through the Istio ingress gateway.
You can see I no longer have plain text traffic. I have mutual TLS encrypted traffic with service-account-based identity, and this is all provided by the ztunnel without me doing anything. The only thing I did was install Istio and add the label to my default namespace.
Let's check how this works. You showed me encrypted traffic, but how exactly does it work? What are the certificates? How is the identity derived? These are the two worker ztunnels, on worker1 and worker2; remember I said I have two worker nodes. If you run the istioctl proxy-config secret command, you can see the certificates managed by that ztunnel, and if I run the same command on my second ztunnel, you can see productpage is right there. Next we're going to dive into one of the X.509 certificates managed by the second ztunnel. I expect to see productpage's certificate. This is a certificate signed within the Kubernetes cluster; the issuer is cluster.local. It's only valid for 24 hours, until the next day. And let's look at the SAN: it's based on the service account of Bookinfo's productpage. So this is pretty much the same as with sidecars today.
The next thing we're going to do is enable an authorization policy at layer 4 that allows sleep and the Istio ingress gateway to access productpage. We apply this policy and run a curl command. OK, this curl command works. It's expected to work because we're calling through the Istio ingress gateway. Now we're calling from notsleep. This fails because the ztunnel on the target side rejects it. So this is also provided by the ztunnel automatically.
What if you're ready for some layer 7 functions? Remember I mentioned that's optional; you enable it only if you need it. The way to do that is to deploy a Gateway resource.
The key thing is to set the gateway class to istio-mesh and specify your service account as an annotation. In this case, for instance, I'm deploying a waypoint proxy for Bookinfo's productpage. If you go here, you can see the waypoint proxy is running, and you can see its logs here. Next, let's do a layer 7 authorization policy. In this case we're going to allow only the GET method. I don't want to allow DELETE or update or POST. Under zero trust, nothing should be allowed beyond the minimum that needs to be allowed. You can see the DELETE is rejected. If you go to the logs here, you can see the 403; it's rejected by the waypoint proxy. So it's doing the work we wanted it to do. And you can see that GET requests continue to work. You can also get metrics, same as with sidecars today. If you hit the stats/prometheus endpoint, you get all the metrics: istio_requests_total and any HTTP-layer metrics.
The last thing we're going to do is deploy a VirtualService that injects a five-second delay into our productpage. With that, if we call productpage from sleep through the Istio ingress gateway, we're going to see five seconds. If you count to five, yeah, that's the five-second delay. It does take a little bit longer.
Now, I've talked a lot, but if you don't like ambient, all you need to do is relabel your namespace to say, guess what, I don't want ambient, and uninstall Istio. And that's the productpage of Bookinfo; it continues to work. But the problem is that you go back to plain text. If your security team is OK with that, it's your choice. With that, I believe that's the demo.
These are the key takeaways. We have transparency and the two-layer approach with ztunnel and waypoint proxy.
Then the value of ambient: simplified operations, reduced cost, improved performance. There's no Istio API change, and it can interoperate with sidecars. There are a lot more resources, because I only have this much time. We have an Istio workshop with ambient, so if you're interested, go ahead and register for that. We have run live streams and written multiple blogs. Scan that QR code to register for our workshop. I think I may have one minute for questions. If not, I'll be here to answer any questions you have. Thank you so much. I really, really appreciate everybody. Does anyone have any questions? Oh yeah.
How many waypoint proxies does one need?
OK, the question is, I didn't quite get it, how many waypoint proxies do you need?
Yeah, like the ztunnel is really one per node, right?
Yeah, that's right. The ztunnel is one per node. With the waypoint proxy, you have control. First of all, your control point is: do you need a waypoint proxy for your application? That's a checkbox, yes or no. Once you check yes, the second question is how many replicas you need for your waypoint proxy. Depending on your workload, you could potentially run two, five, six, or 10. The beauty of the waypoint proxy, though, is that it scales independently of your application. If your application runs 20 replicas, you could potentially still run three or four waypoint proxies, and you end up being a lot friendlier to your wallet, because you're paying way less. With sidecars, you don't have that choice, because you're just stuck with 20.
Yeah, there's no choice.
Yeah, that's the beauty of the waypoint, and it's also the beauty of the separation. Good question. Thank you. Well, thanks everybody. Hope you enjoy Dublin and Open Source Summit.
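The sizing answer above comes down to simple counting. The sketch below uses the figures from the answer (20 application replicas, three or four waypoint proxies) and assumes a three-node cluster like the one in the demo; the function names are hypothetical, and real resource savings depend on per-proxy requests that this sketch does not model.

```python
def sidecar_proxies(app_replicas: int) -> int:
    """Sidecar mode: one Envoy per application replica, with no way to scale it separately."""
    return app_replicas


def ambient_proxies(nodes: int, waypoint_replicas: int) -> int:
    """Ambient mode: one ztunnel per node plus an independently scaled waypoint."""
    return nodes + waypoint_replicas


app_replicas = 20       # from the answer above
nodes = 3               # assumption: same size as the demo cluster
waypoint_replicas = 4   # "three or four" in the answer above

print(sidecar_proxies(app_replicas))              # 20
print(ambient_proxies(nodes, waypoint_replicas))  # 7
print(ambient_proxies(nodes, 0))                  # 3, when no layer 7 processing is needed
```

The ztunnels are shared by every workload on a node, so in a real cluster their cost is spread across all enrolled applications rather than charged to this one service.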