Our moderator couldn't be here, unfortunately, so I will be moderating this panel instead. My name is Nina Polshakova. I'm a software engineer at Solo.io on the Gloo platform team. I've worked a little bit on the ambient project and Istio, and I was actually a shadow for enhancements for the 1.17 release. But let's introduce our panelists. I'd like everyone to introduce themselves, tell me what you work on, and say your favorite thing about Amsterdam. Yuval, do you want to start?

Sure. I don't think it's on — can you hear me? Okay. I'm Yuval Kohavi. I'm the chief architect at Solo.io. Currently I'm working on data path optimizations for the ztunnel component and things like that. In Amsterdam? Definitely the stroopwafels. Very tasty.

Hi, I'm Eric Van Norman. I'm a senior software engineer at IBM and I work in the Istio community. I'm a member of the Technical Oversight Committee, I'm the work group lead for the Test and Release work group, and I also do a lot of doc maintenance as well as PR approvals. I guess I always say, when I go places I try the beer, whether it's good or bad, and I actually found some really good beer. And I have to admit, I found a really good limoncello.

Hey, I'm John Howard, I'm a software engineer at Google. I've been working on Istio for about four years now, on the Technical Oversight Committee and various other things. In Amsterdam, I've enjoyed just walking around the city. It's a very nice, walkable city.

I'm Keith Mattix. I'm an engineering lead at Microsoft working on service mesh and Istio. My focus right now is on some Istio ambient stuff, as well as a new thing that maybe you'll hear about eventually. My favorite thing about Amsterdam is probably the bikes — I love seeing all the biking around the city. So yeah, it's been great.

For sure. And then one more icebreaker that I want everyone to answer: what is your least favorite thing — the one thing you hate about service mesh? Who wants to start?

One: I can't pronounce it. I keep saying "iss-tee-oh" instead of "Istio."

Okay, my least favorite thing about service mesh is probably all the CRDs.

My least favorite thing is probably that, you know, we realized once you stick a proxy in between all traffic, there are a lot of things you can do — and so there are a lot of things we try to do, and a lot of things people want us to do. And just everything can go wrong when you're capturing 100% of traffic and trying to do things to it.

Well, I was going to say I have the same problem with "Istio" — when I try to type it, for some reason it doesn't come off my fingertips the same way. So that's one. And I'll pick on the pipeline stuff, since I'm the test-and-release kind of guy: it always seems there's something that wants to break. There's something in the pipeline we haven't used for a while, and when we go to use it, things just don't work like we want. So not necessarily Istio related, but at least Istio build related.

So the way this is going to work is: I have some pre-prepared questions that I'm going to ask, and then we're going to open it up for the audience — so start thinking of questions you want to ask the panelists. The questions are going to start with pretty mild, easy softballs and then move up in spice. The first question, just to make sure we're on the same page: everyone understands the sidecar architecture for service meshes, but what's this new sidecarless architecture I've been hearing a lot about?
Does anyone want to start?

Yeah, so the main difference with sidecarless is essentially that the workloads don't need a sidecar attached to them, and that has consequences: you don't need injection, you don't need any extra containers. We do this by creating a node proxy that's responsible for all of layer 4 and security, and separating out layer 7 into a separate component we call the waypoint. That's kind of my basic overview. It has all sorts of advantages, which I'm sure we'll get to.

Yeah — anyone else want to add to that?

The way I like to think about sidecarless is: less headache to operate. When you have a sidecar next to your application, it can be really challenging to coordinate restarts of all the applications when you do upgrades, and it can be difficult to debug — it's right in the path of your application developers monitoring and owning their own services. The sidecarless architecture allows all of that complexity, in theory, to be pushed down and managed by someone other than you, the application developer. And again, in theory, less headache for the people who are providing value for your business.

Okay, we can move on to the next one — it's kind of related. So why the hell does anyone actually want a sidecarless service mesh? What are the benefits? We kind of touched on it, but —

Yeah, it was kind of touched on, but it's really about taking the cost that's been traditionally associated with a service mesh — cost that's actually not inherent to a service mesh, but rather to the existing implementations based on sidecars — and removing it while retaining all the value of a service mesh. Whether that's resource overhead, complexity shoved onto the application pod owners, or all these lifecycle issues: upgrades, management, et cetera.

Yeah, I was going to say basically the same thing. The resource usage is obviously a big thing. The ability to not have to restart your pod every time you want to update a proxy is another big one. And I'm hoping performance will be another thing that we'll find is much better as well.

Yeah, I was going to touch on what you said: we're truly decoupling the application and the platform, so we separate the lifecycles. I think that's the key.

Anything else you want to add?
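(Editor's note: to make the "no injection" point concrete — in the ambient data plane, workloads are enrolled by labeling their namespace rather than injecting a sidecar container into each pod. A minimal sketch, assuming an ambient-capable Istio install; the `demo` namespace name is illustrative, and the exact label may vary by release:)

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: demo
  labels:
    # Opts every pod in this namespace into ambient mode: traffic is
    # redirected through the node's ztunnel, with no sidecar injection
    # and no pod restarts required.
    istio.io/dataplane-mode: ambient
```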
Okay, let's move on. eBPF — hot topic. So, doesn't eBPF solve a lot of service mesh problems? Doesn't it actually also solve all your personal problems? Really? Well, just to make sure everyone's on the same page: what is eBPF?

Yeah, eBPF is a technology that essentially allows you to extend the Linux kernel and ask it to do custom behavior. It is a bit overhyped, I would say. What it does is let you customize the kernel at certain hook points — the kernel has to be aware of them; you can't just insert an eBPF program that changes the logic arbitrarily. So it has very well-defined semantics, and it can definitely help, but it doesn't solve all our problems. Specifically, the harder problems to solve are at the L7 layers, where, depending on the protocol, it's hard to impossible to do interesting stuff with eBPF.

Yeah, the way I view it is a bit like cryptocurrency or Wasm or AI. You know, once it's new, we have all these conferences where everyone's talking about it all the time, applying it to everything — and it's fun to talk about, it's great. But once it matures, a lot of those use cases go away, and what remains is a few cases where it's actually extremely valuable, right? AI is probably overhyped, but it's very useful in some cases. Maybe not blockchain, but, you know, the rest will provide some use cases. So there are areas where eBPF is great — we have actually started using it in some areas of Istio in some limited manner — but those places are fewer than is presented at conferences like these, when we're all trying to talk about the next big thing.

I think another aspect of eBPF that's important to keep in mind is that a lot of the performance benefits you see from eBPF being used in software come from being able to do operations in the Linux kernel itself instead of crossing into user space. That means, though, that when you have to cross into user space for anything, some of those performance benefits go away, right? And so it's important to remember that the Linux kernel is fantastic — it's done so much in its time, and we see it all over our industry — but there are some limits, and those limits are also the limits of eBPF. So when you have to cross into user space to do something at layer 7, or TLS, something like that, then you really start to hit the viability limits of where eBPF can actually be useful.

Yeah, so are there any specific examples where it can be useful — like, that you've seen?

Yeah, definitely. Basically, eBPF can make Linux aware of your topology, right? So we can definitely optimize stuff with eBPF: save traversals through the network stack, save hops when data moves from one pod to another pod. Because we know our intent, we can use eBPF to express it and optimize the current behavior. But it's not a panacea. So we definitely plan to use it for optimizations in general network traffic, in observability, even in security — there are definitely a lot of valid use cases.

Yeah, so security is one that a lot of people bring up. How specifically can eBPF be used for security in, like, a service mesh scenario? Anybody else, or should I —?

Yeah, I can take this. I think there are a lot of areas where it can be applied, and you see a few products that are kind of inspecting all the syscalls in the system and auditing and analyzing that, which is not really something that's typically associated with service mesh. Within service mesh, the foundation of security is typically mTLS, right? That's what brings a lot of people to service mesh in the first place. That's kind of the foundation of all the authorization policies, all the security benefits that Istio and other service meshes bring. And that's something that's not actually implemented in the kernel, barring some minor things. Practically, it's not something that can be done in eBPF, right? And so when we want to enforce the security aspects of a service mesh, typically we're not going to be able to do that purely in the kernel or in eBPF.
Yeah, that actually leads nicely to our next question. What is mTLS anyway, and why is it important in the service mesh context? Like, why do people care?

So I want to first start by asking all of you: who in this room has heard of mTLS? Lots of people — great, put your hands down. Who here is required to have mTLS by their organization or some regulation? Also a good number of people. And the reason for that is that mTLS is the protocol responsible for encrypting traffic on the wire between two things. You've got two computers of some sort — maybe they're VMs, maybe they're Kubernetes pods — and they need to talk across a network. Something needs to encrypt that, and the "m" means that it's mutual. So not only does the client — the thing sending the request — check the identity of the server; the server also has the opportunity to check the identity of the client and say only certain people can talk to me. And again, it creates an encrypted tunnel. mTLS builds off of the regular TLS that many of you might be familiar with from your ingress gateways or your edge gateways — also called SSL. But the "m" part, the mutual part, actually gives you a lot of power for more advanced use cases, because identity is mutually authenticated.

And I'll just add that it's authenticated in a cryptographic way that's impossible to forge, and it does not have any eventual-consistency problems.

And so this talk covers sidecar, sidecarless, and proxyless, right? Do all three of those support mTLS, and do the different modes support it the same way? Does anyone want to speak to that?

Yeah, so all three of these are based on the same secure mTLS transport. Like I think I mentioned earlier, this is really the foundation of Istio security, and we offer that security in all our data plane modes. The details are slightly different, although for the mTLS part they're very similar. If you're interested in the security models, we gave a talk on Tuesday — I talked a whole 30 minutes specifically comparing the security properties of sidecar versus ambient — so check that out afterwards if you're interested in learning more.

Anything else anyone wants to add?
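(Editor's note: in Istio, the mTLS posture being discussed is controlled with the PeerAuthentication API, which applies across the data plane modes described here. A minimal sketch; the `demo` namespace is illustrative:)

```yaml
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: demo
spec:
  mtls:
    # STRICT rejects any plaintext traffic: workloads in this namespace
    # will only accept mutually authenticated TLS connections.
    mode: STRICT
```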
Okay, we're going to open it up for questions from the audience now. If you have a question, there are mics in the middle, so feel free to come up and ask. Otherwise, I'll have to resort to our backup questions, which aren't as spicy as the things you could ask. So — our first question.

Hi. The waypoint proxies sound a bit like just another layer of proxies to fix all our problems, and I'm wondering: do you see the waypoint functionality — the L7 stuff, authorization policies and so on — moving into middleware eventually, as a performance improvement? Or is that a route that you think service meshes like Istio are never going to take?

With a fallback to a follow-up question: when you say middleware, what are you describing? Like a Flask or Django or PHP framework?

Right — whether these authorization policies and L7 functionalities on the receiving side, the listener side, would eventually get xDS compatibility in every language, and whether that's something you think is worth working towards, or whether we should keep everything in a proxy or sidecar.

Yeah, that's a good question. In Istio we do actually support this — it's much less known than the sidecar or the new ambient mode — but it's what we call proxyless gRPC, which is essentially what you described. We take the gRPC client and server that some of you may be familiar with, and we add xDS support so that they can be dynamically programmed by the Istio control plane and enforce Istio policies, routing rules, et cetera. The typical use case for that, as opposed to the other architectures, is that you're willing to take a little bit of pain and do a little bit more work for ultra-high performance — basically no overhead.

Now, does that replace the waypoint proxy by doing it on the server side? To some extent it can. The thing that's important to realize, though, is that the waypoint proxy is doing both things that were traditionally done by the client in the sidecar architecture and things that were done by the server: things like routing, but also things like authorization policy. So even if you put a lot of logic in your service — whether that's through gRPC or code you write yourself — you can't really do load balancing in the server-side application, right? You've already picked the server you're going to go to. You can't do routing, canary rollouts, et cetera. Whereas with the waypoint abstracted away from the service application, you are able to do those things.

So you could replace the waypoint architecture with smart clients and smart services, which is a bit like the proxyless gRPC or even the sidecar option. Those are things we still expect to support in addition to ambient, right? It's not an either-or situation. It's more about what use cases you need and what trade-offs you're willing to make. Now, of course, ambient is kind of the new thing, so we talk about it a lot, but that doesn't mean it's the only thing, or that we're going to remove everything else.
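(Editor's note: an example of the kind of authorization policy under discussion — one that only works because mTLS gives every workload a cryptographically verifiable identity. A minimal sketch with illustrative names; in ambient mode, identity-based rules like this are enforced without a sidecar:)

```yaml
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: reviews-allow-frontend
  namespace: demo
spec:
  selector:
    matchLabels:
      app: reviews
  action: ALLOW
  rules:
  - from:
    - source:
        # The SPIFFE identity of the calling workload, proven by its
        # client certificate during the mTLS handshake.
        principals: ["cluster.local/ns/demo/sa/frontend"]
```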
Any other questions?

Some of the folks from Linkerd were doing another talk, and they were talking about the benefits and the weights of sidecarless. One of the pain points of sidecarless they raised is that you still have to maintain an L7 proxy to do all the L7 stuff. Generally you could do that with only one proxy per node — and that's an advantage, you don't spin up so many Envoy pods holding so much memory — but that's a single point of failure as well, right? If that proxy goes down, you lose the whole node. Do you have opinions on that? Could you expand on it?

Yeah. My opinion: you already have a single point of failure on the node — that's the Linux kernel, right? But you trust it, and it has proved itself over the years, and that's what we're aiming for with ztunnel. It's written in Rust, a memory-safe language, and it is becoming critical infrastructure. My mental model for ztunnel is that it's kind of becoming part of the Linux network stack, and our goal is to make it so reliable that it won't crash — just like Linux today doesn't crash. But twenty years ago, it had plenty of crashes.

Yeah, to expand on that: when you talk to other folks, remember they're comparing sidecars to — well, there's more than one implementation of a non-sidecar-based service mesh, right? And in Istio, we've been very, very cautious about where we put what functionality. It's not reasonable, I think, to say we're going to take some gigantic piece of software, stick it on the node as a single point of failure, and it's never going to have bugs because we tried really hard, right? But what is reasonable to say is: we're going to do one very small thing, we're going to do it very well, we're going to be highly focused and not have feature creep, and we're going to move that complexity somewhere else. Which is what we did with ztunnel. Basically, the only functionality of ztunnel is to enforce the mTLS we talked about plenty before, and that's about it. So it's very small and very focused, and that's why we're confident we can keep the overhead and resource consumption low, but also keep it stable and reduce the likelihood that we hit that single point of failure.

Thank you. Great — there's a question in the back.

Hi. Do you consider, in terms of performance, not pushing everything into the kernel, but using DPDK — like Calico VPP does — and putting the filtering and termination stuff into the network interface?

I think the question was: have we thought about offloading some of the work using DPDK?

Yeah — DPDK is basically a way, correct me if I'm wrong, to pass traffic through from the network card to user mode, kind of skipping over the kernel. Currently we're focused on eBPF in terms of optimization of the data path. I wouldn't rule out DPDK, but it's not something we're currently looking at.

Mine is — I'll try to be a bit spicy. So eBPF is overhyped, and AI, and also crypto. I'd like to know from you: besides proxies, of course, what's the big thing that you think is next for service mesh, or the killer feature?

I think the next step for service mesh is to learn how to be boring. I know that's not quite as spicy as you might have been expecting, but for a long time service mesh has been this thing — you've got conferences upon conferences and talks about it, kind of like what we were saying before. For something like ztunnel, or Istio as a whole, to become critical infrastructure that you can depend on like the Linux kernel, it's going to have to learn how to be boring. And sometimes that means rejecting the new and shiny thing, the new and shiny features, and focusing on being production-ready, on being enterprise-ready. Istio has a remarkable track record, being used in some of the largest organizations out there. Let's take that a step further and learn how to be boring.

Yeah, I was going to sort of add to that: besides being boring, just something that's there and does its job, and people just don't have to worry about it.

Hello. So I wasn't familiar with ztunnel, so I quickly looked it up, and I understood it would most likely be replacing Envoy — or maybe you can correct me if I'm wrong. But does that mean we'll still be able to extend ztunnel with Wasm, or is there going to be a solution for that? Because I know currently, for extending Istio, you can write Wasm modules.

Yeah, so in some ways it's not really replacing Envoy. If you're familiar with the architecture, we also have a waypoint proxy that is optional, on kind of a per-service-account or per-namespace basis. That is still Envoy, and it does all the functionality that Istio sidecars do today, including Wasm, including extensibility, et cetera. But ztunnel itself is not likely to have that extensibility. Like I mentioned before, ztunnel is very, very tightly focused: its only job, really, is to do the encryption, authentication, and authorization at the node level. We've intentionally kept it small and avoided adding things like Wasm, which can cause — we won't say questionable, but unreliable, inconsistent performance attributes. If we start doing a lot of complex processing on the node, it becomes a bigger attack surface and uses more resources — and it's really hard to scale a DaemonSet, by the way. So we move all that logic out into a waypoint proxy, which is just a standalone deployment, or even just some opaque service, and that's where we put all the rich functionality of the service mesh. It can scale up and scale down independently.

Thank you.

Yeah, I guess the one thing I would have added to that: it also means a problem in a waypoint proxy only impacts that particular application — it's not going to impact everybody.
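(Editor's note: at the time of this panel, a waypoint proxy was deployed by creating a Kubernetes Gateway API resource, typically generated by `istioctl experimental waypoint apply`. A sketch of roughly what that looked like in the ambient alpha; names are illustrative and the API has since evolved:)

```yaml
apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
  name: reviews
  namespace: demo
  annotations:
    # Scope this waypoint to traffic destined for one service account.
    istio.io/for-service-account: reviews
spec:
  # The istio-waypoint class tells Istio to stamp out an Envoy deployment
  # that handles L7 policy for these workloads.
  gatewayClassName: istio-waypoint
  listeners:
  - name: mesh
    port: 15008      # HBONE tunnel port used between ztunnel and waypoint
    protocol: HBONE
```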
So, on assigning resources — CPU, memory — to sidecars: this was always a huge headache, and over-provisioning was really easy. With waypoint proxies this seems to be a lot improved, but will we someday reach a point where we can just say Istio figures the right size out — like multi-dimensional autoscaling or something like that — so it's always really the right amount?

So the waypoints — you can think of them as a regular deployment, and you can apply to them things like the Horizontal Pod Autoscaler, or any other scaling mechanism you want. That's part of why this architecture is so powerful.

But don't I get a lot of hops between nodes then?

You'll get a hop from ztunnel to waypoint, and from waypoint to ztunnel. But the main thing to remember here is that you're only getting one L7 hop. And L7, where all the features are, is where most of the latency is spent. So by going from two L7 hops — client sidecar and server sidecar — down to one L7 hop, you'll get similar performance characteristics, even though you're going through more proxies.

Can I use horizontal autoscalers with waypoints?

Yes.
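(Editor's note: since the waypoint is a normal Deployment, "any scaling mechanism you want" includes a stock HorizontalPodAutoscaler. A minimal sketch; the Deployment name below is hypothetical and depends on how the waypoint was generated:)

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: reviews-waypoint
  namespace: demo
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: reviews-waypoint   # hypothetical waypoint Deployment name
  minReplicas: 2             # two replicas so the L7 hop isn't a single point of failure
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80
```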
Hi, thanks. I have a basic — maybe basic, I don't know — question. One of the pains with the sidecar is that if, for any reason, you need to change the CA root of your proxies, it is a pain with a working cluster. How does that change in the new architecture?

Yeah, that's a good question. Any time you need to change your CA root, in just about any system, there's going to be a level of pain, unfortunately. Where I think the waypoint, or the ambient architecture, helps at least a little bit is the fact that it's no longer an N-squared kind of scenario. Here's what I mean by that: in a sidecar model, every sidecar generally has to know about every other sidecar, and so when you're propagating changes out, that can take a long time and make the CA rotation process a bit longer and a bit more dangerous — even though, yes, rotating your CA is something you should always be careful with. I think that with ambient, because you're always going to have fewer nodes than pods, there are fewer things to distribute that new root to, and you're potentially talking about a shorter period of time. So there are maybe even some optimizations in your process that you can make to propagate that new chain of trust.

If I may — where are the certificates located? Does that change in the new architecture?

So — and someone else can hop in here if they want — with ambient, the certificates are going to be on the ztunnel, whereas with the sidecar option they were on every single sidecar. So there were more places the Istio control plane had to send the certificates, whereas now there are fewer.

I guess the other thing I might add — at least I think this is the way it works, John can correct me if I'm wrong — is that the number of certificates you have to have becomes smaller, because in the ztunnel case you're only worrying about the certificates for the things on that node, whereas with the sidecar you end up having to know about the certificates for pretty much the whole mesh, unless you have it scoped down.

All right — a question in the back, and this will be our last question from the audience.

How will you make ztunnel highly available per node? I mean, you have a DaemonSet that you can't update without downtime, and there's just one pod per node. So how are you going to address that?

Yeah, so this is where eBPF does solve all our problems. We're doing some research here about using eBPF to slowly drain a single ztunnel as you upgrade. Basically, you have two DaemonSets deployed at the same time, and with eBPF we'll implement a draining mechanism where old connections go to the old ztunnel and new connections go to the new ztunnel. After a drain period, you can remove the old DaemonSet.

Okay, well, we'll move on to one more question from the slides. We talked about this briefly, but what's the difference between sidecarless and proxyless?

Yeah, I briefly mentioned this, so I'll expand on it a bit. We should probably stop calling things "-less" and talk more about what they are. What proxyless really is, is putting more dynamic logic in applications — in particular, gRPC. What this means is that if you use gRPC today without this proxyless feature, you would put some options in and say: I want to route my service here, I want to use this option and this option and this option. But it's not dynamic, like we'd have with a service mesh, right? You can't apply a VirtualService and suddenly have your gRPC applications pick it up, or an authorization policy, et cetera.
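(Editor's note: for reference, this is the kind of dynamic routing rule being described — a canary-style VirtualService that a mesh can push to proxies, or to proxyless gRPC clients, over xDS. A minimal sketch with illustrative names; it assumes a DestinationRule defining the v1/v2 subsets:)

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: reviews
  namespace: demo
spec:
  hosts:
  - reviews
  http:
  - route:
    - destination:
        host: reviews
        subset: v1
      weight: 90   # 90% of traffic stays on the current version
    - destination:
        host: reviews
        subset: v2
      weight: 10   # 10% canary traffic to the new version
```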
So what proxyless is, is enabling exactly that. We use the same xDS — the protocol used between Envoy and Istio to get configuration — and that same protocol is used by gRPC as well now. So we can dynamically configure it to do many of the things Envoy can do — not quite everything, but a lot. And the reason someone might want to do that is performance; that's basically it, right? It's not as seamless as Envoy: we don't have auto-injection that automatically configures it without your application knowing about it. You actually go and change your application code and say, I want to use the xDS options in gRPC. But in return you get basically the highest performance any service mesh can offer — the overhead is effectively zero. So if you have a use case where all you care about is "I must have no overhead," it's really one of the only options.

Awesome. Anyone else want to add?

Yeah, I was going to basically say the same thing. You're always giving up something for something else. One of the things I always tell people about service mesh is that it takes the application developer out of the picture — at least in my mind, they typically don't really have to do anything; it's up to the platform engineer to worry about it. And in this case, you're taking away that benefit — at least what I consider a benefit — and pushing a little bit of it back onto the developers. But the gain there is the performance.

Yeah, to try to hopefully add some more color to this: who here has used Java Spring Boot? Who's used something like the .NET framework? Right — so these are all ecosystems and libraries that you're using to build your applications. But the issue is, in a service mesh deployment, you want to be able to use different languages, and there's no guarantee that something like Spring Boot can talk to .NET Core, can talk to Django, can talk to Node, and have all of this re-implemented in all those different places. So gRPC is the protocol that all the different libraries can standardize on. And then, when you fire up gRPC clients, they all know how to do this xDS dance, and do the mTLS and all the dynamic stuff, by talking to the Istio control plane. So really, what proxyless is, is putting stuff back in the hands of the application developer, like Eric said — whereas with sidecarless, you're just removing the sidecar and putting the proxy somewhere else.
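(Editor's note: a minimal sketch of what "changing your application code" looks like for proxyless gRPC in Go. The blank import wires in the xDS support, and the `xds:///` target hands resolution, load balancing, and routing to the control plane. The target hostname is illustrative, and a bootstrap file pointing at istiod is assumed to be configured via `GRPC_XDS_BOOTSTRAP`:)

```go
package main

import (
	"log"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	xdscreds "google.golang.org/grpc/credentials/xds"
	_ "google.golang.org/grpc/xds" // registers the xds:/// resolver and balancer
)

func main() {
	// Security (mTLS) is pushed down from the control plane over xDS;
	// fall back to plaintext if no security config is sent.
	creds, err := xdscreds.NewClientCredentials(xdscreds.ClientOptions{
		FallbackCreds: insecure.NewCredentials(),
	})
	if err != nil {
		log.Fatalf("creating xDS credentials: %v", err)
	}

	// The xds:/// scheme delegates name resolution, load balancing, and
	// routing decisions to the xDS server (istiod) instead of a local proxy.
	conn, err := grpc.Dial("xds:///reviews.demo.svc.cluster.local:9080",
		grpc.WithTransportCredentials(creds))
	if err != nil {
		log.Fatalf("dialing: %v", err)
	}
	defer conn.Close()

	// ...create service stubs against conn as usual...
}
```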
Yeah, great. I think we have time for one last question. It's more of a crystal-ball question, and I want everyone to answer it. So: where do you see service mesh in 18 months? Who wants to start — John?

Yeah, I can take this. I think it's largely going away — and what I mean by that is not that we're going to cancel Istio or cancel service mesh, but that we won't need to think about it. We won't need to talk about it; it will just be there, right? Which is really the origin of the ambient idea. Like, you shouldn't have to say, okay, I want all these cool functionalities of a service mesh, so I'm going to go deploy Istio and onboard my applications and do all this stuff. You should just be able to start a Kubernetes cluster and you have TLS — you have mTLS, even. And if I want a routing rule, I just apply an HTTPRoute or a VirtualService and I get routing. If I want to enforce authorization policies beyond whatever NetworkPolicy offers, I just apply the AuthorizationPolicy and it works, right? This is how a lot of things in Kubernetes work today — NetworkPolicy, for example: it's just an API, and you don't worry about how it's implemented. And I think that's where, hopefully, service mesh is going.

You know, yeah, I was going to say something very similar. I mean, today, when you stand up a Kubernetes cluster, the first thing you install is a CNI, and I think the second thing you'd install would be a service mesh, just because it'd be so prevalent. And to add to John's point: after everybody standardizes on a service mesh, which I think will happen, you could also leverage the identity it provides you as an application developer, because all of a sudden the service mesh can provide an identity, that you can trust, to the application itself. And you could use it to leverage those features and save yourself some time writing your applications. You have one less concern to worry about.

Well, as I sort of alluded to earlier, I sort of agree with John's idea. I don't know if we'll get there in 18 months, but I do hope we get there at some point. Eighteen months from now, I think we'll have a better understanding of how, at least on the Istio service mesh, this sidecarless, this ambient mode really plays out. I'm really kind of waiting to see people try it, get it out there, and let us know how it works, and we'll see how things change. But again, I think there's a lot of stuff — as you were mentioning, we come to these conferences, and there's always the new shiny thing. I don't know what that shiny thing is going to be in 18 months, but I do think there's probably going to be one — which is why I don't think we're going to be all the way there in 18 months; we do have a winding path. And I hope, at least from an Istio standpoint, it'll be ambient mode, and people will be able to use it, and it will make things simpler, less complex, less costly — and hopefully they won't have to debug as many issues as they sometimes have to debug.

Where do I see service mesh in 18 months? "Worth it" is the best way I can think of to summarize it. I agree with everything everybody has said. I was on another one of these panels, I think in Detroit, and someone had a question about the total cost of ownership of service mesh. It's way too high. In its current state, it's way too high for our users, and in 18 months I want to see that — it's never going to be eliminated, but I want to see it severely cut down. So that service mesh will be worth it.

Yeah — well, that's the end of our panel. Thank you for sticking around.