Okay, I think we're live. Well, welcome everyone to today's Linkerd end user panel on case studies in production. I'm Catherine Paganini with Buoyant, the creator of Linkerd, where I work very closely with our end user community, including our panelists today. We'll hear from four panelists from a variety of industries, all of them with lots of Linkerd production experience. So why don't we go down the line: introduce yourself and tell us a little bit about who you are, your implementation, and what you were trying to achieve. Eli?

Yeah, so my name is Eli Goldberg. I'm with the platform team at Salt Security. Salt is an API security company, so we protect our customers' APIs. We're running Linkerd on multiple production clusters, and we came to Linkerd because we were actually looking for a gRPC load balancing solution, and it was a perfect fit for us. So, happy to be here.

Cool. My name is Kasper, I work at a company called Lunar. I'm the lead platform architect, and I'm also a CNCF ambassador. We use Linkerd in production across multiple clouds. We primarily use the multicluster feature, and we are slowly adopting service mesh in general.

Yeah, my name is Christian Hüning. I'm the tech lead, or the cloud lead, at finleap connect. We've been using Linkerd in production since 2019, so we were quite early adopters. finleap connect is a financial services provider from Germany, and that's why we care deeply about the mutual TLS features and those kinds of things.

Yeah, my name is Fredrik Klingenberg. I work for a small consulting company in Oslo, Norway. I've been helping various companies move to the cloud and Kubernetes landscapes, and I've been using Linkerd on a couple of projects.

Awesome. And definitely different reasons for choosing Linkerd, right? So let's talk about gRPC load balancing real quick. You did implement gRPC at Salt Security; can you tell us a little bit about that, and how you realized you needed a service mesh?

Sure. So at Salt Security we're running about 40 microservices on Kubernetes, and we introduced gRPC in the past couple of years because we were growing rapidly and we were seeing a lot of friction between teams and deployments of microservices: APIs would break, and we were looking for a way to prevent that from happening before we reached production. gRPC was kind of the perfect solution for us. Being based on Protocol Buffers and HTTP/2, it came with amazing backwards compatibility guarantees, so it was a really great choice. But because gRPC is based on HTTP/2, as opposed to HTTP/1, you don't really get actual load balancing out of the box. That's what we got from Linkerd.

Yeah, we had the same use case. We also use gRPC a lot internally to communicate synchronously between our services, and at the scale we're at right now, gRPC load balancing was definitely becoming a problem inside our clusters. So having a service mesh that can actually load balance between all those pods is really valuable.

We actually circumvented that problem, because we only started using gRPC once we had Linkerd. We only noticed it was a problem on the occasions we ran things without Linkerd.
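[Editor's note: for context on the problem described above: gRPC runs on HTTP/2, which multiplexes many requests over one long-lived TCP connection, so Kubernetes' default connection-level load balancing pins all of a client's requests to a single pod. Meshing the workload lets Linkerd's proxy balance individual requests instead. A minimal sketch of opting a workload in; the workload name and image are hypothetical:]

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: orders                          # hypothetical gRPC workload
spec:
  replicas: 3
  selector:
    matchLabels:
      app: orders
  template:
    metadata:
      annotations:
        # Linkerd injects its sidecar proxy, which balances individual
        # gRPC (HTTP/2) requests across endpoints instead of pinning
        # each long-lived connection to one pod.
        linkerd.io/inject: enabled
      labels:
        app: orders
    spec:
      containers:
        - name: server
          image: example.com/orders:1.0.0   # hypothetical image
          ports:
            - containerPort: 50051
```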
So you got it for free. Okay. And you did write a CNCF blog post, where you wrote that once you meshed your services, a whole new world of reliability, visibility, and security opened up. Can you tell us about that?

Yeah. So at first we didn't really know what a service mesh was; we were looking for the simplest solution for gRPC load balancing. But as we started meshing our services and adding them into the service mesh, we realized we had something far more powerful than a sophisticated load balancer. First, our inter-pod communication was now encrypted, which is super important when you're aiming for zero trust. But more than that, we just started seeing all failures, all requests, at the network level. We saw those failures, and we were able to monitor them and trigger alerts for the specific teams owning the services. That's pretty amazing. And finally, in terms of reliability, you can actually add retry policies, so errors start looking like small delays. It was pretty awesome for us. That's really what got us onto the journey of reliability.

And service profiles, right?

Yeah. In Linkerd, that gives service owners the opportunity to define that this endpoint is retryable, and then you get the feature built in. It's very explicit and super simple.
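[Editor's note: a minimal sketch of the ServiceProfile feature mentioned here, marking one route as retryable and capping retries with a budget; the service name and route are hypothetical:]

```yaml
apiVersion: linkerd.io/v1alpha2
kind: ServiceProfile
metadata:
  # ServiceProfiles are named after the service's FQDN.
  name: orders.prod.svc.cluster.local
  namespace: prod
spec:
  routes:
    - name: GET /api/orders
      condition:
        method: GET
        pathRegex: /api/orders
      # Explicitly marks this route as safe to retry, so failures
      # surface as small delays rather than errors.
      isRetryable: true
  retryBudget:
    retryRatio: 0.2          # at most 20% extra load from retries
    minRetriesPerSecond: 10
    ttl: 10s
```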
Awesome. And so, Christian, you were I think the earliest Linkerd adopter here; you adopted it when it was still called Conduit, right? For finleap, mTLS was the key driver, and due to some EU regulations you had to implement it within three weeks. That sounds crazy. Can you tell us a little bit about that?

I can. So that was when the team and I, and the team is basically over there somewhere, hi team, joined the company in 2018. That was the beginning of a cloud native transformation, and as part of it we had roughly nine months to do everything: stand up Kubernetes, introduce Helm charts, get everything off the old setup, add Prometheus monitoring. At some point we realized, okay, we also need mTLS between everything; so far this was done with custom NGINX proxies in front of every service on VMs. So we looked at options, and we quickly found that people had done this already: there's the service mesh space. We found Linkerd 1, and naturally we also found Istio. We looked into the solutions, and as everybody was kind of doing Istio at the time, we tried that too. We quickly figured out that it works, but the load on the developers was already too high to then also write the required Istio specifications. So we found Conduit. I just joined the Slack, we had some conversations, and what I heard felt really convincing and reassuring. There was no mTLS back then, but they said November-ish, December-ish, and we were in October. So I said, okay, let's give it a try. And they delivered, so trust was established, and by I think April 1st we went live with the first properly released version of Linkerd 2, with mTLS actually enabled. That was possible because it was really easy to install, without writing any additional specs to get it working with the services. That ease of pickup was really good.

Simplicity plays such an important role here.

Yeah, definitely. We actually tried out the old Linkerd 1 back in 2017, I guess. My colleague Bjørn is sitting down here; he was the one who tried it out. We had a lot of issues back then, and beyond the issues we saw, it just wasn't mature enough at that time. We were the platform team, basically, and Bjørn came in and tried things out, and at that time the service mesh space just wasn't there for us. I think there were some issues with the old model as well. We really wanted all of those features, but I guess it's about maturity, and about when you are ready to adopt.

Okay. And now I think Fredrik will have to help me, because I butcher the name every single time I say Elkjøp.

Yeah, when I say it the way I normally would, people in the Nordics don't understand it. It's "Elkjøp."

So you are a service provider, and you helped Elkjøp move a huge legacy app to a cloud native platform; they wanted to recreate a PaaS experience in-house. Can you tell us about that and the role that Linkerd played?

Yeah. Elkjøp is somewhat the Best Buy of the Nordics; they sell electronic goods and have a large online presence as well. They had moved more of their application development in-house instead of buying off-the-shelf products, and were building up a microservice architecture on Azure App Services, which is great up until a point. They had a lot of those microservices and it became quite costly, so Henry Hognos, the cloud solution architect at the time, introduced Kubernetes and hired me as a consultant. But when you move to Kubernetes, you lose some of the metrics that you get from Azure App Services, and you also lose the encryption. That's where we started looking at Linkerd: easy to get started, great to operate, good documentation and community. And we got those insights back, especially on the networking side. That saved us a couple of days before Black Week: we found some bugs in applications that weren't using sockets correctly, for example, and that would have been difficult to troubleshoot without the insights Linkerd gives you.

Yeah, I think especially the out-of-the-box golden signals were, back in that early adoption phase, really what helped us understand the stack too, because we were new in the company and didn't really know the stack completely. We were able to look at things and see: okay, this is working, and this is maybe not working all that well. It's an out-of-the-box thing that really helps when you don't have custom metrics yet.

Yeah, that's a nice additional feature.

That's the thing with all these additional features you get when you add a service mesh: you may do it for one thing, but you get a lot of other stuff in the mix as well, which is super awesome.

You can get a lot of insight out of those metrics. You can see whether there are a lot of requests going through, or few of them, so you can see the volume, and you can see the latency as well. You see the whole three golden metrics right there.
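[Editor's note: the golden metrics the panelists describe are available per workload with no instrumentation once a service is meshed, for example via the viz extension. The namespace and output below are illustrative:]

```bash
# Success rate, request volume, and latency for every meshed deployment.
linkerd viz stat deploy -n shop
# NAME       MESHED   SUCCESS   RPS   LATENCY_P50   LATENCY_P95   LATENCY_P99
# checkout   1/1       99.8%   12.3          20ms          67ms          89ms
# frontend   1/1      100.0%    8.1          11ms          40ms          72ms
```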
Yeah, it's pretty cool.

Yeah, we also want to move as much as possible onto the platform, such as retries and encryption, and relieve the developers of that, so they can focus on their applications instead of rotating certificates and encrypting all of the communication.

Awesome. And as I mentioned, you are a service provider, so you work with a variety of different companies, helping them on their cloud native journey. Can you tell us about another example you're currently working on where you implemented Linkerd?

Yeah. The project I worked on after Elkjøp is If, an insurance company, also in the Nordics. They had already chosen Linkerd when I started the project, so job done there. Somewhat similar problems: the encryption and the insights. They do GitOps for everything as well, and that aligns perfectly with how you can install Linkerd. We're going to have a talk at four o'clock, so check that out.

Great. And Kasper, you centralized a bunch of platform services. Why was that, and what did it lead you to?

Yeah, maybe I should roll back a little bit in time. As mentioned, we had been evaluating service meshes for a long time, and over that period we kept looking at all the nice features we could get if we adopted one, but we weren't ready to take on the complexity we thought came with a service mesh. At some point we figured out that we wanted to centralize a lot of services, because we didn't really want to replicate our observability stack, logging, monitoring, all of that, between all our different environments. So we wanted to create a central cluster instead, and then find a way to connect clusters. Initially it was just between clusters running in different AWS accounts, but now we're actually doing it between different cloud providers as well. The basic idea in the first iteration was really to centralize our log management solution: just use the multicluster feature to mirror the Kubernetes service in front of our log management system to however many clusters are out there, so that Fluent Bit, which we use to collect all the logs, can transparently send all the data through that mirrored Kubernetes service to the other side, directly into our log management solution. That's why we started looking at the service mesh again, because Linkerd came out with this in 2.8, I think. That was exactly what we needed, so it really tipped it for us, and then we got all of these other nice features as well.
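[Editor's note: a rough sketch of the multicluster pattern described here, under the assumption of a central "target" cluster hosting the log backend and source clusters that mirror its service; the context names and the "loki" service are hypothetical:]

```bash
# Install the multicluster extension on both clusters.
linkerd multicluster install | kubectl --context=central apply -f -
linkerd multicluster install | kubectl --context=source apply -f -

# Generate link credentials for the central cluster and apply them on a
# source cluster, so the source mirrors services exported by central.
linkerd --context=central multicluster link --cluster-name central \
  | kubectl --context=source apply -f -

# On the central cluster, export the service in front of the log backend;
# it then appears on the source cluster as "loki-central", and meshed
# pods (e.g. Fluent Bit) can send to it transparently over mTLS.
kubectl --context=central -n logging label service loki \
  mirror.linkerd.io/exported=true
```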
And you basically started the other way around, right? First with multicluster, and then meshing your services.

Yeah, it's kind of the opposite of your story, at least, where you want something in-cluster and that's why you adopt. We really weren't ready at that point to take it on. We were like two or three people, running everything ourselves, with replicated environments and so on, so we just weren't ready. But we've scaled a lot in the past year: we've added 500 new people, going from 200 to 700 in a year. That's just been a lot of growth, and now we're also maturing as a platform organization, with enough people to actually take this on. So multicluster just makes a lot of sense for us, and it gives us this transparent communication that works really well.

And it's actually very simple, right? Because when you hear the words "service mesh" for the first time, it sounds a bit scary: what is it, what do I have to do to run it, how many people do I need to support it once it's running?

Yeah. For us, two people basically sat down, Bjørn again, I'm pointing at him again, me and him, and we just wanted to try out the different options. So we tried out Istio, and we tried out Linkerd. I tried setting up the multicluster feature in Linkerd, and it took like an hour and our clusters were connected. And, I'm not sure if it was just Bjørn's skills, but we couldn't get the other one up and running; it took forever. So we came to the conclusion: this is just so simple, and it just works. That's why we went with Linkerd.

Yeah, I'm actually looking into multicluster as we speak, and quickly realizing it's quite a difficult problem to solve, so I'm curious to see how it turns out.

And the traffic is encrypted as well. The clusters aren't just connected; you need to have the certificates and manage the whole thing. And it's not like everything can talk to everything: you can really control which services you want to expose. And now, with Server authorizations and things like that in Linkerd as well, you can put policies on who can talk to whom and get that validation and security in place too. It's pretty awesome.

Okay, so let's talk a little bit about business impact, particularly with infrastructure investment. It's often difficult to justify because it's kind of invisible. Were you able to frame your adoption in terms of tangible results for the business? This is a question for anyone, so just jump in.

Well, yeah. I mean, we were able to go live while hitting the compliance goal. That was the thing for us. And I think now... we really struggled a bit with implementing, for instance, network policies on the CNI level, because we wanted to expose that to the developers so they could self-service, and that's hard on that layer. With Linkerd policies, especially now with the new 2.12 release, I think it becomes a lot better: you can allow the teams that work in a DevOps way to say "I'm talking to these services." That will also enhance security and hopefully let us spin up this whole policy mesh, if you want to call it that, very quickly. So that's also a good business impact, because you can then tick another box.
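[Editor's note: a minimal sketch of the Server/ServerAuthorization policy resources referenced here, introduced around Linkerd 2.11; all names, labels, and ports are hypothetical:]

```yaml
# Declare a protected server: a port on a set of pods.
apiVersion: policy.linkerd.io/v1beta1
kind: Server
metadata:
  name: orders-grpc
  namespace: prod
spec:
  podSelector:
    matchLabels:
      app: orders
  port: 50051
  proxyProtocol: gRPC
---
# Authorize only a specific mTLS identity to call it, giving teams a
# self-service way to declare "who talks to whom".
apiVersion: policy.linkerd.io/v1beta1
kind: ServerAuthorization
metadata:
  name: orders-grpc-from-checkout
  namespace: prod
spec:
  server:
    name: orders-grpc
  client:
    meshTLS:
      serviceAccounts:
        - name: checkout    # only the checkout workload's identity
```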
We did a back-of-the-envelope estimation of the cost savings of just hosting the application at Elkjøp, and I think we were 80 to 85 percent cheaper once we went to Kubernetes. And the policies become quicker and easier to apply, instead of dealing with subnets and UDRs and all that stuff; you can write it quickly and then try it out.

So the policy features coming out are definitely something you're looking into on the current project as well?

Yeah, yeah. I mean, one of the features you get with a service mesh, specifically with Linkerd, is that even if you use HTTP/1.1 requests, Linkerd will automatically upgrade the connection to HTTP/2, so it just becomes faster out of the box. That's also something you can measure.

For us there's also the cost aspect of adopting Linkerd, especially in the case where we're centralizing our entire monitoring stack, logging stack, and all of that. We're not completely there yet, we haven't migrated everything, but just removing some of the replicated, dedicated resources that just run and cost a lot of money, centralizing those into a single cluster and running only one of them, we save some costs there as well. But for us, multicluster is also part of our growth strategy. We acquired two companies last year, and we're set to probably acquire another company this year, so we need to figure out how to onboard these new acquisitions and bring everything into the same environments that we run. We're in the financial services industry, with a lot of policies, and we've built a lot of tooling around the platform. We run on AWS today, and we really want to extend that and provide it on all the different providers we might be on at some point.

I think this reproducibility is an especially good topic. We're also in the multicluster business right now. We're not doing cluster mesh yet, I see that upcoming, but at the moment, one thing Linkerd helped us with is this: we run CockroachDB, for instance, and of course we also want Cockroach to be TLS-secured. If we did that with the built-in tooling and rotated the certificates often, we'd have to restart all those databases, and as we replicate our stack, we have like 30 Cockroach database clusters or something. Just using Linkerd for that means we don't have to care about those cert rotations anymore. That also reduces the operational overhead on that end, compared to other solutions, by rotating them automatically, say every 60 days.

Okay. So what advice would you give other people considering adopting a service mesh?

Just do it.

Yeah. One of the things we saw at Elkjøp was to maybe not go all-in on the service mesh, and we were quite relieved to see that Linkerd makes it possible, quite easily, to add some applications to the service mesh while they can still communicate with the applications that aren't meshed. Then we can opt in the other ones and gradually move everything over to the mesh.

Yeah, that's a really good point. That's also how we started, because as mentioned, we began by connecting clusters and only putting the services that sit on the edge into the service mesh, not all the rest. So you can really adopt as you want to, and take it as slow as you want to, which is a really good point.
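[Editor's note: the incremental adoption the panelists recommend is essentially an annotation away, and meshed and unmeshed workloads keep talking to each other. The namespace and deployment names are hypothetical:]

```bash
# Opt a whole namespace in: new pods there get the proxy on restart.
kubectl annotate namespace shop linkerd.io/inject=enabled

# Or mesh a single workload at a time.
kubectl get deploy checkout -n shop -o yaml \
  | linkerd inject - \
  | kubectl apply -f -
```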
I mean we found out we had we had like certain bugs In some places that we were just weren't aware of because it didn't show up in logs But we could see like excessive calls from one service to another you can see like, you know, it's being hit really hard and I mean, we wouldn't know that if it wouldn't, you know, introduce a service mesh. So you might find I Hopefully not but you might find out you have like things like that. It's just extra observability really Makes your life better I think what's to add is also that you shouldn't Yeah, get engaged with the community like for anything if you have any question Like can this technology be injected with linkity or something you could hit the slack and there's people Who already did that or tried that or have a tip on how to do it? Like stuff like rabbit mq and other things that might be a bit special and network-wise depending on how you integrate them Yeah Yeah, check out the documentation and I really love the documentation in the community around linkity It's very welcoming and easy to to yeah get in touch with everybody really great the ceo William always when you ask question in the chat, he also replies. So it's very Like a little family so to speak The biggest cause in in system in the system lifetime in my experience is the if the maintenance part not the development part So if data operation is horrible, but it's easy to get started You should be all the other way around if anything and we saw that linkity and not only the community, but like you You said also the documentation is next Yeah, second to none really good Yeah, but also, you know, day two is also something that that's all I guess already always interesting to to chat a little bit about in You know panel like this. Yeah, because that's also of course also at least as where we are right now And that's one little issue or a knowing thing around, you know We had to restart your proxies whenever you sort of update to a new version and all of that which is when you start to inject Proxies all over the place and I don't know 800 900 parts or whatever we're running And having to restart all of those sort of sets puts and requirements and yeah, how we actually can do that um But yeah, yeah, I'm seeing that a barn is now presenting the the full managed what is called automated Something something Seems like there's a solution coming up a linkity operator. Yeah. Yeah. Yeah. Very cool Heliky also We had some principles Started at the different projects and working on one of them is that you should have You should actually have a problem that you're trying to solve And then introducing something like a service mesh And the good thing about linkity that's very focused on solving particular problems and you can opt into having more it's not like Now you have an ingress controller and then on and on and on and you can solve the mutual tls I can get some insights and then you can opt into the other modules if if necessary like that Yeah, I mean, I think some of you touched upon it a little bit But what are the biggest unexpected benefits and the biggest unexpected challenges? When adopting linkity one big challenge We had was as we were starting out with that early on There's a couple of hard things in computer science. One is naming. 
At Elkjøp we also had some principles when starting the different projects. One of them is that you should actually have a problem you're trying to solve before introducing something like a service mesh. And the good thing about Linkerd is that it's very focused on solving particular problems, and you can opt into having more. It's not like: now you have an ingress controller, and then on and on and on. You can solve mutual TLS, get some insights, and then opt into the other modules if necessary.

Yeah. I think some of you touched on this a little bit already, but what were the biggest unexpected benefits, and the biggest unexpected challenges, when adopting Linkerd?

One big challenge we had, as we were starting out with it early on: there are a couple of hard things in computer science. One is naming; the other one is rotating certificates, I guess. When you really run this for over a year, at least in the beginning, at some point you had to rotate these certificates, and there are many intermediate CAs, and it was a challenge at first. We managed it, kind of, but after that we decided we should contribute, and we added the cert-manager integration. That was certainly a challenge we had to overcome in the beginning, but the benefit was that we have since had this on full autopilot, and it gets auto-rotated. So that's nice.

Yeah, that sounds really nice. As it is right now, we've built some scripting around rotating certificates, so I'm definitely going to adopt your solution at some point. That was also one of our biggest challenges: how do we roll and update certificates? We really want to roll these as often as possible at some point, and I guess this way you can do that.

Cool. We started testing out the scalability as well, and quickly realized that the Linkerd proxy is not a problem; it's almost too efficient. It was using too little CPU and memory, so we had to tune that a little to be able to kick off the autoscaling. So that's really not an issue, just a gotcha.

Okay, anything else about the day-two experience anyone would like to share?

Maybe just a small comment on one of the things we didn't know we would get, but are really happy to have right now: the concept of traffic splits from SMI, the Service Mesh Interface, which is also implemented in Linkerd. It's a really interesting feature. What we used it for was actually verifying that Fluent Bit was able to buffer logs if the connection between the clusters went down. We basically inserted a traffic split that at some point forwarded all the logs to a black hole, and we saw Fluent Bit buffering up as expected; then you could flip the traffic back on and watch it backfill the logs. Having that hook into the network can also help you a lot in terms of chaos engineering, if you want to explore that concept a little more.

Something we're really excited to try out is canary deployments, which are also based on traffic splits, right? So canary deployments, it's pretty awesome. I'm going to try that.

I can add to that: the canary deployments are really awesome. I think Jan over there really picked it up very early, so we enabled it very early. It took a while for the teams to actually use it, but by now we have one product that's completely deployed in a canary way through Linkerd. That's just good to have, and it's a nice day-two thing that you can discover as you go. You can adopt it if you want. You have to make your decisions deliberately, because with everything you adopt, you grow deeper into the technology, but I think this one is a solid bet.
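[Editor's note: a sketch of the SMI TrafficSplit used for the chaos test described above, shifting all traffic for an apex service to a "black hole" backend; service names are hypothetical, and weights are relative:]

```yaml
apiVersion: split.smi-spec.io/v1alpha1
kind: TrafficSplit
metadata:
  name: log-forwarder-split
  namespace: logging
spec:
  # The apex service that clients (here, Fluent Bit) actually call.
  service: log-forwarder
  backends:
    - service: log-forwarder-real
      weight: 0              # temporarily starve the real backend
    - service: log-forwarder-blackhole
      weight: 1000m          # send everything to a sink to test buffering
```

Flipping the weights back restores traffic; the same primitive underlies the canary deployments discussed next.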
Okay. We hear a lot about this from vendors generally, but what do you think is, or should be, the future of the service mesh?

Circuit breakers are something we're really looking forward to, I think. I'm not sure Linkerd supports that yet, but it's something we really want, because when things start to fail and services keep calling the failing service, it keeps getting hammered, and sometimes that doesn't allow the service to recover from those errors. That's really something we're looking forward to. Another thing: service-to-service communication is one thing, but there's also communication coming into the cluster, so maybe an API gateway that somehow connects with all of that. That would be something I'm really looking forward to seeing.

I certainly think this whole security space is interesting, the whole policy management, and good tooling around it. We run 5,000 pods in our production cluster, and if you go with some solutions that do auto-detection, you get this massive 30,000-line YAML file, and you're expected to just apply it, because that's apparently what you're doing. I think there's room for a nice solution that goes bit by bit, service by service, somehow defines the dependencies, and grows this policy network. That's where I would love to see more or better tooling, to advance it a bit. And I also think we should make sure to stay true to the Unix principle: do this one thing and do it well, not everything. I think it's in a nice space at the moment.

From my perspective, there are still some things missing around lifecycle management for sidecars in Kubernetes: "don't shut the proxy down before my application has finished offloading whatever it needs to do." Having that lifecycle, I know it's something being worked on in the community, but it would be nice to have, because it's not really fully supported; right now we just rely on timeouts or grace periods. Another thing for the future of the service mesh: I really hope to see a lot more automation around the multicluster setup and linking clusters. You can link them in one direction or bidirectionally if you want, but it's kind of like: you need to go into this cluster and run this command, then go to the other cluster and apply the YAML that the command output. Having an operator or something that generates these things and puts them somewhere the other side can fetch and apply them from, making the whole process a lot more automated, would be really nice. More automation. One cluster to rule them all, I guess.

We also saw that the sidecar pattern and the support for cron jobs were a little bit challenging, because the sidecar wouldn't stop, and then the jobs wouldn't finish either.
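[Editor's note: at the time, a common workaround for the cron job issue was to ask the proxy to exit once the job's main process finished, for example via the proxy's local admin endpoint or the linkerd-await wrapper; the exact mechanics have varied across versions, so treat this as a hedged sketch with hypothetical job and image names:]

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-report
spec:
  schedule: "0 2 * * *"
  jobTemplate:
    spec:
      template:
        metadata:
          annotations:
            linkerd.io/inject: enabled
        spec:
          restartPolicy: Never
          containers:
            - name: report
              image: example.com/report:1.0.0
              command: ["/bin/sh", "-c"]
              args:
                - |
                  # Do the actual work, then ask the sidecar proxy to
                  # shut down so the Job can complete instead of hanging.
                  /app/run-report
                  curl -sf -X POST http://127.0.0.1:4191/shutdown
```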
Another question we got, once people saw the good overview and insights we got from all of the applications on one cluster: a couple of people were asking whether we could extend the service mesh to non-Kubernetes subjects as well, like a VM, and have the mesh over everything. I'm not sure if that's the future.

Okay, I think we're open for questions now. You can either come to the mic and ask a question, or Lisa will bring you a mic.

Hi everybody. And by the way, Kasper, why aren't you wearing your CNCF gear?

I know, in this very warm weather...

The online question is: are any of you using, or interested in using, the automated failover features in Linkerd 2.12? Why or why not?

Well, for us, I think, the way we see multicloud right now... if you're not familiar with the failover feature, what it can do is: if a service in your cluster is responding badly, or whatever, it can fail over to the same service running in another cluster somewhere, so you get that capability. For us, the goal is not really to get to a point of federating services and running three in this cloud and three in that cloud. It's more that we want to locate services where it makes sense, close to the business integrations or the critical integrations that a particular service needs to communicate with. So that failover capability is not really something for us, at least not across clusters. But it could be interesting to explore more in-cluster, maybe.

I certainly see it coming. As the company grows, we're growing across Europe, and as the offering expands, we might also see a multicluster setup, and then it's a natural thing, I believe, and Linkerd would be a good fit, in my opinion, to just do that. But the real struggle there, I think, is solving the state replication in the backend, because sadly, we have state.

That actually works well with Cockroach, right, with your deployment?

Oh, it does. Very good.

Well, we're looking into that direction for the current client I'm working with now. It's BankID BankAxept, the identity and payment provider in Norway, and it's not good if those are down. So failing over to another region, and not only another region within the same cloud provider, but having multiple clusters with the GitOps setup, is definitely something we may be looking into.

Yeah. One thing we're toying with is that we actually mesh our API gateway. So we could enable failover for that purpose: you'd have requests coming into the API gateway in one cluster, but it would redirect you to another cluster, so all the rest of the services would actually be served from the other cluster. That's the idea. Sure, it's further away, but if everything is failing, you'd rather have slower requests than failing ones.

Yeah, actually a good idea. Thank you.
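[Editor's note: for reference, Linkerd 2.12's failover extension works roughly by rewriting SMI TrafficSplit weights when the primary backend loses ready endpoints. A sketch based on the documented pattern; service names are hypothetical:]

```yaml
apiVersion: split.smi-spec.io/v1alpha2
kind: TrafficSplit
metadata:
  name: api-gateway
  annotations:
    # The backend treated as primary; traffic shifts away only when its
    # endpoints become unready, and shifts back once it recovers.
    failover.linkerd.io/primary-service: api-gateway
  labels:
    # Hands control of the weights to the failover operator.
    app.kubernetes.io/managed-by: linkerd-failover
spec:
  service: api-gateway
  backends:
    - service: api-gateway            # local, primary
      weight: 1
    - service: api-gateway-central    # mirrored from another cluster
      weight: 0
```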
You mentioned a couple of times that the reason to choose Linkerd was that it was simpler to run and work with than Istio. So I was wondering: are there any places in Linkerd where you currently find it too complicated? Where are the parts of Linkerd itself that we could improve in the community, to make something even simpler? Any places we can do better?

Well, one example, I think, is the whole thing with auth policies, or service profiles; for me that was simpler. There are certain features in, let's say, Istio, for example JWT authorization, which we might want to see in Linkerd, but I don't know, I just felt like it was a little bit more abstracted away in Linkerd. It was a bit simpler in terms of deployment: you can deploy with the Linkerd CLI. But also, I think the architecture is just much simpler for me to understand. When I had to look at other service meshes, it took me a while to figure out what was going on. That was the main driver for me when deciding what to deploy: it was simpler for me to understand what was actually going on in there, the architecture.

Yeah, and I guess, as you also mentioned, certificate rotation is something that we as a community could work a lot on, and it's probably being worked on with the operator as well, to make it a lot easier. I think we solved it up to the trust anchor, and that now needs to be solved as well. You have this trust anchor, and if you want to rotate it, it just trickles down; you have to do it everywhere. It's doable, but still maybe more complex than it needs to be.

On the projects, we're not opposed to complexity; we're more about avoiding unnecessary complexity, and that's where I think Linkerd does a good job.

Any other questions? Okay, nothing. Well, I think we're all going to be at the Buoyant booth in a little bit, so if you have any further questions, you can join us there. Otherwise, thank you so much for joining us. I hope you learned something about Linkerd today, and maybe see you next time. Thank you.