Hey everyone, welcome to From Zero to Prod in Two Months. Happy to have you here. Let me begin by telling you a bit more about me. My name is Maria and I am a career changer: in my past life I used to be a nutritionist, and around four years ago I decided to go all in with the programming courses I was doing on the side. I got into a web development bootcamp in Mexico City, and right after it I got my first job in tech, where I was cornered into figuring out our cloud setup in AWS, which I only had a faint idea existed, and I slowly became the DevOps person there. Since then I have intentionally continued to pursue this path.

I met Kubernetes last year while at my former job at Pento; as you can see, now I work at StickerMule. And it is at Pento where this journey takes place: from starting to learn more about Kubernetes, to passing the CKA, to attending my first KubeCon along with my colleague and friend Gabby. So far it has been a great ride, and this is where the Linkerd adventure takes place. But to be honest, along the way the Cloud of Unknowing has been the one I have spent the most time navigating and getting used to: exploring, learning, figuring out stuff, and asking for help. This is also how my Linkerd adventure started, with me knowing zero about it.

So let's move on to setting the scene. What was going on at Pento, the company I was working at at the moment? It is a fintech; it operates in a regulated space where security is a big deal. And what was going on in the platform team at that point in time? We were three, and we were implementing a GitOps pattern using Argo CD, which also meant that we were moving away from doing any manual steps, in a mindset of automating as much as possible. Later down the line I ended up being a team of one, after some unfortunate events involving layoffs; hence the title of this talk. So sometimes you will hear me talk about "we" and how I got help from my team during the process, even though I was the one responsible for the project.

Now let's go over to the engineering team side, where the catalyzing event takes place. Around Q3 last year our engineering team started breaking down the monolith, because we were moving to a microservices architecture. That means that up until that point it did not make sense for us to have a service mesh. In fact, quoting material from the Linkerd course: because a service mesh works by operating on and measuring the traffic between services, it is only really useful in a microservices application. Cool, now we needed it. To be more precise, we were primarily interested in the set of features around security. Mutual TLS, so we could move towards a zero-trust security approach, which was also a push from some of our customers. It would also help us make sure the communication between services was encrypted, preventing any other actor that could potentially infiltrate our network from intercepting that traffic. We wanted to take that off the engineers' plate: they were busy breaking up the monolith, bug fixing and feature creating, and did not have the time to implement service-to-service authentication for each one of the new services. And quite frankly, why would they, when a service mesh can do it even better? (There is a small sketch of this right after this section.) With mTLS in place, we could enable client-based authorization and control traffic to meshed pods in a more granular way, using authorization policies on top.

Okay, so we had established that we needed a service mesh, but which one should we implement?
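To make that "off the engineers' plate" point concrete, jumping ahead to the mesh we ended up choosing: a minimal sketch, with a hypothetical namespace name, of how meshing a namespace is a single annotation; every pod injected with the proxy then gets mutually authenticated, encrypted traffic with no application changes.

```yaml
# Minimal sketch: annotating a namespace so Linkerd injects its proxy into
# new pods. From then on, traffic between meshed pods is mutually
# authenticated and encrypted, with no application code changes.
# The namespace name is hypothetical.
apiVersion: v1
kind: Namespace
metadata:
  name: payments
  annotations:
    linkerd.io/inject: enabled
```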
So, we knew it had to be simple to install and maintain. Service meshes in general are well known for being tricky to get right, and we wanted to avoid dealing with unnecessary complexity, since the only resource available to take on the implementation was me. Also, my manager had experience working with Istio before, and his life was not happy. The feature we needed, authorization policies, had been released in Linkerd the previous year, so we were good to go on that side. In summary: we did not have the resources, the time or the confidence to run the alternative; Linkerd had all the features we needed, and they had us at the "ultra simple" part. So, due to all the previous reasons, while we were at an unofficial team off-site working in Spain, and after wondering for a while, my manager said, "I want to try Linkerd," and I said, "Okay, I'll do it," trying to appear as confident as possible.

So let's get into it. Here I will cover, quickly and at a high level, the beginning of the process, where the goal was to have a proof of concept working as fast as possible in our staging environment. I am putting a pin in the things we will come back to later. To get started, pretty basic: I used the Linkerd 101 tutorial along with the guides Introduction to Service Mesh and Getting Started, testing in my local environment using minikube, the Linkerd CLI and emojivoto. That was super fun, but it was time to put the Linkerd CLI aside, because in our particular case we used Kustomize instead of Helm to manage our infrastructure resources. So I then invested most of the time getting the templating right, tweaking the values.yaml file to adapt it to our needs, considering we had already realized we wanted to leverage automatic certificate rotation. There is the GitOps mindset already creeping in.

So I faced my first bump in the road as soon as I deployed into staging: the proxy-init containers were not initializing. Long story short, when using the Docker container runtime instead of containerd, you need to set the proxy-init container to run as root for it to work, so that it is able to update iptables.
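A sketch of that fix as a values.yaml override; proxyInit.runAsRoot is the corresponding key in the Linkerd Helm chart, and everything else in your values file stays as-is.

```yaml
# values.yaml override (sketch): on nodes using the Docker container
# runtime, proxy-init needs root to rewrite iptables rules; on containerd
# this can stay false (the default).
proxyInit:
  runAsRoot: true
```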
I also wanted to have the dashboard you get by installing the linkerd-viz extension, which was useful for seeing what was going on while we started meshing namespaces. And here is where we faced our second bump, which was figuring out what was happening with our cron jobs; more about these later too.

Okay, so after validating our assumptions and making sure everything worked as expected in the staging and demo environments, we started getting ready to deploy Linkerd into our production environment. I found two resources to be really helpful at this point and went ahead to create our checklist using them: Linkerd in Production 101 and Buoyant's Linkerd production runbook. We can divide the tasks we tackled into two kinds: one would be preparing our system and Linkerd to be ready for production, and the second would be setting ourselves up for long-term success managing Linkerd, which were tasks mainly related to automation, documentation and monitoring.

Okay, so I will start by covering each of the tasks we tackled to get our system and Linkerd ready for production.

First of all: run the automated checks. Sounds pretty obvious, but it is easy to forget about, and it can save you a lot of headaches.

In case you are using GKE, review your firewall rules. We had previously updated those to allow certain ports in all of our environments, so we could go ahead with the next steps.

Then, use Kustomize or Helm to deploy Linkerd. The Linkerd CLI with linkerd install is really convenient to get started and test locally, but for a repeatable and automated approach we used Kustomize in our clusters from the very beginning, and I would highly recommend adopting this approach early on in your journey, so you can figure out things like cert management, which flags to use, and what custom overrides to include in the values.yaml files you need to pass when using helm install or helm template to obtain the manifests.

Using your own image registry is crucial when deploying Linkerd in production. We did encounter problems while pulling images from the public GitHub repository, which led to failed deployments in our ephemeral environments. To address this, we decided to host the images in Google's Container Registry, pulling them from the public registry and pushing them to ours. After further evaluation, we discovered that the most efficient way to specify the new images for each Linkerd component was to update the values.yaml file instead of trying to use patches. This is really important.

Enable high-availability mode. Some important features of this mode: it replicates critical control plane components, and along with anti-affinity rules you can distribute those components across multiple nodes and even different zones. It also sets resource requests and limits on control plane components and data plane proxies. It also requires the proxy injector to be functional for any pods to be scheduled; HA mode adds this restriction in order to guarantee that all application pods have access to mTLS. So it is very important to add a label to the kube-system namespace (shown in the sketch below), so the Kubernetes API server will not call the proxy injector during the admission phase of workloads in that namespace, allowing system pods to be scheduled even in the absence of a functioning proxy injector. To enable HA mode you can pass the values-ha.yaml file. Also, the linkerd-viz extension supports enabling an HA mode too, with similar characteristics to the control plane one.
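A minimal sketch of that label, as the Linkerd HA documentation describes it, applied declaratively to the kube-system namespace manifest:

```yaml
# Sketch: exempting kube-system from the proxy injector's admission webhook.
# With this label, the API server skips calling the injector for workloads
# in kube-system, so system pods can still be scheduled even if the
# injector is down.
apiVersion: v1
kind: Namespace
metadata:
  name: kube-system
  labels:
    config.linkerd.io/admission-webhooks: disabled
```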
Finally, this one is super straightforward: the Linkerd control plane has its own data plane proxies and should not be injected. To avoid this, you can annotate the namespace to disable injection.

Okay, so to make sure we would be successful managing Linkerd for the long term, the most important step was making sure we had set up automatic rotation for the webhook TLS credentials. We had already automated the control plane TLS credentials using cert-manager, so we based this task on what we had done previously. Next, I needed to make sure that everyone else knew how our cert management strategy worked, including the duration of each certificate, and I also documented how to debug Linkerd, with links to the official documentation for more in-depth knowledge. For monitoring and alerting, we were combining Datadog with the linkerd-viz extension; we were looking forward to improving this, although it was not a priority at the moment.

Okay. Now I want to focus on the main bumps in the road that we transformed into stepping stones along our way, as Bruce Lee says. The first one would be figuring out the issue with the cron jobs. The second: weirdly, our test authorization policy setup was not working. And third, cert management, which for me is the most important one, mainly due to the consequences, one of them being that if you fail to rotate the certificates in time, they expire and you will have downtime, which is not good.

Cron jobs. This was a minor pebble we found in our way, and it is not a Linkerd issue by itself; it is more of a Kubernetes issue. So what was it? After we started meshing namespaces, we noticed that the cron jobs we had were not being removed after running. The reason is that jobs are supposed to terminate: do their work and exit. But if we have a sidecar container like the Linkerd proxy that continues to run, the pod will never exit and the job will never finish. So you get an endless list of pods still running, which uses up resources and spikes up costs. How did we work around it? After searching in the Linkerd Slack, we found out that there are several options to consider, but at that point we went for the most straightforward one, which was to add an annotation to disable the meshing of said resources. On a hopeful note, while I was preparing this talk a friend sent me an article detailing that a more definitive solution is on the way, and it might be available in a future Kubernetes version. So if you are curious, you can go check out the corresponding Kubernetes Enhancement Proposal.

This one is short but not sweet. It was a slippery stepping stone, and a bit embarrassing: we spent a lot of time figuring out why our test authorization policy setup was not working, with the first request succeeding and then all the rest failing. Super weird. Whether you do it imperatively with kubectl, like in this example, or declaratively with a manifest, you need to set the container port for the policies to work, specifically for the Server resource, and voilà, they work after that.
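A minimal sketch of that fix, with hypothetical names: Linkerd's Server resource selects pods by a declared container port, so the port has to be listed in the pod spec for the policy to match anything.

```yaml
# Sketch (hypothetical names): a Server only matches ports that the pod
# actually declares. Without the containerPort entry below, the policy
# silently matches nothing.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: billing
spec:
  selector:
    matchLabels:
      app: billing
  template:
    metadata:
      labels:
        app: billing
    spec:
      containers:
        - name: billing
          image: example.com/billing:latest  # hypothetical image
          ports:
            - name: http
              containerPort: 8080  # the line that made our policies work
---
apiVersion: policy.linkerd.io/v1beta1
kind: Server
metadata:
  name: billing-http
spec:
  podSelector:
    matchLabels:
      app: billing
  port: http  # refers to the named containerPort above
  proxyProtocol: HTTP/1
```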
Okay, cert management. "So there is another set of certificates to manage": that was me when I realized, while going through Buoyant's Linkerd production runbook, that besides the control plane TLS credentials there is a completely separate trust chain used by Linkerd's control plane components, the webhooks. So what does this mean? Well, basically, Linkerd uses different sets of TLS credentials for securing communication. The first ones I encountered and got familiar with were the control plane TLS credentials, and those are used to secure pod-to-pod communication, making identity management for mutual TLS possible. Now, the webhook TLS credentials secure communication between the Kubernetes API server and the Linkerd control plane components called webhooks, as well as the linkerd-viz and Jaeger webhooks. You can see here the two trust chains and their components, and next I will explain, in a simple way hopefully, how each of them works.

So let's zoom in on the proxy containers. Here, in Linkerd-meshed workloads, application containers communicate through the Linkerd proxies. These proxies use their signed certificates to establish secure communication, authenticating each other and encrypting the data in transit.

Now it is the turn of the webhook TLS credentials. We have on the left the Kubernetes API server, which as we know is a central component that manages the overall state of the Kubernetes cluster, and on the right our Linkerd control plane, which has different webhooks, like the proxy injector and the service profile and policy validators. In this example we are going to focus on the proxy injector, which is a mutating admission webhook that automatically injects the Linkerd proxy into Kubernetes workloads when the corresponding annotation exists in their manifest. The communication process involves the API server sending an admission request to the proxy injector, like in the example, saying, "Hey, I am going to create a pod." The proxy injector evaluates it, potentially modifies or mutates the workload, and sends it back to the API server for deployment, saying, "Cool, make sure you add these two containers to the pod: the proxy-init and the Linkerd proxy."

So now that we have clarity around why we need certificates and what they are securing, we can have a look at how we automated their issuance, renewal and rotation. The central piece of this puzzle is cert-manager, and we already had it running in our cluster, so we integrated it to bootstrap a custom certificate chain of trust, so that we could make the Linkerd installation process more compatible with a GitOps-style deployment. For example, when I refer to issuance, I mean that when you install Linkerd via Helm or Kustomize, you need to pass certificates along for it to work, and we wanted to avoid having to manually generate a certificate and pass it via arguments. We just wanted a more declarative approach: to have the certificates in the cluster and use them.
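As a sketch of what that declarative approach looks like, following the cert-manager pattern in the Linkerd docs (the resource names match that pattern; the durations here are illustrative): an Issuer backed by the trust anchor secret, and a Certificate that keeps the identity issuer credentials renewed automatically.

```yaml
# Sketch, based on the cert-manager pattern in the Linkerd docs: the trust
# anchor (created separately) backs an Issuer, and cert-manager keeps the
# identity issuer certificate renewed. Durations are illustrative.
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: linkerd-trust-anchor
  namespace: linkerd
spec:
  ca:
    secretName: linkerd-trust-anchor  # root CA keypair lives in this secret
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: linkerd-identity-issuer
  namespace: linkerd
spec:
  secretName: linkerd-identity-issuer
  duration: 48h       # issuer certificate lifetime
  renewBefore: 25h    # renew well before expiry to avoid downtime
  issuerRef:
    name: linkerd-trust-anchor
    kind: Issuer
  commonName: identity.linkerd.cluster.local
  dnsNames:
    - identity.linkerd.cluster.local
  isCA: true
  privateKey:
    algorithm: ECDSA
  usages:
    - cert sign
    - crl sign
    - server auth
    - client auth
```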
So, besides creating the resources, you can set their duration in the manifests, and depending on your security posture you could set different values in each environment. I am also leaving you here the workshop and the repository we used to accomplish this task. And again, I will proceed to explain with a diagram.

By using cert-manager and the trust-manager plugin, you can automate the management of the entire certificate lifecycle. In the cert-manager namespace, it will create the Linkerd trust anchor, which is the root certificate authority for Linkerd. Using the trust-manager plugin, we can distribute only the trust anchor's public key to the linkerd namespace, leaving the private key secure and constrained to the cert-manager namespace. Going over to the linkerd namespace, the identity service acts as an intermediate CA, holding the Linkerd identity issuer certificate, which in turn has been signed by the trust anchor, thanks to the public key we made available. Finally, every 24 hours each Linkerd proxy sends a certificate signing request to the identity service, getting back a signed certificate, so that the proxies can authenticate themselves and participate in mutual TLS. Cool, right?

So, okay, that was a lot to cover. I just wanted to leave you with some final thoughts. The main one being that Linkerd is made by super smart humans who make it so simple for us mortals to run a service mesh and not die trying. Sorry about that, just needed to say it. Now, on a more serious note: the adventure of getting Linkerd into production was super fun. I learned a lot and had the luxury of investing a lot of time learning about how it worked. Having a great manager in that sense is truly a blessing: someone who trusts you to deliver and to have your own process of learning. So if you are a manager here, give your peeps some space to explore, learn and take their time. I am sure they will appreciate it and deliver great work.

I hope you are inspired now to try Linkerd in your own cluster, or even locally, and that you can approach it even better than I did, just by being aware of some gotchas and tackling certain challenges, like cert management, early on. And remember to add yourselves to the adopters list once you have Linkerd in production; it felt amazing. Last but not least, a huge thank you to Brandon, Gabby, Megan Clean and Catherine, who helped me get here, and a special mention to Marito, my very first mentor, who is sitting in the front row. Here I leave you a lot of resources that I used to get this to work. You will get the slides too, so you can just go through each one of the links. And yeah, thank you for bearing with me for almost half an hour. You are amazing.

Moderator: Once again, we do have a few minutes for questions. Anybody have anything? Yep, right there.

Audience member: Hi, not really a question. I was recently asked to look into why jobs and cron jobs were hanging in the mesh, and they didn't allow me to remove them from the mesh like you did, which would have been preferable. But there's this thing called linkerd-await. Just to give you feedback: it works, you can fix it with it.

Maria: Sorry, I did not get the question that well.

Audience member: Yeah, it wasn't a question, I was just elaborating on the issue you had with jobs and cron jobs. Just saying, if you don't want to remove them from the mesh, there's this thing called linkerd-await. You can wrap your job with it.

Maria: Right. Yeah, that would be a great option. And yeah, thank you.

Moderator: Anyone else?
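A sketch of the linkerd-await pattern mentioned in that exchange, assuming a hypothetical job whose image bundles the linkerd-await binary: the entrypoint is wrapped so that, once the real command exits, the proxy is asked to shut down and the Job can complete while staying in the mesh.

```yaml
# Sketch of the linkerd-await approach: wrap the job's entrypoint so the
# proxy is shut down after the real command exits, letting the Job finish
# while staying meshed. Names and image are hypothetical; the image must
# include the linkerd-await binary.
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migrate
spec:
  template:
    metadata:
      annotations:
        linkerd.io/inject: enabled  # the job stays in the mesh
    spec:
      restartPolicy: Never
      containers:
        - name: migrate
          image: example.com/migrate:latest       # hypothetical image
          command: ["/linkerd-await", "--shutdown", "--"]
          args: ["/app/migrate"]                  # hypothetical real entrypoint
```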