All right. Hello, everyone. Thanks for coming. Before I begin, did anyone lose a large roll of money wrapped in a rubber band? Because I found the rubber band. Terrible joke. As bad as the first time Dr. Octavius told it. Anyway, this talk is about how we optimize developer productivity with Telepresence, some black magic made by the folks at Ambassador Labs.

My name is Frank Gu. First, a little bit about myself. I have about 10 years of full-stack experience, and by full stack I mean I started from printed circuit boards as an electrical engineer, worked my way up the stack through embedded systems, networking, software development, and so on, until I finally made my way to the cloud. That's where I found my calling. Today I work in engineering management and run a team. One fact about myself: I'm a licensed pilot. My role today is Director of Engineering at Voiceflow. We are a post-Series A startup that specializes in building an IDE for building chatbots with generative AI technology. One of the key characteristics of working at a startup is that I have to play many roles and wear many hats. I've played software developer, DevOps, business relations, helped with various fundraising activities, and pitched in on marketing. But the area that interests me most is developer productivity. Any improvement we make in developer productivity has a multiplicative effect on the efficacy of my team, so this topic naturally interests me greatly: how can I augment the productivity of our team?

Who is this talk for? Mainly for people who interact with Kubernetes and want to develop with very fast iteration cycles. I won't get into super deep tech, but if you want more detail about what Telepresence looks like under the hood, feel free to ping me after the presentation. I'm more than happy to discuss and dive right in.

The agenda: first we'll go over the problem that haunts Voiceflow, and some of the dev community around Kubernetes. Then some of the potential solutions we tried, and painfully figured out didn't work. Then what Telepresence is, and how you telepresence into Kubernetes. There will be a live demo where I showcase how this works in practice, and finally I'll go over some key considerations for deploying this technology into your own workloads.

So let's start with the container workflow. Traditionally, we just code, build, reload, make sure it works (hello world, things work great), then commit, and that's one iteration. When we started moving into containers, though, we had to add an extra step: the container build, upload, and deploy cycle. If you're lucky and your build cycles are very lightweight, this might only cost four minutes of additional tax. In Voiceflow's case, though, we have a bunch of linting and a lot of source control operations to run; that's about two minutes. If we want to preview a particular branch, the CI/CD pipeline has to do a full code rebuild and then a container rebuild, which takes about 10 minutes, then upload to our container registry, deploy on Kubernetes, and reload the deployment, another minute, before it reaches our design reviewers. Altogether, it takes at least 15 minutes per iteration before code is live for our developers to see and for our design reviewers to review and approve before we can merge into the trunk.
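To make that tax concrete, here is a minimal sketch of what one preview iteration looks like in this kind of workflow. The image name, registry, and deployment names here are illustrative, not our actual setup:

    # one preview iteration, roughly: rebuild, push, redeploy, wait
    docker build -t registry.example.com/billing-gateway:my-branch .
    docker push registry.example.com/billing-gateway:my-branch
    kubectl set image deployment/billing-gateway \
      app=registry.example.com/billing-gateway:my-branch
    kubectl rollout status deployment/billing-gateway   # wait for the new pod

Every one of those steps sits between "I changed a line" and "a reviewer can see it."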
Some of the other problems we encountered, and I want to see if these resonate with you. First of all: "But this worked on my machine!" Of course it did. It worked on all of our machines, until we had to deploy it to production. "Can you send me that config file or secret?" Well, if you do that, our security folks will be very upset with you over Slack. These are problems of dev/prod parity that are still somewhat difficult to solve. "Can you give me a demo of your PR?" Of course, designers want to make sure we're not releasing code that doesn't meet quality standards, but I'm waiting for the CI pipeline to deploy my changes, and as discussed, that takes a while. So the change review process gets bogged down. Finally, for the startup folks out here: we might not have every service mocked, and my computer just can't run all the services locally. That points to a problem of resource limitations. These are the problems that plagued us when we went looking for ways to optimize our dev cycle.

Some of the potential solutions we looked at. These are not unique problems, right? We thought about using a vendor's code preview solution, and there are a lot of successful ones out there. But for our use case, we use a very custom WebSocket-based transport, and we had a lot of different services we had to deploy into our environment, so there was insufficient customization. We could run a cluster locally using a solution like Minikube, but then others can't easily access the instance, and if you're opening up your local computer to the public internet, that's a little bit sketchy. And at the time, two years ago, when we were evaluating solutions, the recommended way was to run a dummy container with a sleep command or something that keeps the container alive for extended periods, use SSHFS to map your local files into the container, SSH or otherwise connect into the container, and then run your workloads there. It kind of works, but it requires a lot of developer and DevOps coupling, and you really need to understand what you're doing before you can leverage the system effectively. Feels bad, man.

So what did we really need at the time? We wanted to run a service locally with a debugger attached, and we wanted the service running on my computer to be able to communicate with the other cloud services. This starts to sound a lot like a VPN, and we did try that with OpenVPN, to limited success. But more importantly, we wanted to map the cloud configuration into our local service: environment variables, configuration files, secret files, so that my local machine can pretend to be the pod that's running my workload. If only there was a solution. Enter Telepresence.

What is Telepresence? At a glance, Telepresence is a service that uses the Kubernetes API to allow connection between your local machine and your Kubernetes cluster. This means I can start flowing traffic from the Kubernetes cluster's services to my local computer, and my local computer can connect to other Kubernetes services. We're getting closer to that ideal solution we were looking for. One of the core features of Telepresence is the idea of intercepting traffic.

When you establish a Telepresence intercept, a traffic agent running on the pod side listens on the same application port as the application you're trying to debug, and all traffic sent to your application is forwarded to the local computer running the intercept. So now we have the ability to forward traffic from the Kubernetes cluster onto our local machine. Not only that: because the traffic agent is running in the pod itself, it has full access to the environment variables and all the file mounts and secret mounts of your running pod, and we can mount all of these onto our local machine and inject the environment variables into any running process.

The Telepresence architecture is quite simple. There is a laptop-side component and a cluster-side component. The laptop side runs a daemon responsible for handling all the low-level networking; under the hood, it establishes a lightweight VPN-esque connection at Layer 3, which means both TCP and UDP connections are possible. On the cluster side, you deploy one component called the traffic manager. The traffic manager is responsible for injecting the traffic agent into whichever workload you want to intercept, and for coordinating traffic between the agent, the manager, and all the connected Telepresence daemons on the laptop side.

But wait, there's more. When you connect to the traffic manager, because you've established connectivity between your local computer and your cluster, you get Kubernetes-aware DNS: you can suddenly start using svc.cluster.local addresses to access your Kubernetes services from your laptop, which I'll demonstrate. Volume mounts become possible, and, more importantly, environment variables are injected into the running process on your local machine. So now we can really begin to replicate cloud behavior on my local computer.
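In practice, getting this far is only a couple of commands. This is a minimal sketch assuming a Telepresence 2 CLI and a kubeconfig that already points at your cluster; the service name, namespace, and path are taken from the demo that follows and are assumptions:

    # cluster side: deploy the traffic manager (requires RBAC to inject agents)
    telepresence helm install

    # laptop side: start the local daemon and connect to the cluster
    telepresence connect
    telepresence status

    # Kubernetes-aware DNS: cluster-local names now resolve from the laptop
    curl http://billing-gateway.kubecon-2023.svc.cluster.local:7000/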
Without further ado, I'll give a very quick and easy demo. Feel free to follow along if you want to. The GitHub repository is franku968/kubecon-2023. For this presentation, I've created a very simple toy problem. I have a billing gateway written in Node.js whose purpose is just to execute billing: I can send a billing request to charge a user that's persisted in a PostgreSQL database, demonstrating a TCP connection between the billing gateway service and the database. The billing gateway has one dependency, a converter service that simply does currency conversion, in this case Canadian dollars to US dollars. To make this extra spicy, I've written that service in Python. The billing gateway service is exposed through a world-accessible ingress, so if you start hitting this address during the demo, you'll be able to see the changes I make live.

For the initial deployment, the billing gateway has an app container with a config map and a secret mounted onto it, including a secret file containing an API key that the app reads for simple authentication. On the converter side, we just have a config map mounted onto the app container, storing some metadata that I'll use to demonstrate how to override configuration. Keep in mind: the billing gateway is running on port 8080 and the converter on 8081. It might get a little confusing in a bit. So first of all, let's make sure all the services are running correctly.
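The checks look roughly like this, assuming the namespace is named kubecon-2023 after the repo:

    kubectl get pods -n kubecon-2023      # one pod per service, one container each
    kubectl get svc -n kubecon-2023       # billing-gateway on 7000, converter on 7001
    kubectl get ingress -n kubecon-2023   # the publicly accessible demo hostname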
So you can see there are two pods running, with one container for each service. Can everyone see this clearly? Yes? Perfect. I'll also make sure the services are deployed, running on ports 7000 and 7001, and that the ingress is running correctly as well. You can see we have a publicly available address at demo.kubecon.development.voiceflow.com. Let's go to this address and see if we can get some metadata. And you can see billing gateway v0.0.1 is running in the production environment. Great, this is the cloud configuration.

At this point, we can run the first Telepresence command: telepresence connect. What this command does is establish a connection between my computer here and the Kubernetes cluster I have running on EKS. And voila, you can see this context has been connected successfully. Now, if I go to Postman and try to hit a svc.cluster.local address, you can see that I am able to run this network command on my local computer, talking to my cluster, without doing anything else. That is how you connect to cluster services: by simply running telepresence connect.

So let's do something a little more interesting. I'll demonstrate how to create an intercept and run a debug workflow on my local computer. The command is telepresence intercept, where billing-gateway is the deployment. Any workload, whether it's a daemon set, replica set, or deployment, works here, in the namespace kubecon-2023. I'm intercepting the port named app so that the traffic is forwarded to port 8080 on my laptop. And here, this is something I'll cover in a bit: I'm not only injecting the environment variables into the command I'll run, but also creating an env file that I can load from a different process. And I'm mounting the secret file, along with any other configuration files on this pod, onto a local directory. Before we do that, let me make sure that directory exists, and then we'll run this command.
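For those following along, the command looked roughly like this. This is a sketch based on a 2023-era Telepresence 2 CLI, with the mount path being whatever local directory you prepared:

    # local directory where the pod's volumes will be mounted
    mkdir -p /tmp/billing-gateway

    # intercept the billing-gateway deployment in namespace kubecon-2023:
    # forward the service port named "app" to local port 8080, write the pod's
    # environment variables to a .env file, and mount its volumes locally
    telepresence intercept billing-gateway --namespace kubecon-2023 \
      --port 8080:app \
      --env-file /tmp/billing-gateway/.env \
      --mount /tmp/billing-gateway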
Now the intercept is doing its magic in the background, and voila, you can see the intercept has entered an active state and is intercepting all TCP requests. If we get the pods again, you can see the deployment has been updated and the pod now has an additional container. This additional container is the automatically injected traffic agent, which takes care of all the traffic forwarding. If I take a look at the environment, you can see that our cloud environment variables have also been injected into the shell I'm running. So if I just run my Node.js service locally, you can see that my server is now running in a local environment. And if I go back to the ingress, voila, the traffic is now going to my local computer and showing up on a world-accessible site. Don't believe me? I can go into the code, find the version string, change it to "kubecon 2023", make sure the system has reloaded, and if I refresh the page again, you can see the changes previewed live.

So what does this mean? If we want to run a debug workflow, for example, here I'll try to get a user's balance. I send the request and, oops, I get an Unauthorized error, even though I already sent the header with the super-secret value. What's causing this? I can go to my VS Code, find the unauthorized response here, okay, so some sort of authorizer function has a problem, and set a breakpoint. I've configured a launch.json in VS Code so that I can step through the code with a debugger, and most crucially, there's an envFile setting I've added, which points at the .env file from before. With this setting, I can now go to my application and start it. Oops, I have to stop the other process first. Now the debugger is attached. If we run this request again, you can see we hit the breakpoint, and it seems we have misnamed the header. So I'll change it to authorization, save, reload, and if we try again, boom, we get a user balance of $1,000. And that is how you run debug workflows with Telepresence using one intercept.

But as I mentioned earlier, we have two services, a billing gateway and a converter service. So let's go back to the application diagram. What's really happening behind the scenes when we establish an intercept? Instead of traffic going directly from the service through port 8080 on the deployment to the app container, the injected Telepresence agent begins listening on port 8080, and all traffic goes to the agent. Note that there is no longer an arrow connecting the Telepresence agent to the app, because all the communication is being forwarded to my computer. So if nothing is running on this endpoint on my machine and I run the Postman request again, it is not possible to get a response, since nothing on my computer is listening for it.

Now for the final part of this demo, I will run two intercepts. Let me drag this over here. Another way to run this environment variable injection, with all the configurations mounted onto your application, is to simply specify the command you want to run after the telepresence intercept command. So here I'll intercept our converter service, and here I'll start our billing service again. Everything is running, and you can see we are running on port 8081.
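That second intercept, with the service's own start command appended after the double dash, looked roughly like this. The port name and the Python entry point are assumptions for illustration:

    # intercept the converter and run the local Python service directly;
    # the intercepted pod's environment variables are injected into the command
    telepresence intercept converter --namespace kubecon-2023 \
      --port 8081:app \
      -- python app.py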
So we've now gone from the earlier application diagram to this one. If we had stopped the first intercept on the billing gateway, the app container in the cloud would make its request to the converter running on our local machine. But since both intercepts are active, no traffic is actually flowing to either app container in the cloud: all traffic goes from my local computer to the cloud and then back to my local computer to hit the converter service. So let's send a POST request to charge CAD $100 at some exchange rate. I send the request and you can see we've successfully charged the user's balance, using this endpoint and not any local endpoint, and we charged the user 72 US dollars after conversion, with $928 remaining. If we go back to the console, you can see that all the traffic is indeed hitting the services running on my computer: the billing gateway receives a charge request of CAD $100, sends it to my local instance of the converter service running in Python to do the conversion, gets a 200 back, no problem, and then returns to the client.

And that concludes the live demo portion. To exit the services and stop the intercepts, so that your service keeps working in the cloud, I can simply terminate these commands, and now we should be back to the production environment, nice and easy. Then I can run telepresence quit to fully disconnect my computer from the Kubernetes cluster.
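For reference, the teardown is just as short. A sketch, noting that the intercept name defaults to the workload name:

    # stop a single intercept; cluster traffic resumes hitting the pod's app container
    telepresence leave billing-gateway

    # disconnect the laptop from the cluster entirely
    telepresence quit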
Oh, and did I forget to mention? This whole time I was running on a VPN. Telepresence is fully VPN-compatible, with certain exceptions, and it works quite well in all sorts of networking environments.

So what I've demonstrated is the ability to very quickly preview your changes on your local machine and have your application exposed at a publicly accessible URL for other people to review. What has this done for us? Well, it's tightened our dev loop down to just 10 seconds. We frequently have our designers and developers on a live call, where the designers might suggest certain changes to the UX and behavior, and mind you, our developers might be somewhere in Europe while our designers are in North America. Because we have a publicly accessible URL, we can make these changes live on the call and iterate very quickly. So we've shortened the dev loop from 15 minutes down to just 10 seconds.

This might sound really nice, and it is a fascinating and powerful tool, but with great power comes great responsibility. So here are some key considerations, in case you want to deploy this into your own workloads.

The first is the deployment model for Telepresence. It requires a cluster-side component, the traffic manager. You need to ensure you have sufficient privileges to deploy this component, and more crucially, the traffic manager has additional Kubernetes requirements in terms of RBAC and permissions in order for it to successfully inject the traffic agent.

As for scale, I'm happy to announce that we've unintentionally helped stress test the absolute limits of Telepresence. We currently run it on a multi-tenant cluster with all of our developers connecting to it: about 15 to 20 environments running at any given time, with one to three intercepts in every single environment. With that many intercepts and that much traffic flowing through it, the traffic manager consumes only about half a CPU core and roughly a gigabyte of RAM, so the overhead is pretty negligible in the grand scheme of things.

The security model of Telepresence is quite straightforward: because it uses the Kubernetes APIs under the hood, whatever trust model you have with your cluster applies when you use Telepresence. The one caveat is that you should definitely stay away from using Telepresence on production workloads, because you should assume that whoever has Telepresence access to your cluster has access to all of your secrets, environment variables, configurations, and so on.

In terms of robustness and support, Telepresence is maintained by Ambassador Labs, and the open source community behind it is very vibrant: there's a Slack channel and about 10 to 15 commits per week into the trunk. Additionally, there's a paid tier from Ambassador Labs if you need more support.

The last consideration is latency. Latency was a big problem for us, especially if you have many services with many layers. If service A calls services B, C, and D, then every single time your local computer makes a call to a downstream service, you incur the round-trip latency from your computer to the cluster and back. At Voiceflow, we got around this issue by doing some DNS tricks on the local host, modifying the /etc/hosts file. There are also other ways, such as overriding the injected environment variables for your particular service discovery mechanism. The idea is that if you're intercepting more than one service, you point one local service directly at another local service on your computer, without going out to the cloud.
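A hypothetical sketch of both workarounds. The variable name CONVERTER_URL and the Node entry point are made up here; use whatever names your service actually reads. Sourcing the generated .env file assumes its lines are shell-compatible KEY=value pairs:

    # option 1: alias the cluster DNS name to the loopback address, so calls
    # to the converter stay on this machine instead of round-tripping
    echo "127.0.0.1 converter.kubecon-2023.svc.cluster.local" | sudo tee -a /etc/hosts

    # option 2: load the Telepresence-generated env file, then override the
    # service discovery variable before starting the local service
    set -a; source /tmp/billing-gateway/.env; set +a
    export CONVERTER_URL=http://localhost:8081
    node index.js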
And with that, I conclude this presentation. In conclusion: dev/prod parity and resource limitations, plus deployment and review velocity, are major concerns in the Kubernetes development workflow, and in Voiceflow's workflows in particular. Telepresence is a tool that proxies traffic and configuration between your local machine and the Kubernetes cluster. By using Telepresence, we've managed to remove the build-upload-deploy overhead and greatly enhance our developer experience, shortening each iteration cycle from around 15 minutes down to just 10 seconds. Thank you very much for attending the session. I'll open up the floor for any questions at this time.

The session was super interesting. A question about how you deal with multi-tenancy: you have two or three developers working on the same project, and both of them will try to intercept the same deployment.

Yeah, so at Voiceflow we deploy namespaces: one namespace per developer, one namespace per particular workload you want to preview. So if you have multiple developers, they should each be intercepting their own services. There is one exception, where you might have one really large feature that requires, say, a full-stack team and a backend team working together, in which case, when they want to come together and preview something, they are still intercepting different deployments, but in the same namespace. That's how Voiceflow got around the multi-tenancy issue. But just remember: when you intercept, you're only intercepting one deployment workload. So you can go pretty crazy with that.

Thanks for clarifying. Do you know if Telepresence works well with service meshes?

I believe it does. We currently don't run any service mesh in our development environment with Telepresence, so I have not been able to test that. But I did see some work in the commit log regarding Linkerd service meshes. Thank you.

Hi, thanks for the presentation. It was pretty good. Two questions. One: how do you deal with application configuration files? I think in the demo there was an environment file or something that was referenced. Does the developer handcraft that? I ask this in the context of backend services you have to provision, where there are parameters you would want to work with.

Yeah, sure. The file you saw that got mounted, which is this one, /tmp/billing-gateway/.env, is automatically generated by Telepresence. Let me see if I can just show this. All the environment variables that the pod sees get written into this .env file, and it gets created on your local file system. But sometimes, as I mentioned in the latency consideration, you might want to override these environment variables. In that case, what we can do is use a .env file in the runtime: I run this code with the dotenv package with override mode set to true, so if it sees this config file, I can override whatever the environment is.

Yeah, makes sense. Thank you. The second question: how does this approach compare with something like Skaffold, if you have tried that, where the idea is that you instead live-reload your code in a remote cluster? Have you looked at something like that?

I'm not familiar with Skaffold in particular, but it sounds really similar to the SSHFS solution I mentioned earlier. I believe this solution has more dev/prod parity, because you're literally not reloading code: you're running the application exactly as you would in a container, just minus the container itself, and you're not trying to reload code remotely, but rather locally. This has the added benefit, too, that you can run your debugger and attach to it on your local machine instead of doing some sort of remote debugging session.

Okay, thanks.

Hi, thanks. A question on configuration: you talked about having a config map and a secret whose information you can pull locally. What about file mounts and things like that? Are those also shared?

Ah, yes. I did not fully showcase the file mount portion, but if you take a look at the command from earlier, there is a file mount. The pod has a secret, apik.txt, mounted as a file on the pod. When we specify a mount point on telepresence intercept, the mount point is treated like the container root. So if I actually go to this directory (sorry, the intercept is no longer active), you'll see the root file system and another tmp directory, which is why you have to do some soft-linking to make sure the directories match up between your local and cloud environments. But file mounts are fully supported, which is one of the key reasons we chose Telepresence: we don't have to do any sort of extraneous file mapping or SSHFS on top of it.

Awesome, thanks a lot.

Hi, thanks for the talk. In terms of access for Telepresence from your client laptop out to the Kubernetes server: A, network-wise, you need access just to the API server, right?

Correct.

And B, in terms of Kubernetes authorization, are you authenticated to Kubernetes as your named account? Is that what Telepresence is using?

I am actually not super sure about that. Under the hood, when you try to connect to the traffic manager, you're creating almost a port-forward-esque connection to your Kubernetes environment, and from there the network traffic gets taken over by the Telepresence daemon. So I'm not too sure which account that is. The Ambassador Labs team is here; that might be a better question for them.

So maybe it is a better question for them. So telepresence connect: is it talking to Kubernetes, or is it talking to the Telepresence daemon in Kubernetes?

It's talking to both the traffic manager and the Kubernetes API, from the Telepresence daemon on your laptop.

Okay, thank you.

Quick question on the security aspect. One thing is that, so, your laptop: is it exposed to the internet?
Is that how the communication happens, from the Telepresence component running in the cluster to the instance running on your local machine?

Not necessarily. Your computer doesn't need internet exposure; it just needs to be able to connect to your Kubernetes cluster. The reason you can see the demo URL, the publicly available one, is that I've exposed it as an ingress. Without an ingress resource, technically speaking, all the communication is just between your laptop and the Kubernetes cluster itself.

I understood about the ingress, but the Telepresence you started on your local machine: is it connecting to the Telepresence in the cloud in an outbound way? I'm asking from a security perspective, because if we wanted to adopt this, our security review would ask: are you opening up connectivity from the internet to your machine, or are you making a one-way tunnel?

Yeah, the Telepresence daemon on the laptop side makes that tunnel connection out to the Telepresence traffic manager in the cluster.

And any command you issue, you're basically sending over that channel, and the response comes back on that same channel, instead of anything connecting in to your local machine?

Correct, it creates a tunnel connection, basically forwarding TCP and UDP packets.

Perfect, thank you.

Thanks for the talk, really cool. Quick question: you mentioned not to use this in production, so could you speak a little more about the actual cluster environment you are intercepting? Is it like a QA or dev environment? You mentioned devs each have a namespace.

Yeah, that's correct. At Voiceflow, we have a development cluster that holds all of the developers and their own developer environments, each in its own namespace. And the key consideration, as I mentioned, is that when you run an intercept or telepresence connect, you are opening up your computer's connection to your Kubernetes cluster, and on top of that, if you intercept a pod, all the environment variables, config maps, secrets, all of that will be forwarded to your local machine. So if you have sensitive information such as certificates or API keys, then whoever intercepts, whether it's a dev or a QA security analyst on your team, will be able to see that information, which, depending on your company's governance policies, could be a big problem.

Right, okay, thank you. One more question. If you're running this in a QA environment and you have multiple apps, as you demonstrated, and you just want to intercept one of them and leave the other running with the QA version, which might be pinned to some release, compared to your local changes: can you have them talk to each other?

Yes. In my first demo, there was just one intercept running, if we take a look at this diagram. When my service tries to hit the converter, the Python service, it is not hitting the local version of the converter service, but rather the svc.cluster.local address of the converter service itself. And whatever is backing that service, whether it's another Telepresence agent or a plain pod, depends on your intercept configuration. So if you only have one intercept running, your local service will be hitting the pinned version in the cloud. But you can also do multiple intercepts, should you wish.

Got it, okay, thanks. That wasn't immediately clear.
So I guess that means you do not want to mix your local version with an environment that's actually important, because you could break that dependency, right?

That is also a really good consideration, yes. Thank you.

Great, thank you so much for attending the talk. If you have any feedback, please let me know. Otherwise, have a great evening.