It's great to be here virtually at Cloud Native DevEx Day, especially during these unprecedented times. Pat and I are here to tell you how our team has gone from villains to heroes, and how we managed to improve our developers' experience and make our devs happy. So, who am I? My name is Anna Kaleen and I'm a software engineer at InfluxData, where I've been working on maintaining our platform built on top of Kubernetes and the cloud, and on making some of our other software engineers happy-ish. Before we talk more about our superhero-ish capabilities, let's meet my friend, my safe person, and my manager, Pat. Oh, thank you, Anna. All right, hi, I'm Pat Kahn. I work with Anna at InfluxData. I'm the engineering manager of our deployments team. So, give me the next one. What we're gonna tell you about today: a little bit about our story, about the company, briefly about what we do, and we're gonna describe our users. We're gonna talk about what we tried to do from a development-environment perspective, and then where we've ended up. So, next one. First, InfluxData. We're the company behind the time series database InfluxDB. About a couple of years ago, we started a SaaS offering. And since data has gravity, people care where it is, so we're doing this multi-cloud, across Amazon, Google, and Azure, and multi-region. All told we have, you counted it, Anna. I think it's 17, but I'm gonna say approximately 15 environments running across the clouds and the regions. Each of those environments runs on top of a Kubernetes cluster, and within each of those clusters we're running 30-plus services.
All these services together make up the database and how you interact with the data coming in and going out. So, very quickly: what our deployments team does is we're in charge of the whole development and delivery pipeline. Our team runs our whole CI/CD pipeline, and we continuously deliver. An engineer writes the code, commits it, and it enters CI. If that's successful, it goes into our deployment pipeline across all three of our staging clusters; if that's successful, it's automatically promoted to internal production; and then finally it's promoted to external production, where it's available to all of our customers. To make that happen, we needed to provide our development teams, as they're writing code, with as production-like a development experience as we could. And before I hand it over to Anna, I wanted to mention who our team's customers are. Our team's customers, primarily, are the development teams. So, I'm gonna let Anna take it from here. Now that you have an idea of what our team does, we have pictured here two personas to represent the different engineers we serve. They're somewhat similar, with a few exceptions. Our front-end engineer, sometimes called a UI developer, expects an instant feedback loop, expects a realistic, production-like environment, doesn't need to be concerned with the Kubernetes details, and has a main focus on the UI. On the other hand, we have the back-end engineer, who expects a fast-ish feedback loop. So, like, you know, slightly... A minute, a minute, Anna. That's a wait for a build, maybe. Exactly, yeah. They also want a realistic, production-like environment, and whether they like it or not, sometimes they have to care about Kubernetes. They also need some service transparency, because they usually work on more than one service, so they need some sort of monitoring for that.
And they tend to deal with a wider net of issues. In both our front-end and back-end teams we do have engineers who are very good with Kubernetes, but most of them don't like to touch it. Okay, so what was the solution we came up with? At the time, we came up with an approach that looked a bit like this. Using Kubernetes in Docker (kind) and Docker Desktop, our team created a set of make targets that our engineers could use to deploy a slimmed-down local Kubernetes cluster that contained our application. This was the setup that we simply referred to as kind. So on an engineer's machine, you'd have kind plus Docker, then you'd have the app deployed in its own namespace on a kind node, plus other namespaces used for the maintenance of the whole cluster. And this was somewhat functional, but we had a few issues with it. But before we go into those issues, Pat, would you like to play a game with me? You know I'm always in for a game. Okay, then let's do this guessing game. I'm gonna play you two sound clips. And I want you, don't cover your eyes, it's okay, I want you to listen to them and try to guess what they're of. There might be a clue in the picture, but maybe not. Technical difficulties, that's all right. Okay, one. Something bad was about to happen. Okay, so what do you think these sounds are of? Airplanes. The first one's like a propeller plane that's about to go down. The other one is like I'm sitting on one of the big planes headed across the ocean, I've got the window seat right over the engine, I'm trying to sleep, and I hear that sound. It's terrible. Airplanes, both of them. Very specific. Yes, very specific, especially the second one. So you're right: the first one is some sort of propeller plane. But the second one is actually a computer overheating. And... I should know. It doesn't sound too dissimilar to how our MacBooks would overheat when running kind and Docker.
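To give a flavor of the local setup described above, here's a minimal sketch of what kind-style make targets could look like. The target names, cluster name, and manifest path are illustrative guesses, not InfluxData's actual Makefile.

```make
# Hypothetical sketch of the local "kind" workflow described in the talk.
# Target names, cluster name, and manifest paths are made up for illustration.

CLUSTER_NAME ?= local-dev

.PHONY: kind-up kind-deploy kind-down

kind-up:      ## Create a local Kubernetes-in-Docker cluster
	kind create cluster --name $(CLUSTER_NAME)

kind-deploy:  ## Deploy the slimmed-down app into its own namespace
	kubectl create namespace app --dry-run=client -o yaml | kubectl apply -f -
	kubectl apply -n app -f manifests/

kind-down:    ## Tear it all down when the laptop starts sounding like an airplane
	kind delete cluster --name $(CLUSTER_NAME)
```

Everything here runs on the developer's own machine, which is exactly why the fans spin up: Docker, the kind node, and all 30-plus services compete for the same laptop's CPU and memory.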
So let's talk about challenges. So true, I should know this. So, challenges. Give it a click. Some of the challenges we had with kind, well, of course it jumps through. That's fine, we'll keep going. People didn't necessarily have consistent resources to dedicate to kind, so they'd try to use it, the machine would start sounding like an airplane, and they couldn't work on code while they were running kind. Sounds like it's dying. Sometimes people installed software from different places, and we were also supporting several different operating systems, operating system versions, and distro preferences, all these sorts of things. Give me another click. And then, you know, people didn't know if it was them or if it was kind; it felt very brittle, very fragile. They'd go to use it one day and it would work; they'd use it the next day and it wouldn't. It felt hard. It felt really painful. So I think at the time we were all left with the question: why is this so hard? It just felt hard. So yeah, just going back. Our developers didn't change, but we realized we needed to change something; what we'd offered just didn't fit the demands, and what they needed from us stayed the same. Nothing changed here. These are still our developers, still wanting something better. So what we came up with is remocal. The name comes from the combination of remote and local, and Anna is gonna tell you a little more about why it's not just fully remote. As much as we can, we're pushing things off to a remote cluster, getting things off of people's systems. So I'll let you take it from here, Anna. Okay. We came up with a dedicated GKE cluster in which, by running a couple of CLI commands, a developer can spin up their own namespace containing a slimmed-down version of our application and all of its components. What that meant is that none of our developers had to run Docker Desktop or kind anymore. Okay.
The way the cluster looks is quite similar to how it looked locally, with the exception that you have many different namespaces. But let's go to our workflow. Let's say you're a Golang developer on our back-end team and you want to apply a change to a back-end service. After deploying the app in your dedicated namespace using make deploy remocal, all you need to do is run one command, make remocal dev, passing the name of your service, any back-end service, as the APP variable. What that does is Garden syncs the source code from the back-end repository to the remote Deployment or StatefulSet that matches your namespace, triggering a build of a new binary and restarting it on every change. So you can keep working in your repository on your local machine, and every time new code is built, the pod restarts with all of the changes you made. So Anna, before you click, I wanted to add one thing. This is very similar to what we're doing in CI. The only difference is that CI doesn't have that Garden system namespace, because in CI we're spinning up the clusters fresh each time, versus a developer making changes, where we want to update their service within the existing environment. Okay, now you click. Well, before I click, I also wanna say that in CI we had also been using kind, so when we moved, we moved away from kind for both CI and developers. You're right, yes. Now, if you're a front-end developer, your flow is a bit different. You'd still run make deploy remocal to spin up your namespace, but when you start developing the UI locally against Kubernetes, you're running webpack and all of your normal dev tools as usual, and your code is not running in Kubernetes.
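The back-end workflow just described could be sketched roughly like this. The talk only names the make targets; the Garden invocation, flags, and namespace convention shown here are assumptions for illustration, not InfluxData's real configuration.

```make
# Hypothetical sketch of the remocal back-end workflow.
# The garden commands, flags, and namespace naming are illustrative guesses.

NAMESPACE ?= $(USER)   # each developer gets their own namespace in the GKE cluster
APP       ?=           # back-end service to live-sync, e.g. APP=storage

.PHONY: deploy-remocal remocal-dev

deploy-remocal:  ## Spin up a slimmed-down copy of the app in your own namespace
	garden deploy --env remote

remocal-dev:     ## Sync local source into $(APP)'s pod; rebuild and restart on change
	garden dev $(APP) --env remote
```

The key point is the division of labor: the editor and the git checkout stay local, while the build and the running pod live in the shared remote cluster, so the laptop's fans stay quiet.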
So your code lives on your local machine and the cluster talks to it using Telepresence, which acts as a reverse proxy that sends all requests to the webpack dev server on your computer. When your browser talks to your instance of the app, requests for the UI are sent to the Kubernetes cluster and back to your computer, and all of this is fast enough to make development a comfortable experience. Giving those front-end devs their instantaneous development environment sort of thing. Yeah, yeah. The other thing I wanna say is that as a front-end developer, on your local machine you have the Telepresence reverse proxy and a webpack dev server in addition to everything you'd need as a back-end developer, and the commands to run this are very similar, except that you point APP at a front-end service rather than a back-end one. Cool. We mentioned in our talk abstract that we were gonna talk about the four developer experience pillars and how we managed to build them into our solution. The first pillar of good developer experience is function. Really, it's: what we settled on with the team, what we promised, we delivered. We asked for feedback early and often during the development process, and there are things that we kept and things that changed completely. The next pillar you have to keep in mind when providing developer experience is clarity. What that meant for us is that it took the guessing away whenever a developer came to us to say, hey, this is not working, whether it was a user error or an actual problem. When our engineers each had kind set up locally, troubleshooting with someone could take my teammates and me up to a day, because you'd go through things like: okay, is it working for anyone else? Can I recreate the issue locally on my machine?
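The front-end flow above could be sketched as one more make target. This shows Telepresence v1-era flags, which match the timeframe of the talk; the service name, port, and dev-server command are illustrative assumptions.

```make
# Hypothetical sketch of the remocal front-end workflow.
# Telepresence v1-style flags shown; service name, port, and the
# webpack dev-server command are made up for illustration.

NAMESPACE ?= $(USER)

.PHONY: remocal-ui

remocal-ui:  ## Route in-cluster requests for the UI to your local webpack dev server
	telepresence --namespace $(NAMESPACE) \
	             --swap-deployment ui \
	             --expose 8080 \
	             --run yarn start
```

With the deployment swapped, the browser still talks to the app running in the developer's remote namespace, but every UI request loops back to the local machine, so hot reload and the usual front-end tooling keep working.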
If not, can I sit with the person who has the issue and go through every single one of their steps? There were so many different components in play that it just wasn't a great experience; it did not give us enough transparency to troubleshoot properly. Whereas now, if an engineer comes to us to say something's not working, all we have to ask is: okay, what's your namespace? And my colleagues or I can just log into the GKE cluster, go to that namespace, and see what the issue could be. The next pillar is stability, and I think Pat knows a bit more about this. I mean, I know things too. Yeah, oh yes, you do. To me, this feels like kind of a journey when it comes to software in the cloud. Through the development process, as we moved from kind to remocal for CI, we were running the new way in parallel with the old way. So the whole time we were getting feedback: are we getting good results? Are things looking good? We were launching hundreds of namespaces, sorry, not instances, to work out the kinks, seeing if we could make it work better and faster. We're also monitoring it, keeping track of long-running namespaces and watching memory and CPU utilization, all of that. As we started with this new solution, we did have some challenges. Most of the software we used, specifically Garden (garden.io) and Telepresence, did what we wanted, but there were some features missing. To give you an example with Garden: Garden doesn't support Jsonnet out of the box, but we would dynamically generate YAML from our Jsonnet code, and then Garden would manage that YAML. So we managed to find a solution around that.
The other issue with Garden was that, out of the box when we started using it, it was only able to manage Deployment or DaemonSet resources, and some of our services were managed by CRDs, Custom Resource Definitions. The way Garden worked with us to address this was by adding a feature that lets you specify a pod selector instead of the resource type, so that Garden can manage a resource by pod selector. As with any new thing, there was a hill to climb: we didn't have anyone on our team with specific expertise in Telepresence, so we had to learn as we went. And there were some early robustness issues with Telepresence. Telegraf is one of the little components that's part of our product, and if someone tried a new version of Telepresence in the remocal cluster, that could bring down the whole cluster for everyone. That was quite the experience, but it's no longer an issue now. Can I take it over from here? Okay, cool. All right. The next one we didn't give ourselves a check mark on, because it's not fully gone: we have not completely gotten rid of the local development environment, the local developer surface area. We've greatly reduced it by moving most things to the remote cluster, but we still have Telepresence running locally. We had to do some work in the code to make sure that the agent and Telepresence were always on the same version, but we got through that. I just wanted to say, when we launched remocal, it was actually a different experience than when we launched kind. When we launched kind, we kind of had to convince users that they had to use it. There was none of that here. You know what I'm talking about, Anna? They got right on it. And right away, we were getting a lot of, okay, I'll let you decide if it's a lot.
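The pod-selector feature mentioned above could look something like this in a Garden module config. The field names follow the shape of Garden's kubernetes module configuration, but the module name and labels are hypothetical, so treat this as a sketch rather than a verified example.

```yaml
# Hypothetical garden.yml fragment: instead of pointing Garden at a
# Deployment or DaemonSet, select the pods a CRD-managed service creates.
# Module name and labels are made up for illustration.
kind: Module
type: kubernetes
name: my-crd-service
serviceResource:
  podSelector:
    app: my-crd-service   # match pods created by the custom resource
```

This lets Garden sync code into and manage pods even when no standard workload resource owns them, which is exactly the gap the CRD-managed services exposed.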
It was really, I was really proud of the people I was working with when we got all this feedback. Just really good, positive feedback. Did it stay that way? Not entirely. We did name the presentation Happy-ish, because that was a good day and not all the rest of them are. But overall, I would say the feedback's been really good. We've had some challenges, and we're gonna share some of the things we're still working on in the next slide. I just wanna mention on this, Pat, that I specifically remember a couple of our developers were using remocal before it was even fully available to them. Yeah, they were liking it and they were telling their friends. You know it's good when they're telling their friends. Okay, so some of the things were, oh, it went one slide too far, but that's okay. Some of the things we're still working on: like I mentioned with Happy-ish, we had a situation where we made a change we didn't think anyone cared about. We got rid of a service to try to reduce memory utilization, and it turns out people were using it. So we need to communicate better. We're working on being better at communicating, especially about things we don't think anyone cares about, but they do, and also messaging it in the right way, so they know why they might care. We're also always continuing to work on improving stability. I think we've come a long way, but it's something we watch, measure, and keep track of. Users find bugs, and we prioritize those; addressing feedback sits at the top of our sprint backlog. And we're trying to add features just to keep our engineers happy-ish. Specifically, we're a database company, so we're currently adding a feature that lets them preload data, with a kind of data catalog, so they can load different types of data based on what they're trying to get done.
So finally, Anna mentioned this was something she was doing with her colleagues. Opiemi and Wojciech are two of the developers who did a lot of the heavy lifting on this project. And, I mean, it's been a whole team effort, but neither of us felt we could get through this presentation without specifically thanking these two. So thank you. They deserve special thanks for their work on this project. Yes, and that brings us to the end. Thank you so much. This has been an adventure, Anna. Sorry we weren't able to be there in person. Thank you, and we'll take some questions.