Okay, let's get started. I'd like to welcome everyone for joining us for today's CNCF webinar, fmrl.run: a full application environment for every PR before you merge to master. I'm Jerry Fallon and I will be moderating today's webinar. We'd like to welcome our presenters, Vishal Biyani, CTO at InfraCloud, and Jono Spiro, staff software engineer of engineering operations at OpenGov. Just a few housekeeping items before we get started. During the webinar, you are not able to talk as an attendee. There is a Q&A box at the bottom of your screen, so please feel free to drop your questions in there and we'll get to as many as we can at the end. This is an official webinar of the CNCF and as such is subject to the CNCF Code of Conduct. Please do not add anything to the chat or questions that is in violation of the Code of Conduct, and please be respectful of your fellow participants and presenters. Please also note that the recording and slides will be posted later today to the CNCF webinar page at cncf.io/webinars. And with that, I will hand it over to Vishal and Jono for today's presentation. Hey everyone. By the way, I just wanted to add a small change to our tagline: it's now "before you merge to main." We're no longer going with "master." Cool. All right, welcome everyone to the talk and demo on fmrl.run. Quick introductions: my name is Vishal, I am CTO and founder of InfraCloud Technologies, mostly interested in Kubernetes, serverless, and making Kubernetes easier for developers and enterprises. And I'm Jono. I work on the engineering operations team at OpenGov, and my focus for the last few years has been on developer experience, productivity, tooling, and automation. So let's take a look at the agenda. I'll give you a moment to review it. There will be plenty of time for questions at the end. All right. So who needs an ephemeral environment? An ephemeral environment is one that is brought up as quickly as it can be destroyed, and they're purpose-built for development and testing. They're usually miniature versions of integration or production. As a developer or quality engineer, you want to be able to spin up a whole environment on Kubernetes, run all your tests, and be sure your code is golden before you merge to the main branch or go to production, ideally at least. As the operations team, you want to be able to provide your organization with cost-effective, easy-to-manage, right-sized micro environments. There are many use cases for ephemeral environments, including mixed local-to-remote development and debugging with personal environments, ideally one per branch; integration, regression, automation, and manual testing; early feedback from design and product teams; and running your CI tests inside real-ish systems instead of mocks. So let's talk about the development life cycle, and then we'll go over the criteria by which we selected the technology to build ephemeral environments. Developers spend most of their lives within the inner loop of development: make a change, build, deploy, test, repeat. We want this loop to be as fast as possible, spending most of our time iterating, not waiting. Our tests should be as realistic and close to production as possible, so that after we commit, merge, and release, we don't have to come back and fix issues that were missed during development. In the inner loop, we use IDEs and text editors, debuggers, and unit tests, but there aren't as many solutions for testing our changes as we'd like, and every team and company has different needs.
We tend to hack things together that aren't reflective of production, just enough to load localhost:8080 and get our work done: for instance, Docker Compose, Makefiles, or complex runners in our IDEs. Our support teams of QA, product, and design, and our peers, excitedly but nervously wait for us to come up for air in the outer loop, ready to test our new code and get it to production as quickly and defect-free as possible. Every defect is another iteration of the inner loop. We want to avoid these context switches as much as possible. And while bugs are inevitable, when we discover them is in our control. The worst case is for a customer to report a defect, so we want to shift that "when" as far left as possible. We want to fail fast and fail early. At the extreme left, developers often pair up, pairing to catch issues as they're created. What we need are production-reflective development and testing environments that are fast, easy, and useful as early as possible. Now that we know what we want, how do we pick a technology? Can your company afford to set up an entire cluster for each engineer? That may not be too hard if you can afford it, but for most teams it's either very hard, or you absolutely cannot afford it, or both. So you need a simplification. Is the solution convenient? While Minikube is cheap and somewhat convenient, it has no ability to branch your infrastructure the way you wish it could, the way you branch your code. You can't test new Helm charts without impacting or compromising other work you're doing. You can't reset to HEAD and start over quickly if you mess up. You can't switch branches and easily redeploy the changes. You probably don't even have a powerful enough laptop to run your cluster and your development tools at the same time, let alone a web browser. Does everyone on your team understand kubectl? Can they use Minikube? Maybe your developers do, but probably not well enough to stand up everything. And in any given solution, is your ops team mature and experienced enough to manage ephemeral or shared environments? Probably not. Is the solution reflective of production? Can you certify or release with it? What data is it seeded with? Is it repeatable? And can your team bring it up without knowing how it's built? Faking Kubernetes with pure container solutions like Docker Compose or Swarm, or with Ansible or Chef, isn't really enough, and it's not real enough. And finally, can you share new features with the design or product teams, or prospective customers, before the code is merged and released? Most, if not all, existing solutions don't stack up well in all of these areas. That's our challenge. So what are our options? Now, since everything in Kubernetes must be nautically themed by law, production is a battle group. It's a massive, impossibly expensive fleet of services requiring full-time, highly trained operational crews. They're deployed in a protected, secret, or otherwise inconvenient VPN location, and the fleet composition and course are decided months or years ahead and cannot change easily. As the most interesting man in the world once said, "I don't always test my code, but when I do, I do it in production." Let's not do that anymore. Existing solutions are sailing vessels: for instance, Skaffold, Garden, Microsoft Draft. They're bespoke and don't tend to share reusable parts or designs with each other, nor with your production fleet. They're difficult to build, expensive to maintain, slow to turn, and require extensive training and manual operation to be proficient at running them.
I've been on many, and while they share the principles of wind, tiller, and sail, many, if not most, of the other design features are unique to each. Where we're going, we want speedboats. Speedboats are generally cheap, at least by comparison, or they're rentable by the hour, so everyone can share a small fleet or have their own. They're fast to deploy, change direction quickly, and require almost no training or skill to operate, if you've played Grand Theft Auto at least. And yet they operate on the same principles as our battleship: fuel, engine, and propeller. So they're actually reflective of production, unlike, say, a simulation of sailing running in Docker, or a book on how to sail written by Chef and Ansible. And yeah, I grew up bouncing around in the back of this one. It was loud, wet, and I cried every time, but I loved it, or so my dad tells me. ephemeral.run is our speedboat, and Vishal will now share how we designed and built it. Cool. Thank you, Jono. All right, so as we were saying earlier, scaffolding is just one part of the puzzle; it doesn't solve the entire problem, so to speak, so we'll look at how we designed and built the solution end to end. There are a few components beyond the scaffolding, and I'll explain the inner loop and the outer loop and how both work. We'll also talk about how we integrate external systems, like a SaaS-based logging platform or a SaaS-based IAM platform, and even use services like S3 or Auth0 effectively. So in the inner loop, what the developer does, from their own machine, is run ephemeral run, or ephemeral dev if they want real-time feedback. That creates an entire namespace with their changes, alongside the rest of the stable services, in a real cluster. Once it's deployed in the cluster and has its own small namespace, it can integrate with things like Auth0, an S3 bucket, an external logging platform, or anything else. The tools we primarily use here, of course, are Skaffold and Helm, and it gives you a good running environment, as close to production as it can be. Moving on to the outer loop, which is what we're going to demo today and talk about in more detail in a few slides: as soon as developers are happy with their changes on their local machine, they can raise a PR. Once the PR is raised, either the developer, a product manager, or a QA team member can use that PR to deploy those changes alongside the rest of the stable services. So it's: I change my own microservice, but I also want to use the other microservices, tagged at a specific version. By doing this, again, you get a pretty realistic environment where you can test alongside something like AWS Certificate Manager, a logging platform, or an IAM platform, which may not be possible on a local machine, or you'd try to mock it, which is still not perfect. Now, to enable these things, we do a couple of things. The first is cost control. For everything deployed in the ephemeral.run environment, we use spot instances. That means we're not spending a fortune provisioning these instances on demand, or keeping them alive beyond their lifecycle.
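The webinar doesn't show the cluster configuration itself, but as a hedged illustration of the spot-instance cost control just described: on EKS with eksctl, a spot-backed, scale-to-zero node group for ephemeral workloads might be declared roughly like this. The cluster name, region, sizes, and instance types are all invented for the sketch.

```yaml
# Illustrative sketch only: a spot-backed node group for ephemeral workloads.
# Cluster name, region, sizes, and instance types are assumptions, not the
# actual fmrl.run setup.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: fmrl-demo          # hypothetical cluster name
  region: us-west-2
nodeGroups:
  - name: ephemeral-spot
    minSize: 0             # scale to zero when no environments are running
    maxSize: 10
    instancesDistribution:
      instanceTypes: ["m5.large", "m5a.large", "m4.large"]
      onDemandBaseCapacity: 0
      onDemandPercentageAboveBaseCapacity: 0   # 100% spot
      spotInstancePools: 3
```

Paired with the cluster-autoscaler, which comes up later in the Q&A, a group like this only adds capacity while environments are actually running.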
The second aspect is TTL. Every environment either comes up with a default TTL policy, or you as a team can override the policy and say, I need this environment for a slightly longer duration. That means environments are cleaned up as their TTL policies expire. The last part is autoscaling. If you don't have any ephemeral environments running, you're not consuming any capacity; only when you're running an ephemeral environment do you add actual capacity. The next aspect is integrating with third-party systems, for example SumoLogic or similar services. As I mentioned earlier, you don't want to mock these, and it's not ideal to test without them either; you want as real an experience as you can get. Lastly, you want to monitor the ephemeral environments that are deployed: how they are doing, whether they're being created or deleted, and so on. We use BotKube for that. BotKube lets your specific teams get a notification when an ephemeral environment is created, deleted, or about to be deleted before it gets leaked. Before we demo, I just want to quickly explain what we are showing here. I have a repository called front-end, the front end for a Sock Shop, and it has a bunch of backend services. For our demo, we won't change the backend services; we'll only change the front-end service. As soon as I change the front-end service, it kicks off a Docker image build. Then I tag the PR for ephemeral deploy, which sends an event to the ephemeral.run repository, which in turn talks to the actual ephemeral.run platform. What it does is pick up the latest image from the front-end repository, and then deploy that with the rest of the backend services to the cluster. Once the actual environment is running, it gives me back the URL on which I can test, I can get some more details, and eventually, after the TTL, it'll delete the environment. Let me now go back to... So these are the two repos. The front-end repo is the front-end service, which we are going to modify and test out today. The ephemeral.run repo is the repository in which all of the platform code, the scaffolding, Helm, and everything else is housed, so that it can talk to a cluster and orchestrate the end-to-end process. Now, before I actually go and do that, let me show you that right now I have only two nodes in my cluster. As you can see, one of them is on-demand, which houses the ephemeral.run platform, and the second is a spot instance, because I already have one environment running in my cluster. As I add a new environment, you will see one more node getting provisioned on the fly. All right, so let's go to the front-end repository. Before I go there, let me show you a previously deployed application and what it looks like. So this is a previous change that has been deployed, and it's up and running. As a change, what I want to do is, instead of using this price-check algorithm, which is trained with monkeys at the office, use a machine learning algorithm based on blockchain, let's say, the fanciest of all technologies in the world. So let me go ahead and change that in the code, and then we'll see how that rolls out in a real environment. It turned out that it was better for our fundraising to add it. Right: we now price check your socks with machine learning on the blockchain. That's all the change, and that's all we need to raise our next round of funding, I believe.
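The workflow file itself isn't shown on screen, so here is only a minimal, hedged sketch of the label-to-dispatch step just described. The label name, organization, repository, and secret are illustrative assumptions, not the project's actual workflow.

```yaml
# Illustrative sketch: forward an "ephemeral-deploy" PR label to a central repo.
# Label name, org/repo, and secret are assumptions, not the real fmrl.run files.
name: ephemeral-deploy
on:
  pull_request:
    types: [labeled]
jobs:
  dispatch:
    if: github.event.label.name == 'ephemeral-deploy'
    runs-on: ubuntu-latest
    steps:
      - name: Dispatch to the central ephemeral.run repo
        run: |
          curl -sS -X POST \
            -H "Authorization: token ${{ secrets.DISPATCH_TOKEN }}" \
            -H "Accept: application/vnd.github.v3+json" \
            https://api.github.com/repos/example-org/ephemeral.run/dispatches \
            -d '{"event_type": "ephemeral-deploy", "client_payload": {"pr": "${{ github.event.pull_request.number }}"}}'
```

The central repo then handles the repository_dispatch event and runs Skaffold against the cluster, as described in the demo.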
So I'm going to create a PR out of that. So, I propose a change. Now, as soon as I raise the PR, we will see the build getting kicked off, which will create a Docker image. Before we actually deploy it to the cluster, we will of course need the image, because if the image is not ready, we can't pull it and deploy it to the cluster. Sorry, one of the things that ephemeral.run does: you're going to have more than one service that you're testing, maybe dozens of services. So your PR is going to build a subset of them, zero or more. We have to wait for the build; the image will get tagged and pushed to our registry. Then ephemeral.run is going to check there to see: is there a service build for this PR, for this service? If there is, it's going to prioritize that one, and that's what we're going to launch with; that's how we get our changes in. If not, it's going to use one from our baseline, in this case the main branch, or it could be latest. We tend to prefer tags with the name of the branch on them, since they're not ambiguous. Great, so the build is finished, and I want to deploy this change with the rest of the services. What I'm going to do is simply add a label for ephemeral deploy, and our ephemeral.run bot should respond to us in a minute or so. What's happening right now is that GitHub Actions has picked up our label change and dispatched a request to the ephemeral.run repo. You can just think of it as the main or central repo. It doesn't really have any code in it; it has the configuration for all of our services. That repo receives the request, and it starts putting together our cluster. Well, technically our namespace; the cluster is already running. Right, and as you can see, it has already responded with the status and the link to the actual job which does the deployment. While we are waiting on that, I also want to show you the nodes. There are two nodes right now, and there are these many namespaces, one of which belongs to the previous environment, which is still running. As the build progresses in GitHub Actions, we will see more pods, or rather, first the namespace gets created and then the pods, right there. So we can already see that a namespace has been created, and I'm going to watch the pods in that namespace and the nodes at the same time. Still working on that, I believe. That's interesting: you typed nine and it's showing eight. Look at the error from the server: not found, and it ends in eight while you're typing nine. We can see this also in Slack, from BotKube. All right, show the new environment. Yes, there it is; I can see a new namespace has been created. It's a little concerning when the source of truth is a Slack channel and a bot and not kubectl, but we'll roll with it. What we're doing is putting everything into one namespace. This even works for teams that have their services across multiple namespaces, as long as you don't have any overlap. At OpenGov, we actually have lots of different namespaces, but we had no trouble moving everybody into the same one for the purpose of these environments. Yeah. So as we can see, it first deployed the entire application along with our changed front-end service. Initially the pods went into the pending state, and that kicked off adding a new node to the cluster. So we had 110 and 252, and then we see 130 getting added. So net, we now have three nodes.
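As a quick aside on those Slack messages: BotKube is watching namespace lifecycle events. A rough, hedged sketch of that kind of configuration follows; the channel name is invented, and the exact schema varies by BotKube version, so treat this as a shape rather than a copy-paste config.

```yaml
# Illustrative sketch of a BotKube-style config watching namespaces.
# Channel name is hypothetical; consult your BotKube version's docs for the
# exact file layout (resources and communications may live in separate files).
resources:
  - name: v1/namespaces
    namespaces:
      include:
        - all
    events:
      - create
      - delete
communications:
  slack:
    enabled: true
    channel: fmrl-environments   # hypothetical Slack channel
```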
The additional node is fmrl, on the spot instance. This should be running in probably another 20 seconds or so at most, and our job should finish around the same time. Just to verify that the front-end service we deployed is the latest one, I want to describe it and quickly check. We can see the image tagged pr9. Let me go back to the job and see how it is doing. Once the job itself is finished, we should see here the details of the latest environment that is ready for testing. This is actually the longest it's taken; we got unlucky and needed a new node. Cool. So one thing to note about the comment here: it looks like boilerplate, but there's actually a lot of useful information in it that's customized for your organization. We have information about BotKube, we have commands to work with kubectl to connect to your EKS cluster in this case, and we even tell you which images we used to put together your service. In this case, you can see here opengov-infracloud, that's our registry; front-end is the service; pr9 is the tag. Great. Let's see, do we have our latest machine-learning blockchain-based algorithm live? There we go. Of course, for the purposes of the demo, this was a small change. But in practice, you can imagine that as a developer, I want to test my services with the rest of the services, with integrations and some real-time systems. Everything becomes so much smoother, not just for developers, but also for quality engineers and product managers and, potentially, designers. Now we can update these changes very easily and relatively quickly. Should we add an exclamation point to the end of that sentence? It might make it more exciting. Absolutely. Any other buzzwords come to mind? I'm going to add three. Three exclamation points and three dollar signs. And while it builds the image, yes: what's going to happen now is we'll build and push the new tag, and ephemeral.run will do the exact same thing. Because we are using Skaffold under the hood, it's going to notice that there's just a small change to the environment; it'll pick it up, update it, and redeploy the service, and we won't have to redo anything we've already done. Really, most of the time waiting in this case is just waiting for GitHub Actions to bring up an environment. Yesterday it did not actually refresh on that page; it'll speed through this. I'll take a little bit of slowness as long as it is working; this is a live demo, so not so bad. Cool. Right, I removed my label earlier so that I can label it again and deploy again. So I'm going to attach the label again. This is admittedly a little bit inelegant, but this is how we're using it today. Now, if you were going to launch these from your laptop, you would use our command line tools to get it going. You'd be pushing the images to the registry from your laptop manually, and you'd be starting the environment from your laptop. So you would lose a little bit of convenience; BotKube would still be monitoring the environment, but you wouldn't have GitHub's UI or anything to work with. You would be driving it from your machine. Yeah. This one should take slightly less time, almost half of what it took last time. Now, all the environments will still adhere to your TTL. Even if you run it locally, no matter where you start one from, we're tracking when the namespace was created for TTL purposes.
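To make that TTL tracking concrete: with kube-janitor, which comes up again in a moment, the per-PR namespace can simply carry a TTL annotation, and the janitor deletes it once the TTL has elapsed. A minimal hedged sketch, with the namespace name and duration invented:

```yaml
# Illustrative: kube-janitor deletes resources whose TTL annotation expires.
# Namespace name and duration are assumptions for the sketch.
apiVersion: v1
kind: Namespace
metadata:
  name: fmrl-pr9             # hypothetical per-PR namespace
  annotations:
    janitor/ttl: "24h"       # reaped 24 hours after creation
```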
It is on you to destroy it early if you want, or you can just wait and it will cycle out on its own. And if we're lucky, we will actually see one of our environments cycle out before the end of this talk. Now, even though this is still taking a few minutes, if you think about it as a developer, it's short enough that you may actually still remember what you were working on by the time the environment comes up. In practice, it'll be about two or three minutes, which is not so bad, especially if you're used to committing code, waiting for CI to build it, and deploying it to a shared environment; by that time, you've already forgotten what you were working on and your coffee break is over. Looks like it has returned the URL. Let us see how that goes. Huh. Something is broken again. I don't know if it is the image pull policy on this specific Helm chart; maybe it doesn't pull the latest image and it's set to IfNotPresent. Well, it is a live demo; something did have to go wrong. The important part is that we did get it working the first time. Essentially, this is the update workflow, and in practice it really does work. The main reason we're having issues here is, you know, we've built a demo for you, but we're actually using it for real development in another environment, and that gets a lot more testing than the demo. Now, we can tear this down by issuing one more command to it: the ephemeral destroy label. Within about a minute, the environment will be gone. In this case, it's going to dispatch a destroy command to the ephemeral.run repo, and that is going to run a skaffold delete. Under the hood, it's going to tear down the DNS and all of the other plumbing, and then we'll get a confirmation, both in the PR and from the Slack bot, that the namespace has been destroyed. One thing to note for those that are using Helm 2, like us in this case: kube-janitor is cleaning up our namespaces after the TTL expires. In Helm 2, though, not all of the resources for your deployment are in your namespace; some of them are in the global namespace. So not everything gets cleaned up. When the environment is TTLed out and you don't destroy it intentionally, you have a little bit of trouble bringing it up later. In Helm 3, all the resources are in your namespace; when it's cleaned up, your deployment is cleaned up as well, and Helm is perfectly happy. It's a limitation of kube-janitor and of Helm 2. Great. And the environment has been cleaned up now, and we should be back to the way it was before the demo started. Yeah. And the URL should no longer work either. And now we'll talk about how this all went at OpenGov. Cool. That's one of those times that a 404 or a 500 error is exactly what you're looking for. All right. So, in 2020, OpenGov engineers started an average of only one legacy environment per week using Chef, despite there being over 100 engineers across teams. Most changes could only be tested post-merge in our integration clusters, due to the inherent complexity of our polyglot microservices architecture. The legacy environments were not containerized. They were limited to only a few of OpenGov's core services, none of our new application suites. They were unreliable, they took hours to start, and they were in no way representative of production anymore, now that we're on Kubernetes.
At $150 per environment per day, and literally thousands of dollars in monthly fixed costs, legacy environments were extremely costly to run, in control plane, compute, and snapshots. As of the day we launched ephemeral.run at OpenGov, no one ever again started a legacy environment. Now, in contrast with the one environment per week per 100 people before, we saw 50 ephemeral.run environments started in the first month by one team alone, of only a dozen people or so. Nearly a 10x increase in usage overnight, more or less. In contrast with the limited legacy environments, ephemeral.run starts many database types, with seeded data and migrations, and dozens of services comprising most of OpenGov's core platform at this point, plus our budgeting and planning suite. And they start in minutes, not hours. It's easy to imagine running hundreds of environments each month once our entire organization and all services are on board. In contrast with the thousands of dollars we spent before: at $200 in fixed control plane costs for EKS and only $15 per environment per day, billed by the minute, these are dirt cheap to run compared to the cost of people. They pay for themselves quickly. We haven't even begun to optimize our tooling, neither for cost nor performance, and yet our metrics are generally about 10x better out of the gate. The greatest value of these environments is that they are representative of production, and they can be used for development, debugging, certification, and many types of testing before hitting the merge button. Our teams started finding defects almost immediately and fixing them before they even made it to our integration environment. We're planning to expand our usage of ephemeral.run for local development, remote debugging, load testing, automated verification in CI, and even testing hotfixes in an ephemeral copy of production. So, to give you an idea of where we think this tool will take us, these are the most requested features that we think have the broadest appeal. A generic, fork-friendly framework gives us a lightweight configuration instead of Skaffold's. Skaffold is powerful, but it's verbose and easy to miswire, and not everyone needs that flexibility. We want to be able to get you started much more quickly, by simply enumerating your services and how they wire up to each other. A loving and proactive run bot: like GitHub's Dependabot, we want you to interact with ephemeral.run using a chatbot instead of labels. This gives a much richer experience for the user. Suspend and resume for compute: this will help us save costs and avoid the dreaded TTL. We'd like you to be able to scale down your environments and compute indefinitely while retaining data. Dynamic TTLs on cluster resources allow different environments to live for different lengths of time. This is especially important for global teams handing environments off to each other for testing. Local-to-remote telepresence connects a locally running service in an IDE to a remote cluster for debugging and development. CI integration could do many things, like waiting for CI to finish building your PR and its images before starting your environment; today, there is a race condition when the environment is starting to get those services into your environment. Or it could let you launch and manage an environment from a CI pipeline like Jenkins. Smarter pod scheduling helps us run the fewest number of nodes required, scheduling with, for instance, the MostRequestedPriority function, or other affinity tricks.
The default spread behavior of Kubernetes may keep many nodes scaled up for only one environment, increasing the cost per environment by a lot. Ideally, all environments would fit on a single node, or as close to a multiple as possible, with high utilization per node. This way, when the environment is brought down, the node can be reaped. BotKube integration and chat ops would let you interact with your environment and the control plane from team chat. A centralized control plane UI pulls together all the configuration and monitoring to make things easier for your operations team. Similarly, usage reporting and analytics help make the case for adding more nodes to the cluster as teams adopt ephemeral.run. Adding those does cost money; we don't like spending money, but we like spending money when analytics say we should. And finally, budgeting policies would help dynamically control the number of simultaneously supported environments and autoscaling groups based on cost spent and budgeted, rather than keeping those static and resetting them each month. There is something here for everyone. So, Vishal, how do people join us? Yeah, thanks, Jono. So we are on GitHub, at opengov/ephemeral.run, and we also have a web page, ephemeral.run. Beyond ephemeral.run, you can also interact with us: InfraCloud, Vishal Biyani, OpenGov, and Jono Spiro, on Twitter and on similar GitHub handles as well. You can discover more of what we are doing there. And we are looking for great engineers as well. All right, we're ready for Q&A. Okay, thank you both for a wonderful presentation. We have a few questions here, and we have about 17 minutes before the end of the hour. So please, everyone, feel free to drop your questions into the Q&A. First question here: how do you get the dependencies from one microservice to the others? Let's say the resources microservice depends on the users microservice, as well as external gateway microservices, and they're all declared with different Helm charts and the dependency is not declared in there. So, if I understand correctly: whether you have a monorepo or multiple repos, your Helm charts are stored somewhere. Your Skaffold configuration in this case, and in the future maybe something simpler, will say where those Helm charts are. What our central repo does, if you're in a multi-repo setup, is clone all of the repos with your sources, and that's where it gets its Helm charts from. And it clones them into a deterministic place so that Skaffold can easily find them. Okay. How do you deal with DNS routing? Vishal, do you want to take that? Yeah, so right now we have configured an ingress controller, and a subdomain created on the hosted zone. For every environment, we create another subdomain, so to speak, which is wired up during the environment creation process. In this case, we are delegating the entire demo subdomain of ephemeral.run to Route 53 in Amazon and updating it using their APIs. Okay. I believe this is in response to the previous question: will this work on on-prem solutions such as RKE or VMware PKS? I believe it will depend on whether you can really autoscale there. It can technically work even without the underlying autoscaling, but the benefit you get might vary. We have not actually tested it on those solutions; we have been using it on EKS up until now. Okay.
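To make the DNS answer above concrete, here is a hedged sketch: each environment's namespace gets an Ingress with its own host under the delegated subdomain, and the platform publishes that host to Route 53 via its APIs. The host, namespace, and service names are illustrative assumptions, not the demo's actual objects.

```yaml
# Illustrative: a per-environment host under the delegated demo subdomain.
# Host, namespace, and service names are assumptions, not the demo's real ones.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: front-end
  namespace: fmrl-pr9                 # hypothetical per-PR namespace
spec:
  rules:
    - host: pr9.demo.ephemeral.run    # hypothetical per-environment subdomain
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: front-end
                port:
                  number: 80
```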
Quick heads up and a reminder, everyone: please drop questions into the Q&A section and not into the chat, so we can ensure that your question gets read. Thank you. Next question: how do you handle failed deployments? For example, a situation where the cluster does not have enough resources, the image can't be pulled, et cetera. Luckily, or unluckily, we didn't get to demo that today. That does happen frequently. It usually happens, though, because, you know, you wrote some bad code: it didn't build, it wouldn't start, a health check failed. In that case, we output a different error message with a lot of helpful debugging tips, things to try. There are sometimes transient errors; this is an automated system, it's not production, you don't have people there. So sometimes all you have to do is remove the ephemeral deploy label and re-add it, and just let it try again. We do leave the clusters up right now, though, so that you can go into it, see which services aren't starting, probe them, figure out why. Worst case, you can destroy the environment and re-create it if it's completely borked. In the future, we'd like to have separate TTL policies for failed environments, because we really want to restore those resources to the cluster as soon as possible so that other people can start environments. If you forget about your PR, the failed environment will be consuming resources for 24 hours or whatever you set. Okay. This is a bit of a loaded question here: how robust is it, and what features are available for pre-deployed dependencies? Can you run any task, i.e. DB seeds, pulling env vars, creating Kafka topics, et cetera? What tool are you using for cluster autoscaling? Are you using AWS service operators or just Docker containers? Yeah, there's a bunch of questions there. Do you want to take the first one, Jono, around the seed data? So at OpenGov, we provide the framework, but teams provide shell scripts that we run to do database seeding and migrations. We're not actually creating Kafka topics, but that is one of the things teams would like to be doing during this process. So if you can do it in a shell script, you can do it during startup. Yeah, cluster autoscaling is done by the open source cluster-autoscaler, and it works for EKS with a few settings and configurations. For AWS service operators, I think we are using a mix. In some cases we use Postgres in the cluster instead of RDS, but for things like S3 buckets or Route 53, we use the real thing and don't mock them. And additionally, because we're using spot instances, and the instances themselves, the nodes, may go away, we have the termination handler watching over everything. How do you handle multi-tenancy? Are namespaces appropriately isolated from each other? We are just separating on namespace, nothing harder than that right now. We don't really have a use case for it; we assume that this is a development cluster, it's safe to run everything side by side, and everyone's cooperating with each other. This just hasn't come up yet, actually. Do we have any other questions at all? We have about nine minutes left, so if you have any questions, please feel free to drop them into the Q&A box.
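One concrete note on the seeding answer a moment ago: "if you can do it in a shell script, you can do it during startup" maps naturally onto a Helm hook. A hedged sketch follows; the image, script path, and hook choice are invented, not OpenGov's actual charts.

```yaml
# Illustrative: run a team-provided seed script as a Helm post-install hook.
# Image, script path, and names are assumptions for the sketch.
apiVersion: batch/v1
kind: Job
metadata:
  name: db-seed
  annotations:
    "helm.sh/hook": post-install
    "helm.sh/hook-delete-policy": hook-succeeded
spec:
  backoffLimit: 2
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: seed
          image: example/db-seed:latest        # hypothetical seed image
          command: ["/bin/sh", "-c", "/scripts/seed.sh"]
```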
So when you go to the repo today, you will see an active demo. It has all the code: the Terraform we use to provision the clusters, a Helm chart for our base cluster, the Helm charts for our services, and all the Skaffold configuration files. Everything's wired up for this demo, but it's not a framework today. When we need to set this up for a different team, we essentially copy and paste the whole thing, and then we update the Skaffold file, some of the Helm charts, things like that. So it's not yet a general-purpose framework; that's why that's the number one requested feature we want to work on next. You want to take them? How do you handle external resource creation and deletion? I mean, we create stuff in here using Terraform, and it's not part of our CI/CD for the application deployment; it is a separate CI/CD thing. That's what we call it too, a "CI/CD thing." How do you connect a custom deployment with custom AWS resources, in your own context, not how we would do it? So, one of the simplifications in our environments is that we will start stateful resources in the cluster. Yes, we will start Postgres in the cluster instead of RDS. There's nothing preventing you from pre-creating a bunch of them and wiring up each environment to one of them, but we found that there's usually an analog for everything. Even for S3, you can start up an S3-compatible service; it's stateful, and when it goes away, your bucket is gone. We are using Terraform; however, we don't have Terraform built into this workflow, although I don't see any reason that we couldn't. What happens when a spot instance goes down? Does a node come back up without manual intervention? You want to take that, Vishal? I don't think I have tried that, but I believe the workload will be scheduled on some other node, or a new node will be provisioned on the fly. It could be a spot instance, or if spot is not available, it could probably go on-demand, but with the way we have configured it, it mostly goes on spot. I was going to say, on this: one of the things that's really challenging with Amazon is there's actually no way to test your spot instance termination. We've talked to them about it. I don't know how somebody developed the spot instance termination handler, since you can't test Amazon ripping it away from you, but we do rely on it working, and we rely on Kubernetes to bring up the new node and then move the workload over relatively quickly. Do we have anyone else with questions? We have about five minutes left, so please feel free to drop them in and we'll get to as many as we can. I think it's a good sign that we don't have too many questions; I think we covered a lot. Did you ever try garden.io or some other similar tool, Vishal? Yes, sure. I really liked garden.io when I tried it. One thing I strongly didn't like about garden.io was that if you had a multi-module project or a monorepo, at every level of your repo you had to create configuration. For example, if I had a Java multi-module Maven project, I would need to create Garden configuration at every module level. That seemed a little too intrusive to me, at least when I last tried it, and I didn't want to deeply couple my source code repo with an external configuration language. We did evaluate garden.io as well as Skaffold when we initially started this project, and Skaffold seemed the better of the choices. Yeah, we looked at actually quite a lot of tools. A lot of them are dead at this point; the source code is still on GitHub, but they haven't been maintained in months, or years at this point. That was one of our criteria: we wanted an actively maintained project, and there wasn't one, so we had to build one for ourselves. That's why we used Skaffold, which is very much actively developed.
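Since Skaffold keeps coming up, the per-service wiring is roughly a skaffold.yaml that builds images and hands them to Helm releases. A minimal hedged sketch; the schema version, paths, and image names are assumptions, not the actual repo's files.

```yaml
# Illustrative sketch: build one service's image and deploy it via its Helm
# chart. Paths and image names are assumptions, not the actual repo layout.
apiVersion: skaffold/v2beta29
kind: Config
build:
  artifacts:
    - image: example-registry/front-end    # hypothetical image
      context: front-end                   # hypothetical build context
deploy:
  helm:
    releases:
      - name: front-end
        chartPath: charts/front-end        # hypothetical chart location
        artifactOverrides:
          image: example-registry/front-end  # inject the freshly built tag
```

With a file shaped like this, skaffold run builds the artifact, injects the tag into the chart values, and deploys; the simpler framework proposed earlier could generate this wiring from a plain list of services.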
We're also not hoping to replace Skaffold; we're just hoping to provide a simpler layer in front of it. Anybody have any last-minute questions at all? What about Telepresence? It's on the roadmap. Yeah, we very much want to get Telepresence working. Our developers want to be able to live-debug a running service in their ephemeral environment. Now, there's a few different ways to do that; we haven't yet gotten them to work, we haven't yet tried them. You should be able to connect a remote debugger from your IDE to the cluster, with enough port mapping and stuff like that. Another way of doing it, though, would be to start up a cluster without the services that you want to debug; you start them locally on your laptop, and you connect your local laptop service to the cluster, hence telepresence, or you bring it to you. And then you should be able to step-debug and work with it. We haven't gotten to it yet, but it'll be one of the next things that we work on. Anyone else? We hope that you will head to ephemeral.run, star the repo to let us know that you're into the project and to get updates for releases and things like that. Fork it, try to make it work for yourself, open a PR, or at least open an issue. We want to be devoting more resources to this, and we're hoping that the community, if they're interested, shows their interest; that'll help justify spending more time on it and getting the framework available for everyone to use in their own projects. Excellent. We're at the bottom of the hour, just about, so we're wrapping up here, but I wanted to thank our presenters today for a wonderful presentation. Thank you both so much for taking time out of your days to join us. And I also want to thank everyone for attending today's CNCF webinar. As I said before, today's recording and slides will be available on the CNCF webinar page at cncf.io/webinars. Thank you all for your time today, everyone. Take care, stay safe, and we will see you at the next CNCF webinar. Thanks, everyone.