Hey, everyone. Thanks for coming to our talk. It's the last talk of the day, the last talk of the conference. So I appreciate you sticking around and listening to us talk about migrating VMs, or from VMs to Kubernetes. And the message of the talk that we want to pass along to everyone is: we did it, and so can you. I know a lot of times when we start talking about technology migrations, people get scared. And we're here to show you some things you can do to mitigate risk and how you can be successful at it. Yeah, cool. So I'll just start off by introducing myself. My name is Alan Nelson. I'm a full stack developer at Knowledge Hound. Prior to that, I was mostly on the back end. I did some DevOps stuff and some server development. And now I'm just kind of all over the place. So with the background I had, I had a big role in what we've done so far at Knowledge Hound, migrating over to Kubernetes. And in addition to that, I do a lot of front end development in JavaScript and back end development in Python. So just kind of a generalist. And yeah, that's me. Yeah, I'm Nick Petrovitz. I'm not that different from Alan. My title's VP of Engineering. It just means I get to code a little less. But I'm also a full stack developer. I've done a lot of web development, a lot of back end development, and lately a lot of data development, because my previous role was at a company dealing with a lot of big data and working on their data lake. I've done all sorts of stuff. I've been a product owner for a little while. So I've tried all sorts of different roles in the tech space. So where does our adventure start? And why did we want to come here and talk to you about this today? Again, I know a lot of us see things that we want to try out, and oftentimes don't know where to start. And this is kind of a simplified version of our architecture. We've ripped out things like web application firewalls and whatnot, just to try to make the picture look simple. But to give you a really high level overview of our architecture, we have an app that's a software as a service application. It helps market researchers do their research. And we have a load balancer that sits in front of some EC2 instances. We serve all of our web content out of S3, fronted by CloudFront, and there are some other caching layers hidden in there. Our web apps talk to Elasticsearch, Mongo, you know, the typical kinds of things most applications talk to. In our talk today, we're going to focus only on the stuff that's in EC2 and how to migrate that to Kubernetes. Yeah, Elasticsearch is hosted for us on EC2 as well. We're not going to talk about that. There are other talks and things out there on the internet already that give you how-to guides for migrating other tools. One other thing I want to mention is that our services are built in Python using Django. Our web apps are mostly React. We have some other things like Angular too. And there's legacy in our system as well, such as a little PHP. We don't want to talk about that. So a lot of us kind of feel like, yeah, containers. That's cool. We could try containers. But everything already works. And I hear Kubernetes is really hard. And there are all these different options. Should I do EKS, ECS, some of these tools from Amazon or Google or Microsoft? I don't really know where to start. If you're like us, there are six engineers on our development team at our startup. And none of us are actually DevOps people. Our primary role is writing the application that our customers use.
The business is never going to approve us just stopping everything we're doing and migrating to Kubernetes. So we're not going to be able to do that. And like I said, we're a small startup. We're cash strapped. Every moment of dev time really matters. So I can't stop working to invest in this. And then there's Murphy's law. Maybe one user, one day, brought down our whole website. There was one service that ran in our EC2 instances. And first they crashed it on one node by causing its analytics processes to use too much memory. And then they pressed F5 a bunch of times and caused it to happen again. So Murphy never strikes twice. I don't really need to worry about that. We changed some Apache settings, and in theory it shouldn't happen again. And it hadn't for a while. So yeah, we should just stay on our platform and not move to containers, right? And we can end this talk early and you can all go home. No? All right. So yeah, we ultimately decided that we should still make the move. And one of the cool things it does for us is it allows us to replicate our services and scale more easily, which is one of the things we hadn't figured out. The way we were deploying to EC2 was really complicated and prevented us from being able to easily do auto scaling and manage our deployments and the way we were doing deployments. It improved our capacity a lot, and our resiliency, because, hey, now we can do auto scaling, because Kubernetes actually made that a lot easier for us. It allowed us to put a lot more control on the memory limits between our applications, because the way our infrastructure worked, we had two EC2 instances that all of our applications were deployed to in Apache. And Apache doesn't give you as much fine-grained control as we needed across our applications to allow them to scale at times when load was higher on certain areas of our app. And the circuit breaker pattern within Kubernetes is pretty powerful to help prevent some of the failures that keep happening again and again and again, and to start throttling some of that so we don't thrash around as much. So yes, we were able to migrate to Kubernetes despite all the limitations. And you can do it, too. And don't think that we were able to do it only because we're a small team and we can do whatever we want, and I run the engineering team and decide the prioritization. No, that's not true. Prior to this company, I was actually at a much larger company, and I know at least one person in the audience was out there with me. And they were a big, big organization that really didn't like to make change. And we were able to also get our application deployed in Kubernetes out there. Very different solution than what we're going to talk to you about today. But no matter what the size of the organization is, you can definitely get your applications migrated onto Kubernetes. There's a piecemeal way to do it, and we're going to give you some tools and methodologies to help you figure out how to decompose that and break it down. And one more thing, again, just to remind you, and I think we said this before: neither Alan nor I are formally trained in any infrastructure. Any DevOps work we've done has just been because someone had to do it. And yeah, we kind of picked it up on the fly, and we're using AWS EKS because it does make Kubernetes a little bit easier for us. And yeah, it's been fun so far. If you're like us, you could use AKS, GKE, or many other products.
So with that, I think I have one more slide before I turn it over. So let's take a deeper look at our application and what's there. As I said, we had a whole bunch of services running on EC2 instances that were load balanced. And they all run in Apache. So unfortunately, the number one way to bring down our website is this little thing that's the second box from the left on the bottom there, called the analytics service. And that thing can really use a ton of memory. And it can use it up really fast if a lot of people are interested in the same type of data. And it does on-the-fly stats calculations, statistical significance, all sorts of things that are really cool and powerful for our customers and are differentiators in our product. But it has the potential of really using a lot of resources in our environment. And when we have a couple of shared servers, they're a bit over-provisioned, and that tries to help us manage through those times. But at times, we can hit those memory limits. So it also became really hard to automate an auto scaling group when you're in this situation. Because we'd have to bring up that VM, automate our deployment, and everything else just in time. And that really wasn't how we had set up our infrastructure. We did use Terraform to build everything. A lot of the other dependencies are brought into the system with Chef. But our CI system is what's pushing all the code out to the environment. And that's usually done for us through CircleCI, or some automated scripts that get started from the shell and then run things out on the machines and cause them to pull code. But with that, it makes it hard to auto scale. And that's one of the benefits of us actually publishing Docker images now. We'll be able to solve some of these problems. So there were a lot of options for us to decide how to start this migration. And ultimately, we decided that the best way to go forward would be to do something small and simple. Let us test the waters, make sure Kubernetes really was for us, find something that was low risk, so that if it went down, because it would probably go down, because none of us know what we're doing, at least that was the assumption we went with, being humble. And we figured it might go down. Let's pick a low risk service that gets the least amount of use. It really allowed us to weigh the pros and cons of Kubernetes and start to get some real learnings that we hoped would build momentum later into our project, that would allow us to move more and more services later and feel more confident as we went forward and be successful. Because if we failed on this very first migration, we'd get a lot of pressure from the rest of the company (we're small, we all sit in one room, there's like 25 of us) not to keep investing in this. So with that, I'm going to turn it over to Alan. All right, cool. So I'm going to talk in a little bit more detail about some of the criteria that you might use to decide how to start your transition, what approach you should take, what service or what type of service you might want to start with. So let's talk a little bit about SOA. Kubernetes is really going to be helpful to you if you have a service-oriented architecture. That can be any degree of microservices or just individualized services. But the goal is to have something which is more or less loosely coupled and does not make a whole bunch of assumptions about state.
If you are trying to migrate a single giant monolith app, you're probably not going to want to start off by trying to migrate that into Kubernetes. You're going to run into a whole bunch of difficulties because of the complexity of the app. And you want to also provide yourself with an opportunity to really take advantage of what Kubernetes offers. And if you're doing that with, say, a big monolithic service, then you're going to make the process difficult, and you also might just not see that much benefit from it, because it's not going to be independently scalable, et cetera. Another thing to talk about: especially the first time that you start using Kubernetes, you're going to run into a lot of difficulties. You're going to miss certain steps that you didn't realize were necessary. You're going to run into issues setting environment variables or getting configuration in there. There are going to be networks you can't reach because you forgot that you had to set up some kind of route. And that's one of the reasons why the first criterion, I think, is that the app should be small. It should do relatively few things. It shouldn't be too complicated. You should be able to understand its operation more or less in a vacuum. Maybe it's an HTTP server that's able to make and respond to a few requests, and its behavior is more or less easy to digest. That's going to allow you to separate difficulties with Kubernetes from difficulties with the app that you're trying to convert over. The next thing is that it shouldn't be critical. This kind of goes without saying, but you don't want to try and start off by converting your authentication service that everything talks to. Because if that goes down, then suddenly your entire site's inaccessible, or you might have some other major outage. You want to pick something which is not going to be the linchpin of your site. The next thing is that it should be stateless. So stateless doesn't just cover whether or not it reads the local file system or something like that, which is obviously very important. It also includes: does it do a lot of database migrations when it gets deployed? Does it need to have a certain site up and running before it can start up? Is it going to be reading off of a queue and potentially causing problems if its behavior is executing and you don't realize it? So the more stateless your app is, the easier it'll be to go through this process. And again, a simple REST server is kind of ideal for that. The next thing is being new. This is just an easy way to encapsulate some of the other criteria. It's going to be small because it's new. It's going to be non-critical because it doesn't exist yet. It's going to be stateless because, since it's new, you have the opportunity to make it as stateless as possible. So if you start off with a new app, then you're going to save yourself a lot of headaches. But you don't have to. You can certainly start off with something that already exists. It's just that a new thing is going to provide a lot of that out of the gate. The next thing is having it be representative. So you don't want to deploy an app which is the very first time you're using Node.js when you've only done Python so far, or deploy an app which adopts an entirely new paradigm. Kubernetes gives you that ability, and that's certainly an end goal of using Kubernetes, that flexibility.
But in the beginning, you want something which is going to pave the path forward for other apps. So you want it to not differ too much from the other apps that you're going to follow up with. All right. So one of the big goals, one of the values that we have at Knowledge Hound, we actually have a whole bunch of these flags up in a room, but I zoomed in on one of them because it's very relevant to what we wanted to do, which is being team first. When you are deploying code, that is something that everybody has to be involved with to some degree or another. So you want to make sure that your developers are happy. In our case, we are all developers, and we're all using this at the same time. And either way, whether you are using this or you are affected by it, you want your life to be minimally disrupted as a developer. You also want to know how the thing works. You don't necessarily need to be an expert, but you want to be on top of things. So having scripts is very helpful for that. You just tell people, run these scripts, and that automates a lot of things. They don't have to learn a whole lot. You want to be able to deploy things darkly, so that you can make sure they're up and running before they start to impact other services, which might cause trouble for the rest of your team. You want to make sure that you don't have a big, long running, isolated branch which has a ton of development on it that nobody really knows about, which could potentially conflict with other things, or which grows to a degree of complexity where only maybe one or two developers are aware of how it works. And then suddenly it needs to be communicated back to everybody, and there's this giant information dump. To the extent possible, you want to make sure that merging and rebasing, both in the Git sense and in the sense of the knowledge that you've gained, is a constant process. And some of the specific goals that we wanted to achieve with the project: we wanted to make it easy to flip between versions, we wanted to make sure this was actually solving problems rather than creating new problems, and we wanted the dev team to be on board with everything. So I mentioned documentation briefly here. I'm going to go into that more specifically now. So the big thing with documentation, and you can obviously write whole books on documentation, and there have been great talks on documentation so far, is that we had a fairly straightforward principle when we set about how to do our documentation. The main thing was that there are two types of documentation that we wanted. One was very low level. It explained everything in detailed, step-by-step fashion: here are the decisions that we made, here's how we are setting up our Kubernetes cluster, here's how you configure EKS, here's how networking is going to work. If the whole thing went down and needed to be redone from scratch, another developer could go through this document and at least have what they need, the bare minimum of what they need to go on. But then there's another kind of documentation, which is just kind of Cliff's Notes style. I'm a developer, I want to deploy my code, how do I do that? Something is wrong in production, how do I look at logs? Or how do I SSH onto it, if that's something that you want to support? Just quick hits that a developer is going to want to know. I think that both of these types of documentation are very important because they serve different use cases.
And so don't think that there's just one set of documentation that you're going to have that covers everything. Think about who is reading the documentation and what kind of problems they are trying to solve with it. So I'm going to talk a little bit about how we develop right now. What we do right now is a hybrid between Vagrant and Docker for local development. We used to exclusively use Vagrant. And as we introduced the concept of containers and Kubernetes into production, our approach for local development was Docker Compose. And there are some pitfalls with Docker Compose because it's different than Kubernetes. You have a different set of configuration files. In our specific case, the amount of configuration that we needed for Kubernetes was not particularly complicated. We basically just needed to set some environment variables and maybe a few other things. And accomplishing that in Docker Compose was not particularly difficult. So you might actually find that this approach works OK for you, but of course, your mileage may vary. Just a few other notes: each of our services lives in a different repo, which means that we need to sort of split configuration across the app-specific configuration, which lives in its repo, and a sort of general configuration, which lives in our DevOps repo. Each developer is responsible for starting up and tearing down services manually. That applies to your virtual machine as well as to Docker Compose. There are solutions where you run your services in a cloud, you can even have a miniature Kubernetes, or you can run Minikube on a host or on your local workstation. There are all these different approaches, but we went with something which was very straightforward, which is that a developer types in a command to start up their Docker containers, types docker-compose, essentially, and they are responsible for running all of that. So we didn't want to over-automate, because the developers can take care of a lot of that themselves.
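To make that concrete, here's a minimal sketch of roughly what a local docker-compose.yml for one of these services might look like. The service name, image, ports, and environment variables are all hypothetical placeholders rather than our actual configuration; the point is just that the per-service config stays small.

```yaml
# Hypothetical example: a minimal docker-compose.yml for local development.
# Service names, ports, and variables are placeholders, not Knowledge Hound's real config.
version: "3"
services:
  analytics:
    build: .                      # build the service's Dockerfile from its own repo
    ports:
      - "8001:8000"               # expose the Django app on a local port
    environment:
      DJANGO_SETTINGS_MODULE: analytics.settings.local
      DATABASE_URL: postgres://dev:dev@db:5432/analytics
    depends_on:
      - db
  db:
    image: postgres:11
    environment:
      POSTGRES_USER: dev
      POSTGRES_PASSWORD: dev
      POSTGRES_DB: analytics
```

A developer brings this up and tears it down themselves with docker-compose up and docker-compose down, which is about as much automation as we wanted for local work.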
OK, so if we look at the two different sort of approaches here, on the left you have a Tesla dashboard. It is super, super minimal. There's a steering wheel and a touchscreen and some pedals, and that's about it. And on the right you have a jet cockpit, which has a million switches and dials and knobs, and they all do very important things. But if you're not trained in this, then you're going to have no idea what you're doing. So Kubernetes is kind of the same way. It can be very, very daunting. It can look a lot like the cockpit on the right, but as a developer, what's most useful to you is to have something like what's on the left. You can already kind of figure out how to use it, even if you don't necessarily know all the complexity that's hiding behind that touchscreen or under the hood of that car. So what we wanted to do is provide enough knowledge and tooling to the developer to be able to accomplish what they need to know how to do, and provide a simple interface for that, without necessarily going into every single possible use case of Kubernetes and all the different things that we could potentially do. We want to just kind of stay simple and stay with specific goals. I want to be able to do X. How do I do X? All right, so I'm going to talk a little bit about DevOps scripts. Another thing that you could probably have a whole conference on, but to what extent do you do scripting? How do you approach it? And what are some of the pitfalls? So I think that scripts should be as simple as possible as a general rule. When you are scripting something for your organization, you're not building a general purpose product to be made into open source software or a tool that you want to sell to people. Maybe you are, in which case that's fine, more power to you, but you probably aren't. You probably are going to benefit more from having a small script with a simple interface. The goals of the script should be pretty clear. Like, we have a deploy script which executes the steps required to deploy something. We have a pod script which basically answers, how do I SSH onto something? And then some things we didn't actually write ourselves. So kubetail, for example, is a fantastic tool that somebody else wrote. All we had to do was say, hey, there's this cool tool, you can go and use it. So the most common things that a developer is going to do, you want to have scripts around, but they don't need to be scripts that you wrote yourself. The most important thing with a script is, if there's a sequence of operations that is complicated and needs to be done right, you want those things scripted out. But if it gets too complicated, then suddenly it becomes really, really hard to manage. So I'm sure that a lot of people have seen something like this, where you have this big script that is very, very complicated and nobody really knows how it works. And every once in a while, somebody adds some new feature to it and it just gets more complicated. And when it breaks, you have no idea why, because it's doing so many different things, and it's trying to do everything, and it just kind of fails. So the one CLI to rule them all, I think, is a very dangerous pattern. It's really easy to do because it's like, oh, I've already got this CLI. I'm just going to add some more things to it because I realized I want to be able to, you know, scrape the logs off of something. So I'll just add that to my CLI. But now all of a sudden, it's getting really, really complicated and its behavior is difficult to predict. And also nobody wants to work on it because it's so big and so complicated, and when it fails, it's a giant mess. So at Knowledge Hound, we have a directory full of these things, and our average script size is 47 lines long, including comments and white space. That's of course completely arbitrary in a lot of ways. It depends on the language that you're using. We primarily use Python and Bash for this, but you just want to keep things relatively straightforward, so that somebody can open it, see what it's doing, and if it doesn't work for them, they might be able to fix it momentarily. And then, as I mentioned before, don't reinvent everything. So if you make a command like tail-my-service and all it does is call kubetail on my service, well, now you've created a new command that people have to know, when instead they could just use this awesome tool which already exists, that you don't have to write yourself and you don't have to provide an opinionated way on how to use it. You can just say, hey, there's some documentation out there, go and look at how you might use that, and provide that guidance rather than providing the tooling.
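As an illustration of what we mean by small, single-purpose scripts, here's a hedged sketch of what a pod script along those lines might look like. The script name, label convention, and namespace are hypothetical, not our actual tooling.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a small "get me a shell on my service" script.
# The label convention (app=<service>) is a placeholder.
set -euo pipefail

SERVICE="${1:?usage: pod.sh <service-name>}"

# Find the first pod for the service by its app label.
POD=$(kubectl get pods -l "app=${SERVICE}" \
  -o jsonpath='{.items[0].metadata.name}')

echo "Opening a shell on ${POD}..."
kubectl exec -it "${POD}" -- /bin/bash
```

That's roughly the level of complexity we try to stay at: one job, a handful of lines, easy to read and easy to fix on the spot.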
Okay, so another comparison here. Telephone systems have evolved a whole lot over the years. On the left you have a switchboard, and on the right you have a cell phone. They both do basically the same thing, though. For somebody who wants to make a call, you pick up your phone, you call a number, and it doesn't really matter how the nuts and bolts work under the hood; you just want to be able to talk to the person on the other end. Now, apps are the same in a lot of ways. Your app is going to be a lot simpler the fewer assumptions it needs to make about how its communication works. This is especially important when you're doing something like transitioning to Kubernetes, because the way that your apps talk to each other is going to impact how easy this transition is. So for example, if you can do all of your application communication over just HTTP that goes through DNS, then if you change the DNS entry that your service is talking to, you've suddenly deployed a new service, and it can be that simple. So the more that you can make your application agnostic, the easier it's going to be to facilitate a transition like this. And again, that's one of the reasons why I was talking about RESTful services, because they are just really good at maintaining a sort of zero-state, very abstract interface for communication. And the decision to go that route made it so that our process of switching over was a whole lot easier. We basically spun up a Kubernetes cluster, and when we were confident that it worked the way we thought, we used Terraform to create a new DNS entry that routed traffic from the old address to the new one. And now all of a sudden, the new apps are receiving traffic. It's not that there weren't problems, but it certainly made things a lot easier. So there are a few gotchas with this that are worth mentioning. One is DNS caching. If your apps are using DNS to communicate and they have a stale address, then there might be a period of time in which you send traffic to the wrong address. So be aware of what the TTL is on your DNS. And another approach you can take to solve that is by adding a route on the machine that used to be talked to, which proxies traffic to the new service. And this is actually one of the things that we did as a sort of fail-safe: we have a whole bunch of virtual host files that look like this, and basically all they do is say, I'm going to listen on port 1234, and any traffic I receive is going to be proxied to, in this case, the local host, because if you've used a VM, that IP address points back to the local host, and it's going to go to localhost and then the port that my Docker container is listening on. And we have similar things in production and other environments, where if we didn't catch something, then this will catch it, and if the traffic is being sent to the wrong place, then it will get forwarded through. So something as simple as a tiny little virtual host like this can really save you. And I recommend looking into that.
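The slide showed one of those virtual host files; since it isn't in the transcript, here's a hedged reconstruction of roughly what such an Apache fallback proxy might look like. The port numbers and target address are hypothetical placeholders.

```apache
# Hypothetical sketch of a fallback virtual host: listen on the old port
# and proxy anything that still arrives there to the new location.
# Ports and the target address are placeholders, not our real values.
Listen 1234
<VirtualHost *:1234>
    ProxyPreserveHost On
    # Forward all traffic to where the service now lives, e.g. localhost
    # plus the port the Docker container listens on.
    ProxyPass        / http://localhost:8001/
    ProxyPassReverse / http://localhost:8001/
</VirtualHost>
```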
Cool. All right, so I'm going to hand it back to Nick now, and he's going to talk about what's next. Yeah. So as Alan talked about, we got that low risk service migrated over into Kubernetes, and that was really awesome. The whole team felt excited about it. The service was this low risk service that manages the ability to connect other media and stuff that's not even hosted in our own platform. So it's really this third party interconnection service. It was really low risk. If it went down for a while, no one would have cared, but hey, we were out there now, it was successful, and we had to pick up a thing or two along the way, but cool, it works. So the whole team's happy, and now they're all like, so when are we going to do the rest? We've got to get everything into Kubernetes now. And the same problem comes up: oh yeah, we can't just stop all the work we're doing and migrate everything, so we'll have to find time between projects to migrate the rest of this stuff. But we started looking at, well, we did something easy. Now what's important to migrate over? And I mentioned earlier that, yeah, Murphy's Law, we can ignore that, right? Because the person who crashed our website, they can't do that again. Well, yeah, they probably could. So that was our logical next step. This analytics service still sitting in Apache, and our application gateway also sitting in Apache on the same instances, the same EC2 nodes, meant that it was still possible for someone to pretty easily and mistakenly bring down a whole EC2 instance. And obviously, once they did that, they'd be like, man, my web page won't reload, and hit refresh a bunch of times, and then they'd bring down the rest of them before the first one would come back, which is something we didn't like. So we wanted to remove the ability of users to do that, and we decided that we'd start moving that high-risk service that had the ability to crash the rest of our infrastructure into EKS, and we did that. It went really smoothly because we learned a lot from the first time we did it, and we started to build more momentum after we did that. And the team was like, okay, cool, that service actually has a lot of state. It connects to some of the more complicated resources. It uses a lot of memory. If that thing can run in Kubernetes, we should be able to get everything into Kubernetes. Some of the benefits we got by moving it in there as well: now we have four pods running this thing, where before we only had two instances of it. The pods recover a lot faster than a whole EC2 instance does, and because we have more instances of this service, when someone runs a really big query, or two or three or four of our users, or hundreds of our users, do it all at the same time, that load gets spread between more machines. The memory is a lot more managed, and we can set higher limits on it and not have to worry about it bringing down the entire shared Apache instance, because that's gone. And when someone does crash it, like I said, it comes back quicker, and we have more instances, so it's a lot less likely to happen. So that's really cool. Then our authentication service was another thing that shared those same problems, and it's a single point of failure. Basically every service on our platform uses it, so we migrated that over there now too. One of the things that I did forget to mention when I went through our slides earlier was that in our world on the EC2 instances, all the services talked to each other kind of directly, or I'm sorry, they talked to each other through the application gateway, which was really inefficient and was something we were looking to resolve. And because of the load balancer in EKS, it made it a lot easier for us to have the services just talk through EKS to each other and not have to keep relying on the application gateway as we moved them over. So what we were able to do is, you'll see there's a search service in that diagram.
That thing, if it has to talk to anything in EKS, will still go through the application gateway, which does the communication with EKS and reaches all the services. And because we did that, we didn't have to retouch or rework any of the services that were already deployed in our EC2 instances, or worry about a plan to modify things and touch multiple locations as we migrated a single service. That allowed us to really minimize the risk and minimize the impact to other parts of our application as things migrated over, which was really pretty cool. And that was another part of why we have those virtual hosts running there, to help make sure that traffic gets migrated if it doesn't use a DNS entry. I think that's everything I have to cover on this slide. So what do we have to do next? Our whole infrastructure isn't in Kubernetes yet. We're still working on that. Probably the highest priority for me is to get the application gateway out of the EC2 instances. That's our last single point of failure that exists in EC2, and once we do that, it'll also greatly simplify our networking and the communication across our applications, because there's technically a second internal hop that happens over the network today, and we'll be able to get a little bit of time back on every request, and that'll be really cool. And I think with that, I'm passing it back to Alan. Yeah, that's right. Okay, cool. So I'm just going to talk about a few of the, I mean, there's a whole bunch of topics within Kubernetes, and within that, a whole bunch you could talk about specifically regarding resource usage and stability, but I'm just going to give a few quick hits of what we had. So one of the big problems that we were trying to solve with Kubernetes, as Nick talked about earlier, was that we had these memory spikes where the application would get really large in terms of resource usage, and other applications did not do that, and we wanted to be able to manage, on a per-application basis, what is the expected use of memory? What's the minimum memory? What's the maximum memory that we should expect to use? And that's a big win with Kubernetes: you can set what are called requests, meaning what the app needs at a minimum, and limits, meaning the most resources that an app is allowed to consume before it gets killed. And what I would recommend with that is that you don't need to be exact in your initial measurements. You should start from some kind of metrics collection that you've used over time. Get a sense of what the maximum is that your application has ever used under normal or allowed loads, get a sense of what its resting rate is, and then just be a little bit conservative and fuzzy with that, and pick numbers that make sense. Then, once it's deployed, make sure that you have monitoring on those metrics so you can see: does it fall within what I expect? Do these numbers still make sense with the new deployment model? It could be that the instance you're running on, or the load balancing that Kubernetes offers, or some other thing changes those. So you just want to be aware that things might be different, either better or worse than you expected, and be able to respond to that. But in our experience so far, we haven't found that there's been a huge amount of tuning required on these.
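To make that concrete, here's a minimal sketch of what requests and limits look like on a container in a Deployment manifest. The service name, image, and the actual numbers are hypothetical placeholders; in practice you'd pick yours from your own metrics, as described above.

```yaml
# Hypothetical sketch: resource requests and limits on a container.
# The name, image, and numbers are placeholders, not our real values.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: analytics-service
spec:
  selector:
    matchLabels:
      app: analytics-service
  template:
    metadata:
      labels:
        app: analytics-service
    spec:
      containers:
        - name: analytics-service
          image: example.com/analytics-service:latest
          resources:
            requests:
              memory: "512Mi"   # roughly the app's resting usage, with some headroom
              cpu: "250m"
            limits:
              memory: "2Gi"     # above the worst spike seen under normal load
              cpu: "1"
```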
You can kind of just set some initial defaults and it'll probably work for you, or at least it did for us, but certainly use the past behavior of your app as a guide there. So one of the things that Kubernetes makes really easy is replication. You can just say, I want to run this many replicas of this service. You can have it scale easily. Just make sure that you use that. A lot of times, at least if you're coming from the deployment model that we had, you pretty much only ran one copy of things per instance and relied on Apache to scale those out and all that. But once they become containers, they're like full units where you can replicate the actual container itself in Kubernetes and get a lot of redundancy that way, and you can get some really low-hanging-fruit wins of making your app more resistant to failure. So it's very general advice, but I would just say, make sure that you use it. As an example of what we have in our starting-point Kubernetes config: we have a two-instance EKS cluster at this point, so we have two EC2s that the pods are spread across, and that will probably grow in the future. What we start off with is four replicas per deployment, so that would be two per instance. And when we do an update, we specify that it's a rolling update, meaning that it will sort of tear down one, add a new one in, tear down another one, add a new one in, so that you always have listeners on it. And we say that it can have a maximum of 50% unavailable, which means that you'll always have at least two pods up and running at any given point in time. So barring some crazy stampeding herd of elephants hitting our servers, we can be pretty confident that things are going to stay up and running. And that's one of the big reasons why we wanted to go with Kubernetes, and it's worked out well for us. So if that's something that you're experiencing, then I'd recommend just making sure that you use as many replicas as makes sense, because they're cheaper than you think.
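Here's a hedged sketch of how that replica count and rolling-update behavior are expressed in a Deployment spec. The fields are the standard Kubernetes ones; the values are just the ones described in the talk, and the surge setting is an assumption the talk doesn't mention.

```yaml
# Sketch of the replica and rolling-update settings described above.
spec:
  replicas: 4                    # four pods spread across the two-node EKS cluster
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 50%        # at most two of the four pods down during a rollout
      maxSurge: 1                # hypothetical; not something the talk specifies
```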
So one note: I want to talk about CI/CD. This is another giant topic, but CI/CD is going to be very important in how your deployment process works. Whether you're on CircleCI, Jenkins, or whatever, you can have it manage how you build your containers and how you deploy your containers. You can have a completely automated system that lives entirely in Jenkins or CircleCI or whatever, or you can have individual commands that developers run on their workstations. There are advantages and disadvantages to each. When you're developing, and when you're playing around with this at the beginning, trying to set it all up through Docker in CircleCI or Jenkins or Travis or whatever might be a giant pain. And there are going to be a lot of things that you don't realize when you're initially setting these things up that are probably just going to be easier to handle when you are working locally. And going back to the scripts that we were talking about before: if those scripts that you develop, as you are figuring out how to do Kubernetes for your team, are general enough and provide the correct tooling, then it should be fairly straightforward to port them over to run in your CI system. So I would say, don't think that from the get-go you need to do everything through CI, but it absolutely is the goal that you should be approaching, in our opinion, because it allows you to lock down the process and to have a reproducible environment for builds to occur in. So just be aware of the trade-offs there. Certainly setting up the actual Docker infrastructure can be a challenge. How do I get Docker to run in such a way that I can set up all my secrets and be able to execute all the commands I need? Do I have to set up an account in AWS or whatever that's locked down and can run these things? It introduces a whole lot of potential difficulties, which is one of the reasons I would maybe suggest that in the beginning you don't worry about it, but certainly it's where you should be aiming to go as an organization. Let's see, so I guess I'm going to pass it back to Nick to talk about our sort of future work and what we've learned so far. Yeah, so for us, we're aiming to get to a victory point, which would be everything's in Kubernetes and we kill off all of our EC2 instances, at least for our application services, the code we wrote. Obviously we could do things with Elasticsearch; there's a weird reason why we can't use a hosted Elasticsearch solution for that, and we could migrate things like that to Kubernetes as well, but I don't want to talk about that as much today. But our really big goal is to get all those services migrated into Kubernetes and keep using that momentum we've built up with the past migrations we've done to roll that stuff forward. And as I said, we're a small team, there's just a few of us, and we're going to do it probably one a month, hopefully, and that's going to be the pace we move at. Yeah, it's a lot longer timeframe than all of us on the team would want to move in, but it's a timeframe where we can absorb it, and continue to ship our product, continue to innovate and bring value to our customers, and at the same time, we're bringing value to the dev team that's going to save us time in the future, both on operational support and our ability to move a little faster as we go to make changes and to unlock other problems we've had. For instance, if we want to upgrade our version of Python on our EC2 instances, that's a little challenging whenever it's a shared machine. So being able to manage our dependencies a little more separately, and make changes that are a little bit bigger and more important and more strategic to our individual applications, is really going to be nice for us, because today we're often slowed down by, oh, we'd like to do this, but it's going to break this other application, and all right, we don't really care to touch that right now. So it'll really provide a lot of value as more things migrate over. And who knows, maybe we'll leave a couple things in EC2 for a while and shrink the machine sizes down even further and just not care about them, because they work, and if they do break, we won't care. But ideally it will be nice to unify, and I think as much as we can, we should try to unify our infrastructures and our technologies and tool sets and minimize the number of unique things we have, because it really shrinks down what we need to be familiar with as a team. Earlier in the talk, for instance, I mentioned we have some PHP, and we're also trying to eliminate that from our stack, and almost all of that's gone; there's really only one more thing left to resolve. So that whole mentality of reducing the footprint that's out there and having fewer tools for us to support is really important. And there are a couple things we didn't discuss here today. One thing that's really important that we didn't discuss is database migrations. We're lucky that we use Django, and Django has some powerful tools that let us automate the database migrations we were already doing. We do spin up an ephemeral pod that runs the migration as a one-time job as the deployment happens, and that's really powerful, and unfortunately there wasn't enough time to go into that.
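Since there wasn't time to show it in the talk, here's a hedged sketch of the kind of one-shot migration job that approach implies: an ephemeral pod that runs Django's migrate command and then exits. The image name, job name, and settings module are hypothetical placeholders.

```yaml
# Hypothetical sketch: a one-shot Kubernetes Job that runs Django migrations
# as part of a deployment. Image name and env vars are placeholders.
apiVersion: batch/v1
kind: Job
metadata:
  name: analytics-migrate
spec:
  backoffLimit: 1                # don't retry a failing migration forever
  template:
    spec:
      restartPolicy: Never       # run once; the pod goes away when it's done
      containers:
        - name: migrate
          image: example.com/analytics-service:latest
          command: ["python", "manage.py", "migrate", "--noinput"]
          env:
            - name: DJANGO_SETTINGS_MODULE
              value: analytics.settings.production
```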
Hosted databases: we're using RDS right now because it makes it easy. If we want to move our Elasticsearch into the Kubernetes cluster, there are a lot of additional challenges there, because it does have persistent storage, and the way our Kubernetes infrastructure is set up right now, everything's ephemeral, and if it goes away, we don't really care. But when you start to move a database into Kubernetes, it's a lot more complicated, because you have to think about, all right, I'm going to be using some sort of block storage, and how is that going to be managed? What's the latency going to be there? There are some potential performance implications to that. And then configuration management is a really complicated topic we didn't get into, and one thing within that, which maybe we've heard at some of the talks today, is to just think about the consequences with Terraform and security, and how everything gets written out to the logs. You don't want to accidentally make any security mistakes there. So we kind of glossed over that at the beginning of the talk, but just be sure to read some best practices as you go and make sure that you're following them as best you can. So thanks again for listening to us and our story about how we migrated, or began migrating, Knowledge Hound's infrastructure from EC2 into Kubernetes. It's really helped us deliver a more stable software-as-a-service platform for our clients and helped them get the experience they want from us. We are a cloud-first organization. We're lucky because we don't have any on-prem infrastructure; everything is in the cloud, and it's all in AWS for the most part. And we've been able to continue to build tools that help our clients automate their work. Being in Kubernetes now is also going to do a lot for us in the future. It's going to reduce some of the pain points of spinning up new infrastructure. Again, we don't have a DevOps team, so it wasn't possible, or it was very hard, for us to figure out Terraform and spin up new EC2 instances to do other things. So if we wanted to make a Spark cluster in AWS, yeah, you can do that, but we'd have to learn a whole bunch of new tooling. Now there are a lot of recipes and whatnot available for us to reuse within Kubernetes. So it's going to help us as we start doing more ML-driven search automation, it's going to help us as we do more data ops automation, and as we start to try to do some really cool things with data, like building normative models across survey research. So that's what we're after. That's why we undertook Kubernetes. It was more than just, it's cool. We really think it's going to unlock doors for us, and it gives us new tools. So thanks.