My name is Aaron. I work on the Containers Team at Microsoft Azure. I'm also a lead on the Service Catalog Special Interest Group. So let's get into it.

First of all, why are we all here? Obviously, we're at this conference because we're interested in Kubernetes or cloud-native technologies in general. But also, specific to Kubernetes, we know that Kubernetes is moving blazingly fast, in a bunch of different dimensions. Its usage is growing day by day like gangbusters. If you follow Kubernetes development, both in the core and in adjacent projects, that's moving incredibly fast and accelerating. The number of projects both in and out of Kubernetes is exploding as well. So basically, this speed of development, this speed of features, is outpacing the knowledge we've built up about how to use this stuff. That's why we're here today at this talk. So I put a little Wild West reference in there: Clint Eastwood, you imagine, has a Kubernetes cluster in the saloon behind him, and he's trying to figure out how to build an app for it.

So what does that really look like? If you were at Brendan Burns' keynote at the end of the day yesterday, you probably saw something a little bit like this; he said what I'm about to say. You've got to figure out all this stuff — all of the first two rows — just to get your app into, let's say, staging, not even production. You've got a new development and test cycle. You've got to figure out what a container is, how to containerize your app, and what that even means. You've got to figure out what a registry is and how to push to it. You've got to figure out how to change your CI/CD, or build a brand new CI/CD pipeline, so that you can test this stuff on the target platform, which is Kubernetes. Then you've got to figure out what your staging, pre-prod — whatever it is you do before you get to prod — looks like. Then you might want to take a breath, but your app's in prod and you've got to figure out all the stuff in the bottom row. How do you do monitoring in this new world? How do you trace requests in this new world? How do you do logging, observability, resilience, and so on? Auto-scaling is another big one. These are all new things in the cloud-native and Kubernetes world that you have to figure out. So it's no wonder that a lot of the people I see coming into Kubernetes are super excited about it — as they should be — but this is a real turnoff. It's really difficult to get started because there's this massive wall in front of you.

Needless to say, there's a lot to figure out, but there's another layer to this challenge that we need to solve, which is that everybody's different. We can't just publish a document, or have a bunch of consultants go out and tell everybody the same thing. We can't say "this is how you do X, this is how you do dev, this is how you do test, this is what every deployment should look like" and be done, because every organization has different needs, and these are wildly different needs. I've gone and talked to quite a few different organizations and seen how different the needs are. There are things like security and topology, there's failover, there's disaster recovery — the list goes on. So this is why, if you read the abstract, I called this talk, and what I want to get out of this talk, the North Star. It's an aspirational goal: being able to tell people our best practices for building a cloud-native app.
So this is what I see as the North Star. What should we get out of a North Star for Kubernetes? We want some kind of a document that helps app operators and app developers — those could be the same person or the same group — decide what to do. Should they adopt a new technology? Should they use that new beta API in Kubernetes? How should they scale their app, and so on? We know already that the technology is moving blazingly fast, so we want this North Star to stand up to that speed. In a month, when features X, Y, and Z are out and stable, we don't want this document to be obsolete. And then finally, one that's near and dear to my heart as a SIG lead: I want this thing — and I think we all probably want this thing — to guide the SIGs so that they build the right stuff for us, "us" being app developers and app operators. And again, I want to stress that I don't think these things should be rules. I think they should be best practices, so that organizations can pick and choose which practices they want to bring into their app and massage them a little bit to fit their use case.

So let's look at what we have so far. There are a lot of strong opinions inside the Kubernetes development community and outside of it, and that's a great thing. We've got a ton of evidence because people are actually using this stuff; we see how people are using it, and we see the best practices they have inside their organizations. And also, as I said, since everybody's doing something a little bit different, we've got fragmentation. That's actually a good thing — I'll touch on that at the end. In general we might see this word "fragmentation" and associate it with something bad, like needing to do garbage collection on our disk, but in this case it's not a bad thing.

So why am I the person to say what these best practices are? Well, I'm not. I'm not the definitive guide. But I have seen deployments here and there, all over the place. I've seen good deployments. I've seen bad deployments. I've seen good deployments that have some bad practices in their app, and I've seen vice versa. So this is kind of the "I've seen some things" slide. I am here to propose some ideas — we're going to get into them right after this slide, I believe — for these best practices, based on what I've seen.

So here's the first idea. This is all about observability. Did anyone go to a service mesh talk so far at this conference — Envoy, Istio, something like that? Cool. That's kind of the "so hot right now" deal. Service meshes — this is the year, or at least the half-year, of the service mesh. We've got Envoy. I forgot the name, but there's that new thing that starts with a C. We've got Istio, Linkerd, and so on; the list goes on. That is a big part of the observability puzzle, but there's more. First, some obvious things — the top two little stars there. We know that Kubernetes is this always-on system that is always looking at your containers and your pods and scheduling them. In order to do that, it has to figure out what's going on with them. Are they crashing? Are they using too much memory? Are they swapping? And so on. That's what it needs to do its job. But in order for you to do your job, you need to observe those containers in production as well. Time and time again, I've seen apps — and specifically a few pods in an app — just kind of limp along.
They keep restarting, they keep using too much memory, they keep pegging the CPU, and you don't really know why. You can't really attach a debugger to a pod. So you need to be proactive about what kinds of information come out of that pod.

Okay, so let's cover the first half: Kubernetes observing your app. Here are some concrete, fairly simple things you can do to tell Kubernetes what your app needs, so that it can observe your pods more effectively and therefore schedule them more effectively. You can set resource requests and limits in your manifests. That one is a super easy win, because if you tell Kubernetes "I need this much CPU, I need this much memory," it has a lot more insight into where it can put your pod in the cluster. Even better, you can give Kubernetes readiness and liveness probes. There's a little bit of implementation work in your app, but that allows Kubernetes to introspect your app much more deeply, which is way more powerful than just telling Kubernetes statically what you need. And those things enable horizontal pod autoscaling: if Kubernetes knows exactly what's happening with your pods, and it knows the criteria on which you want those pods scaled horizontally, then you just flip a switch, Kubernetes takes over, and all of a sudden your app can tolerate, say, the massive spike during Black Friday.

Now, this is the other side of the coin. Once Kubernetes is taking action on your pods, how do you know what the hell's going on? The reason we have this rich ecosystem in the CNCF, with all these awesome projects that do great observability things, is that we've got this new paradigm of bunches of pods running where you don't really know what's going on inside them — sometimes you don't even know how many there are. So we've got tools for logging, we've got service mesh tools — we all know about those — and we've got tools for tracing. Did anybody go to a tracing workshop? Only a few? I thought it would sort of be the year of tracing. Needless to say, these are all tools that you should proactively — at least, it's the best practice to proactively — put into your deployments, so that at any time during your app's existence you can introspect exactly what's going on. You can see whether there are network calls that are too slow, whether a circuit breaker is open, whether a service is down, whether data is getting corrupted, whether crazy logs are coming out of one of your pods. And by the way, there are CNCF projects to cover each one of these things: logging, service mesh, and tracing.

Okay, next idea: when all else fails, crash. Has anybody heard of or written Erlang before — the programming language? Cool, that's a lot more than I thought. Awesome. So Erlang has this principle called crash-only. Crash-only means that if something goes wrong, you just crash your software, and something else will restart it. In Erlang there's this thing called a supervisor that runs above your software and is always watching. This might sound familiar, because Kubernetes kind of does the same thing. The supervisor is always watching, and the best practice in Erlang is: if something goes wrong, just crash, and the supervisor will give you another chance because it'll restart you.
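Before getting to crashing, here's a minimal manifest sketch of the limits, probes, and autoscaling pieces just described. The names, image, port, and probe paths are placeholders rather than anything from the talk; treat it as one possible shape, not a definitive recipe.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                                 # hypothetical name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: web
        image: registry.example.com/my-app:1.0 # placeholder image
        ports:
        - containerPort: 8080
        # Tell the scheduler what the container needs (requests)
        # and the most it may use (limits).
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 500m
            memory: 256Mi
        # Liveness: restart the container when this starts failing.
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 10
        # Readiness: only route traffic to the pod once this succeeds.
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          periodSeconds: 5
---
# With requests set and metrics available, autoscaling is a switch to flip.
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 3
  maxReplicas: 20
  targetCPUUtilizationPercentage: 70
```

The exact API versions depend on your cluster version, but the shape of the idea is the same: static requests and limits for the scheduler, probes so Kubernetes can see deeper into the app, and an HPA built on top of both.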
Now, in a distributed system that runs on Kubernetes, things happen. You'll probably deploy a bug at some point in your app's life cycle. What I've learned from running apps in the cloud is that the network is always down. Murphy's Law will take effect, and within about a month of running your app, the network will go down. Same thing with disk: you'll effectively be swapping after about a month, at least on one of your pods. In every deployment I've seen, after a month that happens at least six times on average — I actually did the math.

Kubernetes is that supervisor. It is the thing that's always on, always watching. And I have written a retry loop probably 10, maybe 15 times: retrying a connection pool to the database, retrying connections to microservices, retrying connections to cloud services. I'm just done. Writing those loops is extra logic in your app that you have to test. In order to give a little piece of your app another chance to function, you have to write a bunch of code and test a bunch of code. So instead of that, I've given up. I've passed the work off to someone else, which is a best practice in software development: don't do the work, don't write the code. Instead, if you're writing a program, just exit 1, and Kubernetes will see that non-zero status code and restart you. That is a best practice.

So here's a fairly simple example, with some pieces of observability mixed in as well. The app tries to connect to a database and can't, because the network's down — because no cloud ever has a working network. Kubernetes sees that and restarts it. But in the middle here, we've got tracing that sees that connection failure. We've got monitoring and log aggregation that capture the logs saying there was a connection failure, and that capture the network traffic that tried to egress. And if you've got an alerting system, it's watching how many of those restarts happen in a given time window, and it might alert you if you go over some watermark. It might be super sophisticated; it might have sliding windows and moving averages and so on.

Okay, next idea: unordered is better than ordered. Has anybody done any investigation into what ordering means in a distributed system? One, okay. Yeah — he said it's impossible. Sort of; it's pretty close to impossible. There's been so much research into ordering in distributed systems that there are several sub-definitions of ordering. So if your app needs to rely on some kind of ordering in Kubernetes, you first have to decide which sub-definition you want, then you have to implement it, which is no small feat, and then you have to test it to make sure you adhere to that definition of ordering. That's hard. Ordering is hard in distributed systems — I made that point bold on purpose. So again, don't do it. If you can, try not to do it. Now, that's not realistic for all apps, of course — especially data-store apps; you might not be able to just say "I'm not going to do ordering." But if you do need ordering, Kubernetes has built-in, really well-tested primitives to do it for you. Again, in Brendan's talk yesterday, he talked about using a sidecar to do distributed locking and distributed leader election.
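As a rough idea of what that sidecar approach can look like, here's a hedged sketch modeled on the leader-election example from the Kubernetes blog. The app name, image, and port are placeholders, and the leader-elector image and flags come from that older example, so they may be out of date — treat this as an illustration of the pattern, not a recipe.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: worker-0                               # hypothetical; normally stamped out by a Deployment
  labels:
    app: worker
spec:
  containers:
  - name: app
    image: registry.example.com/worker:1.0     # placeholder app image
    # Instead of implementing leader election itself, the app just asks
    # the sidecar over localhost who the current leader is, for example
    # with GET http://localhost:4040/.
  # The leader-election sidecar runs next to every replica and uses the
  # Kubernetes API to agree on a single leader for the named election.
  - name: leader-elector
    image: gcr.io/google_containers/leader-elector:0.5   # image from the old blog example; may have moved
    args:
    - --election=worker
    - --http=0.0.0.0:4040
```

The point is that the app container contains no locking or election code at all; the coordination lives in a well-tested sidecar.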
And I'll provide links to that afterwards. Use it. If you need to do ordering — if you need to lock, do mutual exclusion, or have a leader elected — those are ordering problems. Use the sidecar; you don't have to write any code that way.

Another one that I've used pretty extensively for preventing race conditions is using resource versions on the Kubernetes resources themselves. Has anyone seen that field in the object metadata on all the resources? It's called resourceVersion, and I'm here to tell you that it's wonderful. Say you've got ten pods running and you need to manage some data. All the pods pull down that resource, and they all have resource version zero, let's say. Then one of them does the put — the upload of the new data — and now the resource version globally is one. Now all nine other pods try to do the put with resource version zero, and they all fail. So they all race, and that's not always the best thing, but there's no data corruption, because the first one just wins and the rest are left out in the cold. That can be very useful depending on your app, and it's a tool in your tool belt that you don't have to implement.

And then the last one is init containers. Init containers, if you don't know, basically run before your pod's main containers and do some initialization logic, but they can also be used to make sure that something else is running before your thing starts. For example — did anybody know that you can run Microsoft SQL Server in a container, by the way? You can; there's an image on Docker Hub. On a seven-node Azure Container Service cluster — I work for Microsoft, we get free ones — it takes about six minutes to start up in the common case, with no other pods in the cluster. Six minutes. By that time, if you have an app that relies on that database, it's already in CrashLoopBackOff, and then you have to wait another six or ten minutes for it to start up. The solution is to have an init container in your app that keeps pinging the database: are you up, are you up, are you up? It's like your kid poking you on the shoulder at six in the morning to see if you're awake. Eventually the database is awake, and then your app starts right up. So that's ordering: you are depending on something else being available before you can be available. Now, it's important to note that init containers can be abused horribly. If you find yourself writing a huge script or some big program that executes in an init container, that might be a recipe for disaster — but it's really up to your use case. Again, this is a best practice, not a rule.

The next idea is that loose coupling is better than tight coupling. This is taken right out of a CS102 book: in software development, tightly coupled code is hard to evolve, hard to refactor, hard to test. It means something similar in Kubernetes. Now, that first bullet point I've said before, and I think I'll say it two more times: Kubernetes is always on and it's always watching. It's a dynamic system that has basically undefined behavior over a long period of time, and that's a good thing for you, because then you don't have to wake up at 3 a.m. — and if you did, you'd have undefined behavior yourself, because you'd be tired and wouldn't have had your coffee.
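Before moving on, here's a minimal sketch of that init-container pattern for the database case above. The Service name, namespace, port, and images are placeholders made up for illustration, and busybox's nc is assumed to support the -z flag here; swap in a DNS lookup or a tiny script if yours doesn't.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web                                    # hypothetical app pod
spec:
  initContainers:
  # Keep asking "are you up?" until the database Service answers on its
  # port, so the app container never starts before its dependency is ready.
  - name: wait-for-db
    image: busybox:1.28
    command: ['sh', '-c', 'until nc -z mssql.default.svc.cluster.local 1433; do echo waiting for database; sleep 5; done']
  containers:
  - name: app
    image: registry.example.com/web:1.0        # placeholder app image
```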
But the other side of that is that your app has to tolerate that dynamic environment. What does that really mean on the ground? Here's the first best practice, and if you take nothing else away from this talk, take this away: please do not talk from one pod directly to another pod, because eventually — Murphy's Law again in action here — the target pod is going to go down, over a month, two months, whatever it is. Instead, communicate between pods using a Kubernetes Service, or a service mesh, or something similar — a proxy of some kind. Now, going back to crash-only: sometimes your target microservice is going to be down or unavailable, or, like I said, the network is going to be down. Crash, and just let Kubernetes do the retry. One of the apps I used to administer on Kubernetes at a previous job had a 700-line retry loop, with a bunch of logic in it for things like exponential backoff, custom logging, and custom stats and metrics. All of that could have been erased and replaced by os.Exit(1). So, back to crash-only — I'm a huge proponent of that.

Now, finally, I've seen a couple of apps that use the Kubernetes API to look up resources, which is not an uncommon thing to do. This particular one looked up pods: it tried to get a list of pods, then iterated the list and talked to each one. That's already violating point number one, because it was talking directly to pods, but there's another issue there. If I loop through a bunch of pods and then do something with them, the list is probably already stale. Eventually, over another month or so of running your app, you're going to get the list, and before you can do anything with it, one of the pods is gone. So instead of operating on individual resources, a better practice is to operate on labels: get all of the pods, all of the services, all of the whatever that have a specific label. That way the list will always be up to date, because Kubernetes will aggregate everything that matches that label and give it back to you.

Okay, counterpoint. The next idea is that tight coupling isn't always wrong. There is a specific way for you to do tight coupling in Kubernetes, and that is with the pod abstraction itself. The pod — not the container, but the pod — is the smallest atomic unit of deployment in Kubernetes, and it can have one or more containers in it, on purpose. That's because when Google originally designed this concept, they recognized that apps, or at least individual pieces of an app, might need to have multiple things sitting together. That can be the case for a legacy or big monolithic app, but it can also be the case for a very, very modern app, and we've seen some of those things at this conference already. Envoy is one — service meshes. The proxy generally runs on localhost, which means you've got your app container and your proxy container, and your app talks to the proxy on localhost; those two things would run in the same pod. Fluentd, another CNCF project, is similar: there's a Fluentd logging agent that also runs on localhost — you send your logs to it, it aggregates them, and then it forwards them on down the chain.
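Here's a hedged sketch that pulls a few of those points together: pods are reached through a Service (never directly), the Service finds its backends by label, and a logging agent rides along as a sidecar in the same pod. Names, images, ports, and the shared-volume arrangement are placeholders invented for illustration.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  # Other pods talk to http://my-app (or my-app.default.svc.cluster.local),
  # never to an individual pod IP. The label selector keeps the backend
  # list current as pods come and go.
  selector:
    app: my-app
  ports:
  - port: 80
    targetPort: 8080
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: app
        image: registry.example.com/my-app:1.0 # placeholder
        ports:
        - containerPort: 8080
        volumeMounts:
        - name: app-logs
          mountPath: /var/log/app
      # Sidecar: the app writes logs into the shared volume, and the
      # logging agent in the same pod picks them up and forwards them.
      - name: log-forwarder
        image: fluent/fluentd:v1.3              # hypothetical tag
        volumeMounts:
        - name: app-logs
          mountPath: /var/log/app
          readOnly: true
      volumes:
      - name: app-logs
        emptyDir: {}
```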
Most metric systems work the same way: they aggregate metrics, sample them, and then send them on. Same thing — your app talks to the metric system over localhost, and the metric system runs next to it. That's what this pattern is called, for all three of these cases and more: a sidecar. Anybody heard that term before? Sidecar, yeah, sweet. You guys are great.

All right, next, and most near and dear to my heart: this idea called "record your configuration." It's very important — probably one of the most important best practices I can convey today. So here's, I think, my last question for the crowd; I'm doing this to keep you awake. Who has heard of a declarative API? Okay, for those who haven't: who has used SQL? Come on, I know everybody has used SQL. Who's talked to a database before? Okay, you can raise your hands, it's okay. With SQL, you basically tell the database what data you want back, how to group it, how to sort it; or you give it some data and say "put it in this row in this table," or update it this way, or delete it. You do CRUD operations, but the important bit is that you don't tell the database exactly what it should do. You don't tell it how to do it. And that abstraction is super powerful, because SQL has scaled from an in-memory SQLite database up to massive databases in the cloud with hundreds, thousands, hundreds of thousands of nodes. The Kubernetes API is the same: you tell it what the end state should be, and Kubernetes figures out how to get there behind the scenes, and that has scaled from Minikube up to 5,000-plus node clusters.

So why did I tell you all that? The bottom line is that you should keep your manifests in your VCS — in GitHub, in whatever else — and make them represent the latest working configuration of your app. That means that at any time you can submit them to Kubernetes, and it'll take the current state of the cluster and turn it into your app's working state; that action is called reconciliation. How do you do that? Well, I'm a huge proponent of Helm. Has anyone heard of Helm before? Cool. Helm basically does exactly this: you have this thing called a chart, which is a group of the manifests that describe your app, and with one command — a helm install — it sends those things up to the cluster and makes the cluster reconcile from its current state all the way up to the state you want your app to be in.

Okay, second to last, I believe: ask for the least. I'm going to go through this one a little more quickly. It's very similar to the principle of least privilege. What we're talking about here in Kubernetes is that you can ask Kubernetes for a ton of stuff: permissions, storage, disk, memory, CPU, compute, et cetera. The point is that you should leave as much of that for Kubernetes to manage as possible, because the more you leave to Kubernetes and the less you take for your app, the more flexibility Kubernetes has to schedule your app later. And that translates to fewer wake-up calls at three in the morning, and so forth. So here are some examples of what that might look like; I'm not going to go through them here.
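The talk doesn't walk through the examples, but on the permissions side, "ask for the least" might look something like the following: a namespace-scoped Role that can only read ConfigMaps, bound to the app's ServiceAccount. The names are hypothetical, and the compute side is just the requests and limits from the earlier Deployment sketch kept as small as the app can live with.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: default
  name: configmap-reader
rules:
# Only the verbs and resources the app actually needs; nothing cluster-wide.
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: default
  name: my-app-reads-configmaps
subjects:
- kind: ServiceAccount
  name: my-app                                 # hypothetical ServiceAccount for the app
  namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: configmap-reader
```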
But the point is that you should basically take your app, or components of your app, and reduce their resource usage or permissions until they can't quite run anymore, then increase them a little bit, so that you end up with the least amount of resources dedicated to your app at any one time.

Last point. This one is basically "don't repeat yourself" — back to the theme of not doing more work than you have to. Build on the shoulders of giants: there's a ton of code, a ton of person-hours, a ton of research that has gone into building Kubernetes. There's a ton of infrastructure, too, that tests Kubernetes. And the Kubernetes API is publicly accessible, of course — at least from inside your cluster — and it abstracts away tons of distributed-systems functionality that is super hard to get right, including ordering. So whenever possible, call the API. If you find yourself building some complicated distributed system inside of Kubernetes, try to just call the API instead to do what you need. If it doesn't do what you need, maybe the community has something — maybe cncf.io has a project, or somewhere else in the community does. And if you don't see something that fits your exact needs, maybe try to build something on top. It's as simple as that.

Here are some examples. I'm a big fan of Helm, of course, but if you can't use Helm directly, maybe use the Helm API. I'm a huge fan of Traefik — it's a great ingress controller — but if you can't use Traefik directly, maybe put a service mesh or a proxy in between Traefik and your app. And then Fluentd — we talked about Fluentd already. I have used Fluentd to send a million log messages a second, over a period of an hour. That was before Microsoft, on a GKE cluster, and it crashed the cluster — a 300-node cluster, crashed it. So Fluentd is pretty capable — probably better than something you can write in a weekend, and you probably don't want to write it in a weekend anyway, because you've got an app to build. The same goes for the others, and for plenty of the other projects in the CNCF.

So, some parting thoughts: where should we go from here? I gave some ideas; I threw some things out here to start a conversation. But as I said at the beginning, we've got that fragmentation, and what that really means is that we all have a wealth of experience and a wealth of ideas about what a good cloud-native Kubernetes app should look like. So we need to share that, and this is a call to action. I've put a description of all these ideas into that GitHub repository, so I would encourage you to go to it, submit a pull request, submit an issue, do whatever it is you want to contribute, to add your ideas. It might look like a new best practice; it might look like fixing a typo — either way, totally cool. I want to solicit your ideas to begin compiling this list of best practices for a cloud-native app.

So with that, I want to say thank you very much; I really appreciate you coming. There's my contact information up there, and I think we've got about five or so minutes for questions. Thank you very much.

Yeah, so the question was about the crash-only kind of architecture:
if we've got too many pods crashing at once, is that going to overload Kubernetes? That's a very good question, and the answer is probably not. You might have a million pods running — of course, that's a corner case, and it's a problem if your cluster is too small — but if your cluster is big and you've got a million pods running on it, that's OK, because the Kubernetes API isn't going to get overloaded by that. The kubelets are going to try the restarts first, so that restart logic is distributed across all the nodes and not sitting in a single point of failure. Does that answer it? Cool, yeah.

So I missed the very last part, but I know you were asking basically about the thundering herd. No, actually it doesn't, because the supervisor processes kind of know about ordering and dependencies; Kubernetes, of course, doesn't. So I think your question was about how you avoid that thundering herd: when a downstream dependency goes down, all the upstream dependents also go down. Is that about right? Yeah, yep. That's a very tough problem, of course. What I would say is that the current state of the art in Kubernetes is primitive: if your database crashes, then everything above it is going to crash — that's right. However, when you bring it back up, that might be a manual process, and when you do bring it back up, you can bring it up slowly in a different namespace. That way, not everybody goes and talks to the database at once and thundering-herds — basically DoSes — your database. If you do bring it up in another namespace, you can use that service mesh, or you can manually route just a trickle of application traffic over to the new deployment, and thereby at least avoid the thundering-herd or DoS problem later. So you still have the cascading crash, but when it comes back up, you don't necessarily have what I call the DoS problem. Half of the problem is solved, possibly with some manual steps — but you could also automate that. We just don't have a good mechanism for doing it yet. Maybe you can put one up on GitHub.

Yeah, well, you're absolutely right — if the comment slash question was about whether Kubernetes is going to have some sort of ordering primitive in deployments or pods in the future: I've probably only seen the same GitHub issue that you've seen. I don't know the status of it, but I do know it's not going to be in 1.9 — that's all I can tell you. Is it alpha right now? OK, scheduling priority. Thank you.

Any other questions? Yeah. Well, yes and no, exactly. If every container had some kind of exponential backoff, that might work. But if you bring up your app — let's say you've got upstream dependencies — they're going to take that much longer to come up if you go with a fully automated approach. So it still might be better to have some outside automated tool, or take some manual steps, to bring your app up, because it may get back online faster than if you just let Kubernetes do its thing.

Other questions? Yeah. So the question was: how do you configure your app at the app layer to do things like connecting to databases, connecting to your queue, whatever else? Is that about right? Yeah. So right now we've got this thing that one of my colleagues calls the Post-it Note Flow.
That's where your operator provisions a database by clicking a button on the cloud console and then gives you the credentials on a Post-it. And then that's it — that's what you have, and you have to figure out how to get it into your app. This is a little bit of a shameless plug for the SIG I work in, which is called Service Catalog. The point of Service Catalog — the reason it exists — is to solve that exact problem. So instead of embedding credentials into your app, you would embed a new Kubernetes resource called an instance into your app. The instance resource is in charge of going out and hitting a cloud service — for example, creating the database — and then dumping the database credentials into a secret. Then, instead of having your credentials baked into your app config, you bake in the binding of that secret to your app config. Does that make some sense? Yeah. If you Google "Service Catalog Kubernetes," it'll pop up, and you can see how it works in way more detail, of course.

Cool. I think we are at time, so I'm going to be around — feel free to ask me questions offline. So thanks again, everybody. Cool. Thank you.