Hello, everyone. So we are coming up right on the tail end of the day here. It is 4:35, and I have 35 minutes to go, so let's get this show on the road. To jump into things, I'm Redbeard. I'm with CoreOS. I talk about weird things. Apparently we're talking about making cloud native weird, so I'm in the right place.

Jumping into it, I am here to tell you a whole bunch of lies. These are going to be lies in the sense of high school physics, where I tell you that the smallest particle in the universe is the atom, and then you learn that there are quarks and anti-quarks and positive spin and all of that. As with any good presentation, you should be able to answer a number of questions by the end, and this presentation will not be any different. If you can answer all of these questions by the time we're done, then I've succeeded. If you cannot answer them, then I have failed.

So first off: who? Who is the target audience of this? Organizations who are looking to run Kubernetes in some redundant manner. What are we going to be talking about? In the words of Mike Tyson, everybody has a plan until they get punched in the mouth. So we are going to talk about the considerations you have to make to build out these types of multi-environment, possibly multi-cloud systems, making sure that you've planned the steps, that the execution makes sense, and what the common misunderstandings are. When are we going to do this? Well, we're going to let you know when everything is generally available within Kubernetes. I will be talking at one small point about Tectonic, but this largely has very little to do with Tectonic. I am a huge believer that part of the reason you're looking at Kubernetes is to make sure that things are as agnostic as possible, so these ideas will be applicable across whatever distribution of Kubernetes you happen to be using. The where of this: what are the considerations, broken down by compute environment? Are there specific things that you have to worry about on DigitalOcean versus packet.net? Is there anything that's exclusive to bare metal versus AWS? And getting into the why: can you answer why you want to do this? "Because" is not a valid answer to that, so we want to make sure that you're able to break some of this down. And then the last one: how do you actually achieve these goals?

So, speaking from a from-the-trenches perspective, step one: let's jump into the past. We need to make sure that we don't forget the past. One of my colleagues coined the term "cloud kids" as a pejorative for himself. Back when I ran the infrastructure team at CoreOS, he said that there were a whole lot of things that just weren't possible, and that he was a cloud kid, because he was working on an infrastructure team and had never touched a physical server. That blew my mind: an infrastructure SRE type individual who had never grabbed a piece of metal and worked with it. So for the cloud kids in the room, this tidbit is for you; for everybody else, we'll move through it real fast. To understand where we're at, let's analyze an example from the past, and for this one I will pick Oracle Real Application Clusters. To use Oracle RAC, you had to have a SAN, a storage area network.
And to understand what that starts to look like, let's look at this very pretty picture. In this picture, we have two compute nodes and two disks. You'll notice that each of those disks is connected to both of the servers concurrently. This is not something where you're doing the EBS trick of magically attaching and detaching; both servers can access both of the disks at the same time. That's part of the basis of a SAN, and it is up to the things that consume those resources as block devices to consume them intelligently. Oracle RAC understood how to partition out the use of that disk by machine, to make sure that if one machine failed, another machine was able to take over the consumption and utilization of the rest of that disk and keep serving resources. This was for high availability more than high performance. But the reason folks began to think of this as a pattern you couldn't do is that the way most people consume block devices is not by directly attaching them to some piece of software like Oracle Database; instead, they put file systems on them. And if you don't have a file system that is cluster aware, like GFS2, OCFS2, or Lustre (that's L-U-S-T-R-E, not Gluster), then when you try to concurrently utilize that file system read-write on two hosts, you'll probably corrupt the file system. Now, for what it's worth, there are still useful things that we can do with this.

In the end, using a SAN also meant dealing with what are called World Wide Names. So, for a real quick welcome to the world of World Wide Names: this ugly thing is a World Wide Name. To make it just a little bit more understandable, we're gonna reformat it, and we'll reformat it again, and we'll keep reformatting it, and as you go, you notice that it starts to look like a MAC address. The reason for that is that it's basically a MAC address for a disk. The IEEE issues what is called an organizationally unique identifier, and those OUIs are used by hardware manufacturers to produce the media access control addresses used with network cards, and storage vendors use them to give individual serial numbers to dynamically provisioned disks. So this gives us the high-level overview of what it starts to look like: you refer to a disk by its World Wide Name within the operating system, so that if you have multiple machines concurrently accessing the same disk, they're able to access it by a stable ID. It will make more sense in a moment why I'm talking about this, but this lack of knowledge about things like SANs was largely driven by the cloud: nobody really needed to know what a LUN was anymore, or what it meant to be zoned into a host. But we should not get bogged down by what the cloud can't do, because we will come back to that.
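To make those stable IDs concrete: on a modern Linux host, udev already exposes disks by their World Wide Names under /dev/disk/by-id. A minimal sketch of what that looks like, with device names and WWN values fabricated for illustration:

```bash
# List block devices by their stable identifiers; the wwn-* symlinks
# are the World Wide Names, which stay the same on every host that can
# see the disk, unlike /dev/sda, which can change across reboots.
ls -l /dev/disk/by-id/ | grep wwn-

# Example output (fabricated):
#   wwn-0x50014ee2b2f8a1c3 -> ../../sda
#   wwn-0x50014ee2b2f8a1c3-part1 -> ../../sda1

# Referring to the disk by WWN gives every machine on the SAN the same
# name for the same disk:
mount /dev/disk/by-id/wwn-0x50014ee2b2f8a1c3-part1 /mnt/shared
```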
So, moving into a bit more of the present, let's talk about planning. Any time you want to develop a plan, you've got to map your needs. First and foremost, know the problem that you are solving for. Are you trying to do multi-cloud or multi-environment things to achieve higher latency? Hopefully not. Hopefully you're trying to get to lower latency. But is it that you're trying to increase throughput? Is it that you're trying to ensure that your data locality puts your services as close to your end users as possible, so that a user accessing your resource in Singapore gets sent to a data center in Singapore, and a user accessing your resource in London gets sent to a data center in London? Are you trying to optimize for other types of resource utilization, like RAM or CPU? Are you trying to avoid the next Five/Nine attack? Do you not want to be E Corp, responsible for having all of your data destroyed because it was in a single location? Or, especially relevant to the world that we live in right now, are you trying to avoid disaster through global thermonuclear war? Hopefully that is not something you're trying to protect against. In other words, what we are talking about here is that you need to know what your failure domains are. You need to understand whether you're trying to protect against disk loss or data center loss. Because sure, you may be provisioning things at an Equinix data center in Ashburn, Virginia, and then a tornado comes through and takes out your resources, and S3 right along with them. If you're not thinking about things like region locality and how your resources play out across different providers, you may not be protecting against your actual failure domains. So the big question is: are you prepared to answer these questions? I have a hint for you.

Now that you have at least thought through this and you're starting to map out your reasons for wanting to do this, it means you're going to have to define your environment via some configuration manifest. What this means is that you're probably going to do things in the realm of configuration as code. So how do I get this done? This is where things start getting actually interesting, because I dig into somewhat more concrete ideas. Your configuration files must be in some version control system. Even if, in a strange way, that version control system is you stuffing them as objects into etcd, where you can then play back the index over time to recover things. In reality, you probably are not going to do that, because we have a much better system for versioning text documents, and that is git. But you must have something like this. If you do not have this, you are doing it wrong. On top of that, I use this beloved piece of software, Jenkins, to react to the changes in those repositories and to monitor and take action when things are occurring. Now, if I'm doing this with a bunch of Terraform definitions, there are probably going to be secrets involved in some form. Because, like I said, I used to run the infrastructure team at CoreOS, we've been doing this for quite a while, and all of the pieces of software that are available today were not necessarily available in the past. So the way that we historically got this done was using a utility called git-crypt, and I'm going to actually give you a snippet of Jenkins Pipeline showing exactly how we use it. It's not complex, but git-crypt gives you a GPG-based mechanism for pulling secrets out of encrypted-at-rest storage, using them until you're done, and then purging them from memory. But just because I am doing it with these tools does not mean that you have to do it using these tools.
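As a rough illustration of that pattern, here is a minimal sketch of the kind of shell step such a pipeline stage might call; the repository URL and the GIT_CRYPT_KEY variable are hypothetical stand-ins, not CoreOS's actual setup:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical: the CI system injects the path to a git-crypt symmetric
# key held in its credential store as GIT_CRYPT_KEY.
git clone git@example.com:infra/terraform-definitions.git
cd terraform-definitions

# Decrypt the secrets that live encrypted at rest in the repository.
git-crypt unlock "${GIT_CRYPT_KEY}"

# ... do the work that needs the secrets, e.g. terraform plan ...

# Re-lock the working tree so plaintext secrets do not linger on disk.
git-crypt lock
```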
There are other things that can be used in similar fashion: you can use Terraform, you can use things like kops, you can use Kubespray. All of these are mechanisms for bringing up your underlying compute, which we'll also talk about here in a moment. You can also use things like Helm charts to manage the lifecycle of your applications, and for anybody who sat through Vic's talk right before this, you were handed some direct design patterns on how to use these successfully. But again, the tooling is less important than committing to the process. Everybody loves to dig into tools and ask, well, what's the thing that I can just install that does this for me? Well, I've given you a list of things that you can just install. But if you use Perforce — yeah, there's a whole lot of feelings I have about that, but just use Perforce. If you are one of those dyed-in-the-wool types who says Jenkins is garbage and we have successfully migrated everything to drone.io or to Travis: great, you have a solution. Use the solution you have and the one you understand. These are ideas, and these are the design patterns you use to build these systems.

So, continuing to move on. This is one that you have to get burned by to know that it's important: if you deploy three clusters with Kubernetes, you should have three distinct ranges of addresses used for those clusters. Yes, Amazon will let you use the same 10.0.0.0/16 in every VPC you set up. That does not mean you should do it. Back in the day, the state-of-the-art system for IP address management, or IPAM, was an Excel spreadsheet that sat on a shared drive, and the synchronization problems with that were definitely non-trivial. If you have nothing, Google Sheets is good enough. If you're paying for Infoblox, right on. But it does not matter whether you are in your own data center or referencing assets on a cloud provider: make sure your network ranges are non-overlapping. That means you can start to do things much, much more easily, like having dynamic site-to-site VPN links that let your users access the resources on a cluster, firewalled at layer three and possibly layer two, rather than getting into NAT translations where everybody goes through one address and then you need proxy filtering and everything else. This does not seem like a big deal, but just do it. And if you need some help with this, I had an intern last summer write a utility to make this easier to do, and I'm going to move on to an example of it; we'll have this slide back up here in a moment.

So let's actually look at this. For this I'm going to try to refresh it. There we go. What we've got going on is: I run the CLI utility, and I have two different failure domains and an arbitrary address range that I picked. I say that we have a /20 that we want to use for the whole environment, and it breaks down what the ranges should be to have a public, a private, and a protected network, as well as extras. I change it to a different subnet size; it does all the math for me and fixes everything. I then decide that, hey, I want a different number of failure domains. So I can just do that, and it takes that /21 and cuts it down into smaller chunks to make sure that I have a reasonable breakdown of all of this. Now maybe I don't want it just in a table; maybe I need a Terraform definition for bringing this up on AWS. So it renders the Terraform definitions, giving me the NAT gateways and all of these other things. It gives me the routing tables. And yet I know that not everybody is using Terraform, so maybe you want CloudFormation, and we can just do that and render CloudFormation as well. These are common design patterns, and everybody should not be reinventing the same wheel. That is the value of open source utilities like this. So if you need something like this, we have the ability to make something like this available.
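The arithmetic the tool is doing is simple enough to sketch by hand. Assuming a hypothetical 10.16.0.0/20 allocation and two failure domains — these numbers are illustrative, not what the demo used — a toy version looks like this:

```bash
#!/usr/bin/env bash
# Toy, hand-aligned version of the subnet breakdown: split a /20 into
# two /21 failure domains, then carve each /21 into four /23 subnets
# (say: public, private, protected, and a spare). Every range is
# non-overlapping by construction. Plain octet math only works here
# because everything is aligned on third-octet boundaries.
base="10.16"                          # hypothetical /20: 10.16.0.0/20
for fd_offset in 0 8; do              # the two /21s start at .0 and .8
  echo "failure domain: ${base}.${fd_offset}.0/21"
  for subnet_offset in 0 2 4 6; do    # /23s step by 2 in the third octet
    echo "  subnet: ${base}.$(( fd_offset + subnet_offset )).0/23"
  done
done
```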
So, continuing on here. There we go, perfect. The next step: knowing what your storage is. Stop freaking out about storage. Seriously, I have sat through enough talks here and enough lightning talks of folks going: what are we going to do about storage? When is storage going to be a solved problem in Kubernetes? Guess what: if you are running at any reasonable size, it is a solved problem. So let's talk about it. What is your platform? Are you using AWS? Cool, you've got EBS volumes. Are you using GCP? You've got persistent disks. Are you on Azure? Well, they have a whole matrix of managed and unmanaged and premium and standard and all of that. But it's more important to think about what's generically happening under the hood.

So we'll walk through an example. I have two servers, server green and server blue, and I have an EBS volume attached to server green. Then that tornado strikes, and server green goes down. It becomes unavailable. The whole point of my cluster scheduler knowing about these remote storage objects is that it intelligently detaches that disk from server green and attaches it to server blue. And ta-da, my stateful application still has all of its data and is able to migrate to a different system. Now, the whole point here is that Kubernetes is giving you shoulders to stand on. The "what are we going to do about storage" question is really "what would we do without the concepts of PersistentVolumes, PersistentVolumeClaims, StorageClasses, and all of these other things?" By using these primitives that are built in and defining a StorageClass that is the default for the cluster — meaning that if I'm on AWS, my default storage class is "just use EBS volumes," and if I'm on packet.net, it is "please just give me their remote block storage" — you are able to make better and more intelligent decisions here.

That whole concept I just showed you is the ReadWriteOnce mechanism. And you'll notice that that process of migrating a disk from one machine to another is the missing piece in historic compute systems when you have a SAN, because you needed some mechanism to intelligently say: hey, this host has gone down, reschedule that workload on a different host and attach the disk. There were entire mechanisms for this — what were the tools? There's clvmd, which is a whole clustered LVM system that uses multicast addresses to dynamically change a mounted disk from read-only to read-write. But that brings up a very good point: these are design patterns from the past that folks are now just relearning, and Kubernetes, being built by a number of individuals who have also interacted with these design patterns in the past, knows that. So you're able to have a disk that is simultaneously mounted on multiple machines read-only, which is ReadOnlyMany. And you're able to have a disk that is simultaneously mounted on multiple machines read-write, and that is ReadWriteMany. But just because you've never done it on the cloud, again, it doesn't mean it's not possible.
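As a sketch of what "define a default StorageClass" actually looks like on AWS with the standard in-tree EBS provisioner — the names here are just examples:

```bash
# Mark gp2-backed EBS volumes as the cluster-wide default, so any
# PersistentVolumeClaim that names no storage class gets one of these.
cat <<'EOF' | kubectl apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp2
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
---
# A claim like this is all an application has to ask for; the access
# mode is ReadWriteOnce, i.e. the attach/detach dance described above.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
EOF
```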
So let's talk about bare metal for a second, because, hint hint, it works extremely similarly. You have some concept of a disk that is remote. Possibly it's via a SAN using iSCSI or Fibre Channel, and you've got HBAs in place that know how to use that. Possibly it's a Cinder volume, because you already have some OpenStack service running and you just want to be able to use it. Possibly you've set up Ceph, and you have either CephFS — which means that you can have that file system concurrently mounted read-write on multiple machines, so that when you make a sync call on a file on machine number one, you're able to immediately see that data on machine number two — or you just use the RBD mechanism from Ceph, where you're back to the vanilla, EBS-style mechanism of: I have some storage object, I want it available as a block device, please make that happen. But the big thing is, you just need an API for your storage engine. If you're already paying for NetApp, they give that to you. If you're already paying for EMC, they give that to you. And if you're not paying for any of those things, then guess what: other APIs do exist for storage, and it doesn't have to be expensive.

So, I am a huge open source rabble-rouser. I started out my career doing a little bit of Windows, and then I saw the light; and then I saw that draconian, evil open license called the GPL and learned that the BSD license was far superior because it had more freedom; and then I learned the error of my ways. But regardless, FreeBSD is still a legit operating system, and in FreeNAS you have the ability to use ZFS, whereas the prospect of doing that on Linux is still a bit dodgier. So what does this look like when you download that FreeNAS ISO and decide you want to have this storage area network, so that you can run your Kubernetes cluster on the cheap and still have all of these types of easy disk access? Well, you have some disks, and you have some meat API — which is probably going to be you — and you have some server. You install the software on the server that has the disks, and then you configure RAID so that you have redundancy, or even better, a redundant array of inexpensive disks. Then you just take the easy route and export iSCSI, and now Kubernetes, which natively understands how to consume iSCSI, is able to take these block devices and plug them in, and you don't have to go to the effort of thinking about all of this.
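To show how little is involved on the Kubernetes side, here is a sketch of a PersistentVolume backed by that exported iSCSI target; the portal address and IQN are made-up placeholders for whatever your FreeNAS box hands out:

```bash
# A pre-provisioned volume pointing at an iSCSI LUN; Kubernetes logs in
# to the target and attaches the block device to whichever node the
# consuming pod lands on.
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: PersistentVolume
metadata:
  name: freenas-lun0
spec:
  capacity:
    storage: 50Gi
  accessModes:
    - ReadWriteOnce
  iscsi:
    targetPortal: "192.168.1.50:3260"            # hypothetical FreeNAS portal
    iqn: "iqn.2005-10.org.freenas.ctl:k8s-vol0"  # hypothetical target IQN
    lun: 0
    fsType: ext4
    readOnly: false
EOF
```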
Now, getting back into this, I promised some more thoughts about setup. So let's take a moment and put all of these pieces together. What do we need to worry about here? Well, hint: they're the things that you should already be doing. You have a single sign-on system in place, right? Because you don't want to be running local users on the machines that power your cluster, and you don't want to be running local users on the cluster itself. That means you should have some component like Dex hooked up, talking back to some identity provider like LDAP or OIDC. You are also going to need some log aggregation mechanism — that is to say, collecting logs from the hosts, from the Kubernetes scheduler, and from your applications in a central location. If you don't already have Splunk, or something else you already set up, then I highly encourage you to go with this relatively easy thing to set up. And I say relatively easy because you will get it up and running and it will be perfectly fine. And then, when you get to the point of having about a terabyte or two of logs and you have to start learning Elasticsearch re-indexing — well, life is an adventure. Good luck with that. But when you are just getting started, this is going to work just fine.

You are also going to want reasonable monitoring and alerting, because you need to measure performance and alert on all of your problems. That means you will want Prometheus running, you will want Alertmanager configured, and you will want Jaeger in place so that you can actually trace requests through your application. This is not rocket science. This is just making sure that you are doing the same things on all of your clusters, so that you have central places to go: I go to my tracing dashboard and I am able to trace components of the various applications that I have, regardless of where they happen to be.

The big one here: it's always a DNS problem. I make every single person who works for me read the first three chapters of DNS and BIND. They laugh about that, and then I tell them I'm serious, and they will read the first three chapters of DNS and BIND, because it is always a DNS problem. So: properly federate your DNS. When I say that, I mean DNS has a global scope. Do not pretend that it does not have a global scope, because when you do that, you are doing yourself a disservice. When I logged into our developer AWS account and saw 200 tectonic.local zones — come on, folks, what are you doing? There are not 200 concepts of tectonic.local. That's the whole point of namespacing: we have this globally namespaced resource called DNS. So think about it. If you have an LDAP system, for example, and you want to be able to talk to, or fail over across, a number of LDAP systems — one, SRV records already handle that for you, but if you don't necessarily want to go that route, think about it this way. You name the host names of your LDAP servers ldap.sfo.example.com, ldap.nyc.example.com, and ldap.ber.example.com, BER for Berlin. Then you use the built-in mechanisms of DNS resolution, like ndots and search suffixes, where you provide a priority order of search-suffix resolution. That means you can say: I wish for all of the San Francisco resources to automatically attempt to resolve whatever-my-name-is.sfo.example.com, and if that does not work, fall back to nyc.example.com and ber.example.com, and finally just example.com. That gives you the mechanism where, when you are deploying an application to your cluster, the name it requests is just "ldap" — and when you deploy that same application to your resources in New York, it will just go to the local resource. So this means you can either utilize components like CoreDNS with your cluster to handle some of this federation, exposing both the cluster.local type concept as well as federating out to your external DNS systems, or it just means you use your external DNS infrastructure. Do not reinvent this wheel.
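Concretely, that search-suffix trick is nothing more than a few lines of resolver configuration on the hosts in each site. A minimal sketch for the hypothetical San Francisco machines:

```bash
# /etc/resolv.conf on San Francisco hosts (domains are hypothetical).
# A bare name like "ldap" has zero dots, which is below ndots, so the
# search list is tried in order: sfo first, then the remote sites.
cat <<'EOF' > /etc/resolv.conf
nameserver 10.16.0.2
search sfo.example.com nyc.example.com ber.example.com example.com
options ndots:1
EOF

# Applications now just ask for "ldap"; in SFO this resolves to
# ldap.sfo.example.com, while the same application shipped to New York
# hosts (whose search list puts nyc first) resolves locally there.
getent hosts ldap
```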
One of the most beautiful things that OpenShift v0 and v1 did was that you defined a domain for your OpenShift cluster and delegated that domain to it, and then, as it dynamically handed out host names or you requested one, it automatically appended that domain to whatever your application was called. And because that was federated through just the vanilla DNS infrastructure, it was just available. Especially when you start building microservice-type applications, you don't really care what the host name is, because it's always going to be behind some other API proxy — or, in the case of Kubernetes, an ingress controller or something like that.

Finally, you're going to want to sync your RBAC configuration generally across your clusters. This means that today, our trusty butler Jenkins, or Spinnaker, can handle some of these things. It's the concept of continuous deployment: as you make changes to those configuration files that you have in revision control, and that Jenkins is monitoring, it is able to pull down the credentials from git-crypt or Vault and then apply those changes to the cluster.

Traffic distribution. Cluster traffic needs redundancy. This is to say: utilize your cloud load balancers. If you're paying for F5, it doesn't mean that you should just throw away your F5. It does mean that you should be looking into BGP and equal-cost multi-path. And if you already have a properly configured Clos network, or leaf-spine — same thing basically, one's just the fancy academic name — then you should be utilizing that. Even if you have a small number of clusters, or a small number of machines in a cluster, it's not unreasonable to set aside address ranges and use static routes to route that, again equal-cost multi-path, to all of the nodes in the cluster. It does not mean that you will always have the most optimal traffic path if you are doing static routes; if you've got BGP, you can do that. But it means that you don't have to worry about having separate pieces of hardware — which you have to buy two of, for everything — and then setting up heartbeat connections to them and learning their API and everything else. Because in this case, BGP is a simple enough API to do this.
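For the static-route flavor of that idea, a minimal sketch with iproute2 — the service range and node addresses are hypothetical:

```bash
# On an upstream router or Linux gateway: spread traffic for a cluster
# ingress range across three nodes with one equal-cost multi-path
# static route. The kernel hashes each flow onto a nexthop; if a node
# dies you update or withdraw the route by hand — which is exactly the
# part that BGP would automate for you.
ip route add 10.16.12.0/23 \
    nexthop via 10.16.0.11 weight 1 \
    nexthop via 10.16.0.12 weight 1 \
    nexthop via 10.16.0.13 weight 1
```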
What this means is that when you are taking all of these components, you need to demand network APIs, because in distributed systems everything becomes a network resource — whether it is storage, or a database as a service, or something else.

So now we move into the execution, and we discuss how we begin to take action. Step one: cluster provisioning. Understand that there are separate stages here. You deploy hosts, and those hosts have some manifest; and you have clusters, and each cluster has some manifest configuration. As you do that, you're going to normalize and templatize your host configs. That's very easy to do with Container Linux: the same manifests can generally be used for both bare metal machines and cloud. If you're using something like Kickstart and cloud-configs in separate systems, you just want to break things down to the minimum state, or use Ansible to manage that, and avoid static configs at all costs. If you think you need static IP addresses, you probably don't. There are cases where you may need them, but again, if your answer is "well, it's because we've always done it that way," that's the wrong answer. Things like DHCP exist for a reason, and things like DHCP have been made redundant for a reason. These are things that you need to begin exploring.

So let's get into the cluster configuration. The things that change over time are the kubelet flags. Ensure that everything is, as we like to call it at CoreOS, under management. That means you have the ability to change the kubelet flags as old flags are deprecated or new ones are added. Also, use robots to do your bidding — and by robots, well, I know I said that people love to hate on Jenkins, but this butler does my bidding quite well. This is an example of the crazy matrix build that we have, which deploys things to different clusters in different regions, and does it automatically based on whether it's a production resource or a development resource.

And when you start getting into that, let's talk about it step by step. First, you unlock your credentials using git-crypt or Vault. This is where we pull a key from escrow. It's pretty simple: it's literally a bash script that knows how to call git-crypt, using variables embedded in Jenkins and some of the identity and secret management built into Jenkins. From there, it just gives us the ability to pull this in as a pipeline stage. So I unlock my credentials, then I do a git clean and a terraform validate. Pretty straightforward — again, these are called as bash scripts in a Jenkins pipeline. The validation script itself runs terraform validate, but then — a lot of you are also Go developers, and you know that you should be running gofmt on your code — you should likewise be running terraform fmt, and we have components here that exit non-zero, because this is the Van Halen brown M&M's moment: it shows whether or not you actually read through the contract and made sure that you were doing all of the things that were required. After that we plan, we send a message over Slack asking a human being to confirm that the work should be done, and then we deploy it with terraform apply. Working this way, adding a new cluster is as easy as defining an environment variable.

Then you have to maintain what you've got. We also use a similar set of Jenkins jobs to manage these environments. Again, we define the environments, we give them names, they have paths to the locations of those configs within the various repositories, and we use a similar set of processes: git-crypt, git clean, helm lint, and helm upgrade. And thanks very much to Brad Eisen from the infrastructure team, who did a lot of this work and really insisted on making this the way that we did everything.

So then, manual operations: don't do them. Seriously, don't do them. Okay, fine — if you're going to do them, at least make sure that you export things. Thanks to Duffie Cooley for this one; I had no idea about it until he pointed out that there's an --export flag you can use with kubectl get, and it removes all of the cluster-specific information from the resource. So then you're just a kubectl apply away from mucking up all the work that your robot has done.
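For reference, the flag in question looks like this — the resource names are hypothetical, and note that kubectl deprecated --export in later releases, so check your version:

```bash
# Capture a manually-created resource without cluster-specific fields
# (status, UID, timestamps), so it can be committed back to git and
# handed over to the robots.
kubectl get deployment my-app -o yaml --export > my-app.yaml

# Review it, commit it, and let the pipeline own it from here on.
git add my-app.yaml && git commit -m "capture manual change to my-app"
```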
So, finally: disaster recovery. Things will go wrong. This is LA yesterday — this is the 405 freeway down by the Getty Center. So for all those folks who came from Los Angeles, have fun going home. Have a plan, get prepared to be punched in the mouth, and test the plan. Do things like backing up etcd and running multiple clusters in different failure domains, and use purpose-built tools like Heptio Ark, which is a utility for managing disaster recovery specifically for your Kubernetes clusters.

So: who? That's you. What? Making sure that you know how to build out a multi-cloud, multi-environment Kubernetes system. I'm 19 seconds over, which is why I'm rushing. What are the common misunderstandings? Thinking that clouds are beautiful and unique snowflakes and that you have to do things differently. They're not; you just have to think about them differently. All of these features are available now. These are not things you have to wait on — you can take this and do it all today. So: knowing why? Yeah. How? Yeah, we can do all this. And Tectonic: you can run it free for up to 10 nodes, it has hooks for doing a lot of this stuff automatically, and you should check it out. And we're hiring. And that's me. Thanks.