Hi everybody. Thank you for coming to my talk. I'm extremely nervous. I'm excited. This talk is about MySQL and Kubernetes, and my name is Patrick Galbraith. Something about me: I've had a long-lasting relationship with MySQL. I've worked on MySQL at MySQL itself and at various other companies, and I've contributed to MySQL. I've worked on various MySQL projects: the Memcached UDFs; DBD::mysql, the Perl driver, which I still co-maintain; and the Federated storage engine. I contributed the original example of running Galera on Kubernetes in the examples directory of the Kubernetes repository. I love learning languages, whether programming languages or spoken languages. I love being outdoors with my family. I love Chile; it's the best place I've ever been. And as you can see by this page, I'm not a designer. So I started working for this great company, Oracle Dyn, in September, and I'm having a great time. What we provide, and most of you probably already know this, is DNS. We provide DNS for some of the world's largest and most admired web properties. We have the largest, fastest, and most resilient DNS network in the world, serving over 3,500 enterprises, brands like Netflix, Twitter, and LinkedIn. We also provide email for organizations with large sending needs. And we're building a cloud, Oracle Cloud Infrastructure, with Dyn providing edge services for that cloud. And we're hiring; it's on the slide, so please do take a look. So what is this presentation about? It's about running MySQL on Kubernetes. Databases are extremely complex pieces of software, and Kubernetes actually makes it easier to run them. We're going to talk about stateful applications, a database being a complex stateful application, and within that we're going to talk about StatefulSets. We're going to showcase this with Vitess, by talking about Vitess and doing a demo. We're also going to talk about operators.
And I'm very excited to talk about the MySQL operator that Oracle has been developing, and to give a demo of that. Time permitting, I have some extra slides. I went crazy; I had 60 slides at one point, and somebody said this is a two-day seminar. That's why I'm speaking with some cadence here. And please do save questions for the end. So it's a tale of two open source projects. We have MySQL, which has been around for 20 years; in software years, that's quite a while. It certainly is ubiquitous, used in so many places, and there are a lot of variations and permutations of MySQL. Somebody was even telling me about the work Alibaba has done, and I know friends at Facebook have made their own customizations. And then there's Kubernetes, the Linux of the cloud, as we've heard it called at this conference. I'd say it's the fastest growing open source project. It's application deployment done right, and definitely community driven; look at all the people here, 4,000 people, that's fantastic. And it's a moving target, rapidly developing. When I started preparing for this talk, my understanding dated from when I put together the original Galera example, and things had changed, so I had to do a bit of catching up. So with databases, it's a question of pets versus cattle. Database containers tend to be more pets than cattle, which seems obvious: you care greatly about them, like a pet. A pod can hold multiple database instances or one; the idea of a pod is that you get to run a bunch of containers on one node, but we don't necessarily want that with databases. And the documentation says Kubernetes pods are mortal. Mortality is not an attribute we like to see in databases. A database is a complex stateful application. We want consistent access; we always want the same database to be there. And safety: don't scale if it's unhealthy, because that's a bad thing.
And of course, persistent storage, which is kind of an obvious requirement. So here are some ingredients. I like this picture of chiles. I love chiles; they're delicious. And we have some delicious ingredients in Kubernetes. We have services, which I'm not going to explain, because everybody knows what those are. Persistent volumes, persistent volume claims, and storage classes. Then there's a newer resource on the block, StatefulSets, and within that we can talk about init containers and volume claim templates if we have time. Then there are operators, which allow you to extend Kubernetes and which utilize custom resource definitions. We have sidecar containers to do handy tasks for us. And there's the node selector; I was going to talk about this and probably won't have time, but it's definitely something to know about if you want to run a database on a particular node. So first of all, StatefulSets. StatefulSets provide guarantees about the ordering and uniqueness of pods and the way they're named. They maintain a persistent, sticky identity for each pod across rescheduling. They also ensure ordinality: when you run pods within a StatefulSet, they're numbered zero through however many you run, and they're named that way. The pods are not interchangeable, and you won't get two pods with the same name. They have a stable network identity: each pod's name is the StatefulSet name, a dash, and its ordinal, and you create a headless service, which gives each pod a stable domain name; that's how you access it. They have stable storage for stateful resources: pods remount the same storage through restarts, so after a restart a pod has the same data it had before. And safety: we don't scale if it's unhealthy.
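To make those identity and storage guarantees concrete, here is a minimal sketch of a three-member MySQL StatefulSet with a headless service. This is not from the talk's slides; the names, image, and storage size are illustrative, and I'm using the current apps/v1 API (the group/version was still in beta at the time of the talk):

```yaml
# Headless service (clusterIP: None): gives each pod a stable DNS name,
# e.g. mysql-0.mysql, mysql-1.mysql, mysql-2.mysql.
apiVersion: v1
kind: Service
metadata:
  name: mysql
spec:
  clusterIP: None
  selector:
    app: mysql
  ports:
  - port: 3306
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: mysql   # ties pod DNS to the headless service above
  replicas: 3          # pods start in order: mysql-0, mysql-1, mysql-2
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
      - name: mysql
        image: mysql:5.7
        ports:
        - containerPort: 3306
  # Each pod gets its own PersistentVolumeClaim (data-mysql-0, data-mysql-1, ...)
  # and remounts the same volume across restarts and rescheduling.
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi
```

The volumeClaimTemplates section is what makes the storage "sticky": if mysql-1 dies, its replacement is also named mysql-1 and reattaches the same claim.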
Now I'm going to show these two illustrations; they're really good at bringing home what the differences are. With a Deployment and ReplicaSet, pods are not started in any specific order. For the name, you have the deployment name, a numeric value, and then a random alphanumeric suffix, and that suffix is different for each pod. It's not fun to try to log in and figure out which containers are running, because you have to go look and see what they're called. And it'll still scale if there's a crash, replacing a pod with another non-unique name. But with a StatefulSet, pods are started in order. You don't have to sit there and say, I want to run this one, then this one, then this one, like I did with my original example. The names are unique and distinct, and if a pod fails, it must be replaced by a pod with the same name, which is the same node for all intents and purposes. And you don't scale the cluster when it's unhealthy. So then we have operators. An operator is an application-specific domain controller: it encodes application-specific domain knowledge by extending the Kubernetes API. What does this mean? It means that we, as human operators, do certain things: if I run a Galera cluster, I bring it up in a certain order and maintain it in a certain way. With an operator, you build that knowledge in; you build it into the custom resource definition when you create your operator. It enables users to create, configure, and manage stateful applications, because you can add that functionality in there. It builds upon the resource and controller concepts, and it leverages Kubernetes primitives like ReplicaSets, StatefulSets, services, and other resources. An operator executes common application tasks: you design it to run whatever you want within your application, and it does it for you and hides all the gory details. It's installed as a Deployment.
Once it's running, you utilize it, and you drive it with custom resource definition types. You can see some of them here: etcd cluster, Prometheus, and the MySQLCluster and MySQLBackup types, which I'm excited to talk about. So I like this graphic; it's one I copped from CoreOS's website, where they discuss operators, and it's really nice. I saw something just like it in the keynote this morning. What an operator does is observe the state of what you have running. In this example, it observes: oh, I only have two members running, and one of them happens to be 3.1.1 when I really want 3.2.10. That's where the analysis kicks in: analyze, and it determines that the version should be 3.2.10 and there should be three members. So now we act: we bring up the missing member, and we also upgrade. This next graphic was more about me wanting to talk about StatefulSets, but it works well with operators too. It's an example Anthony Yeh did of simple MySQL replication. It uses a pod with two init containers. The first one, init-mysql, sets up the configuration file for MySQL, depending on whether the pod is a master or a slave. Then there's clone-mysql, which receives a backup taken with XtraBackup; if the pod is the master it skips that, and it decides based on whether the pod is ordinal number zero. Then there are the app containers, MySQL obviously, plus an XtraBackup sidecar container. Once MySQL is up and running, the sidecar serves a backup so that the next node that starts can receive it through its init container. And why do I have this up here? Well, there's some complexity to this, and this is exactly the kind of thing you would put into your operator. That's what's really cool about operators. So, the MySQL operator: we've arrived at this point.
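To make the observe/analyze/act loop concrete, here is a toy sketch in Python. This is purely illustrative and is not code from any real operator; the state fields and action names are made up for the 3.1.1-versus-3.2.10 scenario on the slide:

```python
from dataclasses import dataclass


@dataclass
class ClusterState:
    """What the operator observes: member count and each member's version."""
    members: int
    versions: list


def reconcile(observed: ClusterState, desired_members: int, desired_version: str):
    """Analyze observed vs. desired state and return the actions to take."""
    actions = []
    # Act: bring up any missing members at the desired version.
    for _ in range(desired_members - observed.members):
        actions.append(("add-member", desired_version))
    # Act: upgrade any member running the wrong version.
    for i, version in enumerate(observed.versions):
        if version != desired_version:
            actions.append(("upgrade-member", i, desired_version))
    return actions


# The slide's scenario: two members running, one of them on 3.1.1,
# but we want three members, all on 3.2.10.
print(reconcile(ClusterState(2, ["3.2.10", "3.1.1"]), 3, "3.2.10"))
# prints [('add-member', '3.2.10'), ('upgrade-member', 1, '3.2.10')]
```

A real operator runs this loop continuously against the API server rather than once, but the shape is the same: observe, analyze, act.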
I always wanted to do this; it's one of those things I thought it would be great to do. Well, I happen to work at Oracle now, and we have a group in Bristol that has developed it. It's scheduled for release in early 2018. You can deploy a highly available, clustered MySQL instance onto Kubernetes with a single command. The operator watches the API server for custom resources related to MySQL: MySQLCluster and MySQLBackup. Backup and restore are made simple with the MySQLBackup resource. It utilizes persistent volumes and persistent volume claims, and under the hood it uses group replication. And you can have automated backups to different object storage, scheduled or on demand, to restore from. So, some of the features of the MySQL operator; I've already mentioned some of them, but this is a nice graphic. You can see the operator here doing all of the tasks with the MySQL cluster that you would normally do as a human operator, driven through the Kubernetes API by custom resource definitions. It provides configuration for your MySQL cluster with simple Kubernetes objects like secrets and config maps. For backups, you can define what kind of backups you want and what the policies are, on demand or scheduled, as I mentioned, to make sure your data is always protected. So here's an example (I'm trying to see where we are on time) of creating a cluster. We define our persistent volume in its own file here, and then we define the MySQLCluster. We say we want three members, we give it the name example-mysql-cluster-with-volume, and we use a volume claim template. I love these; they just work. You say what storage class name you want, submit it, and it creates a cluster. And then a backup is the same simple situation: you have your cluster, mysql-test-cluster, and you specify that that's the cluster you want to back up.
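From memory, the cluster manifest being described looked roughly like this. The operator was pre-release at the time, so treat the API group/version and the exact field names here as approximations rather than the shipped API:

```yaml
# Sketch of a MySQLCluster resource for the Oracle MySQL operator
# (field names approximate; the operator was unreleased at the time).
apiVersion: mysql.oracle.com/v1
kind: MySQLCluster
metadata:
  name: example-mysql-cluster-with-volume
spec:
  replicas: 3                    # three group-replication members
  volumeClaimTemplate:           # each member gets its own claim
    metadata:
      name: data
    spec:
      storageClassName: manual   # matches the PV defined separately
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi
```

You would submit this with kubectl create -f, and the operator does the rest: it creates the StatefulSet, wires up group replication, and generates the root-password secret.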
You say what your credentials are (OCI object storage is our object store, just like S3) and you specify that that's where you want the backup to go. And it backs it up. As for the commands, you use kubectl create for both your backup and your cluster. You can also do kubectl get mysqlbackups to see what backups you have, and then you can get a specific backup. You can also describe a backup; I don't have that in here, but you'll see it in the example. So now we're going to run a demo, drum roll. In the initial state, nothing is running. No, I can't enlarge it; that's the way it was recorded, and I didn't do this particular recording. I tried getting it as big as possible. Pardon? Come on, plus, plus. Okay. Nope. Okay. So we've created the operator, and now I've got to focus on this. Now we're going to tail the logs, and we should see that the operator is running. We see that it's running as a replica set. At this point, we can utilize the operator. We're going to keep tailing the operator's logs to see what's going on; I like that the recording lets you see this. Now we go into the cluster manifest file. We're going to do three replicas and call it mysql-test-cluster, then run kubectl create -f with the name of that file. At that point, you can see in the logs that it's starting to bring up the cluster. Now we watch the pods as they come up, observing each of the members, and we also go into the Kubernetes dashboard and watch there as well. We see the first one starting to come up. There's also a secret that's been created; this is how you'll talk to the database once it's up and running. We now see that the first and second members are running; you can see the messages. The third one is now starting. You can see in the dashboard that it's almost up. The third node is up, so at this point we should have a running cluster. Now we're going to go into the first node and connect to the database.
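The backup resource described above might look roughly like the sketch below. Again, this is reconstructed from the description, not copied from the slides; the field names and the secret and bucket names are my assumptions:

```yaml
# Sketch of a MySQLBackup resource (field names approximate).
apiVersion: mysql.oracle.com/v1
kind: MySQLBackup
metadata:
  name: mysql-backup
spec:
  cluster:
    name: mysql-test-cluster     # which cluster to back up
  executor:
    provider: mysqldump          # mysqldump today; Enterprise Backup and
    databases:                   # XtraBackup support were planned
    - test
  storage:
    provider: s3                 # any S3-compatible store, e.g. OCI object storage
    secretRef:
      name: backup-credentials   # access key / secret key
    config:
      endpoint: objectstorage.example.com
      bucket: mysql-backups
```

After kubectl create -f on this, kubectl get mysqlbackups lists it, and kubectl describe shows the resulting archive name in the object store.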
We find the database password from the environment, use the mysql_innodb_cluster_metadata schema, and query it to find out what the group membership is by host. And there we have a three-node cluster. That was easy, wasn't it? The next thing I'm going to show you is a backup; this is even easier. Initially we have nothing in our object store, and we have a running cluster, some credentials, and a bucket name. We look at the backup manifest to make sure it describes what we want to back up. The first thing we do is actually put some data in there, so that we have some kind of database to refer to when we back things up: create a simple table, put a row in it, and just pretend it has billions of records. We specify what database we want to back up and the credentials. Now we submit, or rather create, that backup. Backup is scheduled; backup created. We have a list of backups. This backup is a mysqldump at this time, but we're adding functionality for MySQL Enterprise Backup and XtraBackup, and that will be configurable. Now we describe the backup, and we see the backup file, a tar.gz, and we should see it in our object store. Yes, it's there. Isn't that great? That's exactly what we want to see for ease of backups. So now we're going to talk about Vitess. Vitess is sharding on a silver platter. I know the slide isn't a silver platter, but I love the album; it doesn't have anything to do with my age. Vitess is a database solution for deploying, scaling, and managing large clusters of MySQL instances, with built-in sharding. It's been YouTube's database store since 2011, and it's what Slack uses. It looks like a MySQL database in terms of usage, but underneath it can be sharded any way you want. And it's cloud ready; the cloud is coming, as Sugu would say.
It helps you scale with transparent, dynamic sharding and the ability to keep resharding in or out, depending on what you want to do. It also has cluster management tools for backups, shards, and schema management. It speaks the MySQL client protocol as well as gRPC. It provides connection pooling: a Go connection pool that talks to a small MySQL connection pool. It protects MySQL with query de-duplication, rewriting, sanitization, limits on unbounded queries, and a number of other things. There are monitoring tools built into all of the Vitess binaries. If you want to migrate your MySQL application to the cloud, Vitess is an excellent vehicle. So here are the components of Vitess. First of all, your application talks to vtgate, a lightweight proxy used for routing queries to the appropriate vttablet; it also reassembles the result sets from the different shards. vtgate then talks to vttablet. A vttablet pod is comprised of vttablet itself, which is a proxy, plus MySQL. vttablet is the proxy that does all of that query sanitization and blacklisting, de-duplication, and putting limits on unbounded WHERE clauses. Next we have the topology server, which is essentially a services layer in front of etcd; it holds the coordination data, such as the keyspace graph used to determine which shards you use, and it's also used for initial discovery of what's there. Then you have vtctld, which provides a web interface and does the cluster management work. Another piece here that I added, which isn't on the graphic on the website, is Orchestrator. Orchestrator handles reparenting: it talks to vtctld, and through that it does the reparenting when that needs to occur. And then there's vtctl, the command-line tool for interacting with Vitess through the topology server. So, sharding.
A shard consists of, as I mentioned, a single master that handles writes and one or more slaves that handle reads. You can split horizontally: you can split or merge shards in a sharded keyspace. You can also do vertical sharding, moving tables from an unsharded keyspace to a different keyspace. You can do resharding: split, merge, or add new cells and replicas. It does this through filtered replication using GTIDs, which are the key ingredient in ensuring that your data goes where you expect it to go. There's also the concept of a VSchema, a Vitess schema. This is essentially the schema for how you shard; think of it that way. It ties together all the databases. Simply put, it contains the information needed to make Vitess look like a single database server. And then there's the VIndex, a cross-shard index that provides a way to map a column in your table to a keyspace ID, and I'll show you an example of this. Right here, on the left, we have our regular user table schema definition, and on the right we have our VSchema. We list what vindexes we have; the first is called hash, and it's of type hash. Then we say what tables there are, in this case user, what column we're going to shard by, and what the vindex name is, which happens to be hash, as we just saw. This is what you submit to Vitess, and that's how Vitess knows how to shard your data. There's also the concept of reparenting. We can change a shard's master tablet to another host, or point a slave tablet at a different master. As I mentioned, GTIDs are the magic in all of this. There are different types of reparenting, planned and emergency, and each kind of speaks for itself. Vitess also provides resharding: you can split, merge, or add new cells or replicas, and, as I've said several times, filtered replication with GTIDs is the key to this. Backups. Backups are really simple with Vitess.
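Before going on to backups, the user-table VSchema just described might look like the following JSON. The structure matches the Vitess VSchema format, but since the slide isn't reproduced here, the sharding column name (user_id) is my assumption:

```json
{
  "sharded": true,
  "vindexes": {
    "hash": { "type": "hash" }
  },
  "tables": {
    "user": {
      "column_vindexes": [
        { "column": "user_id", "name": "hash" }
      ]
    }
  }
}
```

The vindexes section declares the cross-shard index (here, a hash vindex named hash), and the tables section says that the user table is sharded by hashing user_id into a keyspace ID. That mapping is how vtgate routes each query to the right shard.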
And there are a number of plugins for different types of object store, or NFS, Ceph, GlusterFS, I believe. The thing to know is that vttablet has to have access to wherever you're backing up to, and you use vtctl to run the backup commands. When a backup happens, that MySQL instance is stopped and the data is copied; that's how it does it, and this works just fine because we still have the master to write to. We also have restore: vttablet is started with arguments saying where the backup data is, and the restore is made. And here's an example of some of the commands, which you can look at in your spare time when you look at my slides. So now we get to the fun part: we're going to do a demo. What we're going to do here is create two shards, load a database schema and VSchema, and scale one shard from two replicas to three. Then we're going to delete the StatefulSet, pulling the rug out from under its feet, delete the master pod, and then see Vitess do some reparenting magic. This demo is made possible by Anthony Yeh; you heard his wonderful keynote this morning, and this is where you can go play with it yourself. So to run Vitess, we install the Helm chart. There should be nothing here yet, no vtgate or vtctld status. The Helm chart is installed. Once this is installed, we can start looking at our StatefulSets and see what pods are coming up. The zone1 main shards: those are the two shards, and we watch them come up. There will be some errors; this is a known bug, I won't go into it, and it's nothing to worry about. So now we watch things come up. The initial state is that there's no replication running; we haven't set that up. Things are still coming up, as you can see in the Kubernetes dashboard; you see the load going up, and the graphs show things happening. So now we're going to initialize the shards.
We initialize both shards, apply a VSchema, apply a regular schema, and then back up both. Now you'll see replication get started and a master and a replica set up. You can see here it's already happened with the -80 shard, and now the other shard is coming up as well. Both shards should be ready at this point. We won't see any traffic yet, since we're not running anything against them; that's going to change soon. No traffic. It's nice to have the graphs there. You don't see anything in the query log either; there's a nice query log UI built into vtctld. I keep getting that tongue-twisted. So now we're loading; the script is loaded. We now have a master and a replica. We're doing a backup right now, and the tablet is not serving while we do that. We watch the logs a little to see what's going on. The backup's almost complete; the status is "backup", which means it's not serving. Now the backup's complete and it is serving; it's ready to use. Now that the initial setup is complete, we're going to run a load test against it. This is really fun to watch, because you get to see the graphs. The load test is now running, and we see traffic showing up against both shards. We look at the master traffic to see what's going on there: various statements running against the database, write statements. We'll also check something we never did while initializing it: that the schema is actually there. We look at the query log UI, where you can see all the different queries running against the master at this point, all write statements. You even see the commits color-coded, which is really nice. Then we look at the traffic on the slave: all selects, as you would expect. Look at the query log: all selects.
Now we look at how many StatefulSets we have. We're going to scale one of the shards, the 80- shard, to three, and watch the new replica come up. It's now scaled to three. We look in the UI: it now has three replicas, and they should be ready. Right now it's doing a restore; the restore completes, and now it should be serving traffic. The load test was running, so it immediately starts serving traffic. You can see in the Kubernetes dashboard that it's getting even busier with the load test running. So now we have deleted the StatefulSet, and we have deleted the master. This is where we get to see Vitess do some really interesting things. We confirm that the master has been terminated. Now we see the promotion, and what was once the replica is now the master, all without pain. Now we tear it all down, because the demo is going to be over soon. So that's the end of the Vitess demo. What time is it, 4:21? So we have about four minutes for questions. I have these extra slides; when you look at them, please look at collectbeat. This is a great thing I wanted to talk about; a friend, Vijay Samuel, who works at eBay, has written it, and I wanted to cover it because I wanted to talk more about monitoring. I also have some other MySQL installation examples, because I wanted to work up from simple to more complex, so I have those in here, with demos recorded for them, and I'd be glad to take questions now or later. Oh, there's a thank-you slide. Thank you, Anthony Yeh, and, extremely helpful, Sugu. The team I work with at Dyn; Owain Lewis, who worked with his team to write the operator; Steve Lerner, who's in the audience; and Severalnines, who have a great Galera example. But anyway, questions? Yes. Unfortunately, I don't have much experience with that. A lot of what I've done has been kind of... I've worked with a lot of teams.
I worked at HP in the Advanced Technology Group, and we did a lot of prototyping, and right now I'm in a similar kind of role, so I don't have experience with that. But there might be people in the audience, whoever might be running Vitess, who are more than welcome to chime in, having run it: Anthony Yeh, or anybody else. We'll take another question, since I'm sorry I don't know the answer to that one. Yes. Well, it's group replication, so it's not simple master-slave replication; it's a cluster. Yes. Yeah, I haven't used group replication a lot, but it really isn't that simple master-slave type of replication. Yes, you can. There are some of the same issues you have with Galera, where you have to think about whether you're going to have cross-node locks; in that situation, pick one node to write to. I don't know; the person sitting next to you can answer that much better than I can. Probably. He liked the operator video when I showed it to him the first time. His talk this morning was really cool. Any other questions? Thank you very much for coming. I really appreciate it.