Welcome, I am Sakari Keski-Talo from Codership. Galera Cluster is a clustering solution for the MySQL, MariaDB, and Percona Server databases. It protects your Keystone, Nova, and Neutron data and keeps it available 24/7, running on three or more MySQL nodes. We have many years of clustering experience behind us, and we started the Galera project. What is great about it is that it is truly multi-master and synchronously replicated. There are many Galera Cluster deployments out there, and one user of Galera Cluster is Mirantis. Please welcome our speaker from Mirantis. Thank you.

I don't have nearly as great an accent as Sakari does. I'm going to talk today about some of my experiences in the field in structuring the database schema, the data layer, within OpenStack deployments, and I'm going to go really quickly so that at the end we can have some questions.

All right, so we're going to talk about where to put data, and where Galera makes sense and where it does not make sense. The thing you want to think about when you're designing your deployment, and the database servers that are going to service Nova, Neutron, Keystone, Cinder, and all of the other OpenStack services, is the read and write patterns and the types of data that go into the different databases. The patterns you identify will tell you how to group data into different database clusters and different schemas. So I'm going to talk a little bit about this. We're going to talk about how frequently certain queries are run. The read-to-write ratio for particular types of data is very different between Neutron, Keystone, and Cinder. Caching mechanisms are all different too; with Keystone, for instance, you can store authentication tokens in a completely different data store than your identity data. Understanding all of those things goes into how you structure your database deployment.

Keystone, let's go right into it. It's used by all the other components. Typically you want the identity information, so assignments, roles, users, and projects (they used to be called tenants; now it's projects), to be shared globally across all of your deployment zones and regions. You can use, like I said earlier, different drivers for different types of data within Keystone. What we typically recommend is not storing the tokens in SQL databases. At AT&T, my previous position, I set up the memcache driver for tokens. It was a lot faster, and the token table in the database can grow substantially, especially if you're using UUID tokens and not public key infrastructure. The catalog information, which is the service catalog, service endpoints, that kind of thing, is pretty much static, read-only information, so you can set the templated driver for the service catalog and never even hit the database at all. Obviously, a lot of the information within Keystone is read-heavy. We're not Facebook, right, adding millions of users every hour or whatever it is. So typically you have a small number of writes to the Keystone user database and a whole lot of reads.
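To give a sense of that token-table growth problem: if you do keep tokens in SQL, the bloat is easy to see and to clean up. A minimal sketch, assuming the stock Keystone SQL token backend of that era with an expires column (this is roughly what keystone-manage token_flush does for you; exact column names may differ):

    -- Hedged sketch: assumes a token table with an `expires` column, as in
    -- the Keystone SQL token backend of that era.
    SELECT COUNT(*) FROM token;                                  -- total rows
    SELECT COUNT(*) FROM token WHERE expires < UTC_TIMESTAMP();  -- expired rows

    -- Prune expired tokens in batches so you don't create one huge
    -- replicated transaction (roughly what `keystone-manage token_flush` does):
    DELETE FROM token WHERE expires < UTC_TIMESTAMP() LIMIT 10000;

With the memcache driver, expired tokens simply age out of the cache and none of this housekeeping is needed.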
Glance is the image service. It has a registry database that stores metadata about the images and snapshots that are kept in the back-end file or object storage. Today I'm not going to talk about the storage of the block data itself, just the registry database, the metadata.

The registry database has some similar properties to the Keystone user and role identity information, in that the metadata about images is written very infrequently and read quite a bit. It's also a relatively small amount of data. The one decision you have to make, when you do a deployment of OpenStack that spans multiple regions or availability zones, is whether you're going to share the image registry database across regions. One of the reasons we chose to share the image registry across regions was so that someone could take a snapshot of their instance in region A, and then the snapshot would show up in their list of images and snapshots in another region and they could launch it there immediately. So it worked out very well to share the Glance data across regions.

Because Keystone's user and role information and Glance's registry database share very similar read-to-write ratios, and a similar amount and type of data, we actually recommend grouping them together in one single database cluster. Galera works really, really well for the pairing of the Glance registry database and the Keystone user information, especially across the WAN. A lot of people think, oh, these are high-latency links across the WAN, so a synchronously replicated database cluster is not a good solution for that. I actually thought that before we started using Galera Cluster across the WAN. When I left AT&T, we had twelve Galera nodes, all writers, serving the Glance registry and Keystone databases across data centers throughout the US, WAN-replicated. Because that data is very read-intensive and not written a whole lot, it's actually a perfect use case for Galera Cluster over WAN links. Great diagram, isn't it? We'll build on this.

All right, so Nova is the compute layer within OpenStack. It has vastly different data usage patterns and write frequency than Keystone or Glance. Within a Nova availability zone, the failure domain, you have an isolated database, right? You're not sharing instance information from one availability zone to another, because the failure domain is the availability zone. There are many, many more tables within Nova than in Keystone or Glance, and it's highly relational data. It's not shared between regions, like I said, and all of the database queries are throttled through something called the Nova Conductor service. It used to be that all the nova-compute workers, which are the daemons that actually spin up and orchestrate the virtual machines on the resource provider hosts, would connect to the databases directly. Well, once you get into thousands and thousands of compute nodes, each consuming one or more database connections, that quickly becomes problematic. The Nova Conductor service is how Nova throttles all of those distributed database queries through the message queue and then into the database.

In Nova, we delete things very rarely. We use something called soft delete. When you delete or terminate an instance, we don't actually delete the row from the table; we update it to a deleted state, and then at some periodic interval we archive those deleted records into a shadow table. This pattern of never deleting things and instead updating rows causes some very interesting things in the Nova database schema. One of those things is that the unique constraint on most tables is not just a primary key: it's the key column plus a deleted status column, and the deleted column gets set to the row's ID when the row is soft-deleted. It's really, really weird, but it causes a lot of issues with locking and joins.
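To make that concrete, here's a minimal sketch of what soft delete looks like at the SQL level. The table and column names are simplified and not the exact Nova schema, but the shape is the same:

    -- Illustrative only, not the exact Nova schema.
    -- "Deleting" an instance just updates the row:
    UPDATE instances
       SET deleted    = id,             -- live rows keep deleted = 0
           deleted_at = UTC_TIMESTAMP(),
           vm_state   = 'deleted'
     WHERE uuid = '<instance-uuid>';

    -- A periodic job later moves soft-deleted rows into a shadow table,
    -- roughly what `nova-manage db archive_deleted_rows` does:
    INSERT INTO shadow_instances SELECT * FROM instances WHERE deleted != 0;
    DELETE FROM instances WHERE deleted != 0;

    -- Because rows are never really removed, unique constraints include the
    -- deleted column, e.g. something like:
    --   UNIQUE KEY uniq_key_pairs0user_id0name0deleted (user_id, name, deleted)

Setting deleted to the row's ID rather than a simple flag is what lets the same name be reused after a delete while the constraint still holds for live rows, where deleted is always 0.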
Neutron is kind of similar to Nova in a couple of ways, in that you don't share data across availability zones with the Neutron database either. It has a few more tables than Glance and Keystone, though not nearly as many as Nova, and the amount of data within the Neutron database isn't stunning; you're generally reading small data sets. What we did find is that there are certain tables within Neutron that are highly contended on writes, specifically when you're doing IP address management and assignment. There are a couple of places in the Neutron database code that use something called SELECT ... FOR UPDATE, which in the normal MySQL world, within InnoDB, takes locks on table rows. The way Galera works, though, it only replicates the changes themselves; it doesn't replicate those internal InnoDB record locks. So Galera has issues with applications that use SELECT ... FOR UPDATE (there's a small sketch of that pattern below). If you want more information on how that manifests itself in real-world database benchmarks, feel free to check out my presentation with Peter Boros; I believe the slides are up on percona.com. It's very interesting how the Neutron database performs compared to Nova.

Cinder is actually very similar to Glance: we store metadata about volumes in the database. But unlike Glance, the Cinder database is local to a region. It's not shared data.

MySQL Galera is actually a pretty good fit for Nova, Neutron, and Cinder as well. At AT&T we had a four-node Galera cluster in each region that serviced the Nova, Neutron, and Cinder databases. It worked really well in multi-writer mode. A lot of people think, oh, you have to run a single writer to avoid issues with lock contention and deadlocks. We ran it in production with multiple writers and could count on one hand the number of problems we had with deadlocks.

The overview of how I like to recommend setting things up is, again, at the top you have that globally replicated Galera cluster, even across a WAN. It works really well because the data is very high-read, low-write, and very similar in its access patterns. Then you've got the Nova, Cinder, and Neutron databases in their own Galera cluster within each availability zone. Then you've got the message queue that's coordinating everything between all of the nova-compute workers.
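Coming back to the Neutron SELECT ... FOR UPDATE issue for a moment, here's roughly what that pattern looks like and why it bites under multi-writer Galera. The table and column names are loosely modeled on Neutron's IP-availability-range tables, purely as an illustration, not the exact schema:

    -- Illustrative sketch only; not the exact Neutron schema.
    BEGIN;

    -- On a single InnoDB server this row lock serializes concurrent
    -- address allocations on the same subnet:
    SELECT first_ip, last_ip
      FROM ipavailabilityranges
     WHERE subnet_id = '<subnet-uuid>'
       FOR UPDATE;

    -- ...the application picks the next free address and shrinks the range:
    UPDATE ipavailabilityranges
       SET first_ip = '10.0.0.3'
     WHERE subnet_id = '<subnet-uuid>'
       AND first_ip  = '10.0.0.2';

    COMMIT;

    -- Galera replicates the resulting write set, not the row locks.  If two
    -- clients run this against the same subnet on different Galera nodes at
    -- the same time, one COMMIT fails certification and the client gets a
    -- deadlock error (ER_LOCK_DEADLOCK, 1213) and has to retry.

So in multi-writer mode the application needs retry-on-deadlock handling around these transactions; the alternative is to direct all writes at a single node.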
Ceilometer. How many people actually have Ceilometer installed in production and running? One person. Do you happen to use the SQL driver? Yes? I don't recommend it. Ceilometer is extremely write-intensive. There are a couple of things about the Ceilometer architecture where it writes a lot of duplicative data into the tables. If you try to use the SQL driver for Ceilometer instead of the MongoDB or HBase drivers, it quickly shows some issues at the database schema level, and you can quickly end up with an almost non-functional Ceilometer API endpoint. There are some issues with the database schema in Ceilometer; I know the folks on the Ceilometer team are actively working to redesign some of this, so I'm not going to complain too much about it. But do not use MySQL Galera Cluster for this type of workload. It just doesn't make much sense. It's extremely write-intensive: we're talking, say, 2,000 compute nodes, 10,000 VMs, and you could easily be pumping 200,000 records an hour. It's just not an ideal database system for that type of workload. MongoDB is actually a much better choice; I believe it's the default now. We kind of switched between SQL as the default and then back to Mongo over some licensing fears or FUD. But MongoDB or HBase are what I would recommend for Ceilometer, and definitely not MySQL Galera Cluster. Take it from me, I tried it; it's not a good fit.

So what does it look like? Final thing: Ceilometer we put into a separate MongoDB store, and this architecture is essentially marrying similar types of data to specific database clusters and servers that make sense for that type of data and that read and write pattern. So, going back, this is one zone. The top is the globally distributed cluster, and then you just create more deployment zones. You replicate the Glance and Keystone data over the WAN, and within each of the zones you have your local Neutron, Nova, and Cinder Galera cluster. So you basically have a global cluster, and then a Galera cluster for the Neutron, Nova, and Cinder data within each availability zone, because that's where the failure domain is.

That's really all I've got. I really want to leave some time for questions, because I'm sure you have some. But remember to group your storage based on the pattern of reads and writes and the types of data stored in that particular database. And don't be afraid to think outside the box: don't just put everything in one single database and then try to replicate that, whether with standard MySQL master-slave replication or Galera Cluster or whatever. Think in terms of grouping like data together.

So that's the end of my slides. Would anyone like to ask some questions? Maybe: how do you do your database setups? Do you have everything in a single database? Do you group Glance, Keystone, Nova, and Neutron all together? Do you split them out, say into a global store for Keystone? What kinds of things have people run into? It's a tough crowd. Okay, well, does anyone have any questions for me? Yes? I'm sorry? Like data, right? So the question is, how do you actually determine which data is alike, which data is similar? Well, you can take my word for it, right? But you can also do some analysis on the database itself. Look at the types of queries that are frequently executed. There's a great toolkit that Percona puts out called Percona Toolkit, and pt-query-digest is a tool you can use to analyze the MySQL slow query log. You can use that kind of tool to get a report on the types of queries being run, then run EXPLAIN on those queries to see the execution plans they generate, and use DESCRIBE or SHOW CREATE TABLE to see the structure of the database schema. You can do some simple analysis like that. Otherwise, yeah, basically trust the folks who are writing the code in those projects. Okay. Thank you for joining.

Thank you, Che. If you want to try out Galera Cluster, go to galeracluster.com. It's open source; download it. If you prefer Percona Server, go to percona.com; they have their own version of Galera Cluster in there. If you prefer using MariaDB, go to mariadb.com; there is also a Galera version of MariaDB. So thank you very much. It's one big happy Galera family. Yes, we are all. Thank you.
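As a footnote to that Q&A answer, the MySQL side of the analysis looks roughly like this once pt-query-digest has pointed you at the hot queries from the slow query log. The example table and WHERE clause here are made up to show the workflow, not taken from actual OpenStack code:

    -- Illustrative examples of the analysis steps, with a made-up query.
    -- Look at the execution plan for a frequently run query:
    EXPLAIN SELECT uuid, vm_state
              FROM instances
             WHERE project_id = '<project-uuid>'
               AND deleted = 0;

    -- Look at the table structure: columns, indexes, unique constraints:
    SHOW CREATE TABLE instances;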