Hi, I'm Vipul. I work at HP on the Database Service team, and I mostly deal with operations and automation. So what this talk is about: we're going to talk about Trove, what Trove is a little bit. We're also going to touch on how to deploy Trove in HA mode, how we do configuration management at HP, some of the monitoring tools and processes that we've built, as well as touch on, in much deeper detail, how we operate Trove in production today. So Trove. Trove is Database as a Service. Today we have support for MySQL, MongoDB, Postgres, Cassandra. These are all single-instance databases today. We're adding replication and clustering support in Juno, so we're going to be able to do things where the actual database instances themselves have some level of HA. We are an integrated OpenStack project; our first integrated release was in Icehouse. So the architecture, at a very high level, looks very much like most of the other OpenStack services. We've got RabbitMQ, we've got a service database, and then we've got several control plane services that we run as well. We've got, obviously, the API server that handles all requests. We've got a back-end conductor and a task manager. The interesting thing here is that we have a guest agent process that we run on every VM where the database lives. In Glance, we also have datastore-specific images that we build using diskimage-builder. So it's not too complicated; it's a very straightforward architecture. So which cloud do you choose to run Trove in? Trove is kind of unique: it only has API dependencies on the other services. So we have a couple of options. We can run it in the overcloud. And for those that may not know what the overcloud is, it's essentially the TripleO term for the bare-metal OpenStack control plane deployment. We can also deploy it in-cloud, because we can actually just point at the public URLs for the various OpenStack services that we depend on. At HP, we've decided to actually run it in-cloud. So we stand up OpenStack on bare metal, we stand up VMs, and we run the entirety of Trove within the cloud. This helps us solve some of the issues that we have. As you saw, there was a guest agent in the customer instance. That needs connectivity back to our control plane. So having the control plane be at the same level as the database instance helps us have that connectivity. So what is HA Trove? To get to an HA Trove, you need some support for running HA workloads. We accomplish that with availability zones. These are not just tags that we place on hosts; we actually have physical separation of servers across a data center. And the control plane for Trove, we run that across all of the availability zones. We run a Galera cluster, so that allows us to lose a minority of nodes and still have the service up and running. We also run a RabbitMQ cluster in mirrored mode. We are able to provide all the RabbitMQ IPs to Trove, and using the Oslo messaging framework, it's able to pick the right one to talk to depending on whether it can connect to it or not (a rough sketch of that configuration follows shortly below). We run multiple API processes, we run multiple task manager processes, as well as the conductor, and we're able to scale these independently. So this is, it's kind of busy, but this is kind of what it looks like in production today. On the very far right there, we've got our OpenStack deployment. We have an OpenStack deployment in each AZ. This is a single-region view right here.
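As a quick aside on the multi-node RabbitMQ and Galera setup just described, here is a minimal sketch of what the relevant part of trove.conf might look like. The exact option names vary by release, and the hosts and credentials shown are made up.

```ini
# /etc/trove/trove.conf -- illustrative fragment only
[DEFAULT]
# oslo.messaging can be given every RabbitMQ node in the mirrored cluster;
# it picks one it can reach and fails over if that node goes away.
rabbit_hosts = 10.0.1.10:5672,10.0.2.10:5672,10.0.3.10:5672
rabbit_userid = trove
rabbit_password = REPLACE_ME
rabbit_ha_queues = True

# Service database behind the Galera cluster (often fronted by a VIP or
# load balancer so any surviving node can serve the connection).
sql_connection = mysql://trove:REPLACE_ME@10.0.0.100/trove
```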
In the middle are our compute hosts, and on the left there is our Trove control plane. And you can see that we're actually spinning up VMs and we're running sort of the same type of topology that we have in the OpenStack deployment for Trove. So for the OpenStack deployment, we also run a Galera cluster and a RabbitMQ cluster. These are not shared by Trove at all; we run a separate set of the same services for Trove. And so how are we able to get to that implementation diagram? A lot of what we've built has been based around SaltStack. From the OpenStack deployment itself to the Trove deployment, we've got Salt states that we've built and scripts that we've written that are available on GitHub, to basically be able to deploy that same picture that you just saw on the previous slide. So what do we use Salt for? The primary reason is configuration management. It helps us control what packages and dependencies are installed on the various control plane VMs. We are also able to use it to copy configuration files down to the VMs and manage users for MySQL, users for RabbitMQ, as well as operating system users. It gives us the ability to create a reproducible infrastructure. So if any of these nodes goes down, we're able to spin it up using a nova command, and we've got an exact copy of the node that we had previously. The other interesting thing is we actually highstate every instance that we boot using Trove. We have a first-boot script that we run on the guest image, and as soon as it comes up, it talks back to the Salt master and gets the right packages for the actual Trove guest agent itself. We also use Salt for remote execution. This allows us to disable SSH to all the control plane VMs. The way we manage our infrastructure is by logging into a single Salt master and running Salt commands against the various nodes that are managed by that Salt master. We use it also to define user and resource level access. So we can allow the support team to log in to the same Salt master and give them fewer privileges than, say, the operations team would have. We also use it because it can scale to thousands of nodes. We actually use it to manage all of our customer database instances. So if the guest agent process has died, or things like that, we're able to do a restart on that service through the Salt master. This is an example of what the state file looks like; this is a snippet of it, but it's fairly simple. It's just YAML. As you can see in this one, we tell Salt to do a pip install of the specific Trove package, we copy the Trove config file down onto the VM, and we have a service watch that says if the package or the Trove config changes, I want you to restart the Trove service. So the trove.conf file, which is on the next slide, is just a Jinja template. We use Salt's pillar data to place the actual values into the config file, things like RabbitMQ and MySQL connection strings. (Rough sketches of both of these appear at the end of this section.) And so we've done all of this today using Salt, and we continue to use Salt today. But the next thing that we're working on is to be able to do image-based deployment. So using TripleO tooling, we're going to be providing the Heat templates required to stand up our control plane, as well as all the images that make up our control plane. So we're in a world where today we can use Salt Cloud or a wrapper script to boot a Salt master and have that Salt master provision the entirety of your control plane.
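Since the state file and template slides aren't captured in the transcript, here is a minimal sketch of the kind of Salt state being described: pip-install the Trove package, drop a Jinja-templated trove.conf, and restart the service when either changes. The package pin, paths, and pillar keys are illustrative, not the actual HP states.

```yaml
# trove/api.sls -- illustrative sketch only
trove:
  pip.installed:
    - name: trove==2014.1        # pin the specific Trove release being deployed

/etc/trove/trove.conf:
  file.managed:
    - source: salt://trove/files/trove.conf
    - template: jinja
    - user: trove
    - group: trove
    - mode: 640

trove-api:
  service.running:
    - enable: True
    - watch:
      # restart trove-api whenever the package or its config changes
      - pip: trove
      - file: /etc/trove/trove.conf
```

And the templated trove.conf pulls its connection strings out of pillar data, roughly like this (the pillar keys are invented for the example):

```jinja
[DEFAULT]
rabbit_hosts = {{ pillar['trove']['rabbit_hosts'] }}
rabbit_password = {{ pillar['trove']['rabbit_password'] }}
sql_connection = mysql://trove:{{ pillar['trove']['db_password'] }}@{{ pillar['trove']['db_host'] }}/trove
```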
We're going to go to a world where we use TripleO, where we stand up a seed cloud, an undercloud, and an overcloud, and then use the Heat that lives in the overcloud to spin up the images that make up the control plane, as well as use Heat's os-collect-config and os-apply-config to put configuration data onto those VMs. And now I'm going to hand it over to Saurabh. He's going to go into a lot more detail on how we use Salt. Hi. So I'll be going over most of the operations and monitoring tooling around Trove that we use. As Vipul mentioned, we use SaltStack heavily. The infrastructure that we have is part of the Salt environment. It allows us to basically control the whole infrastructure from a single machine. So we actually log in only on our Salt master, and from there we can execute commands, or write automation scripts around the whole infrastructure and run them from just that one machine. It doesn't require us to SSH to every other node in the infrastructure. We have a multi-master setup in Salt, so we have multiple Salt masters. What that allows us to do is that every minion talks to all these masters, and we keep them in sync. If one of the masters dies, the whole infrastructure is still operable, and we can control the same things by logging into a different master. Every customer instance, apart from our control plane, which is obviously the API servers, task manager, database, and RabbitMQ, all the customer instances that Trove spins up, they are part of the Salt environment. So every instance is running a Salt minion, and every time an instance is created, it gets added to the Salt environment. Trove generates a unique ID for each instance, which allows us to maintain unique IDs across all the Salt keys. We also put a lot of information into the Salt key to keep them unique across all the environments. We have an automated way of accepting Salt keys on the master. For a Salt minion to be able to talk to the master, the master has to accept its key, and the only way the master will accept it is if it trusts that the key is coming from a trusted minion. Because we know that we created this instance through Trove, we can actually automate accepting and deleting those keys. These are some Salt commands that I listed here (rough sketches of them appear at the end of this section). The first one is basically just a standard Salt module; all it's doing is checking the status of the Trove API service. But the key thing is that it's doing it across all the API nodes, so we can see what the status is. The second one is checking the MySQL service status. The next one is our own custom module. So wherever possible we use Salt's built-in modules, but we have also written a lot of custom modules, which allows us to control the command that gets executed on the infrastructure. The last one is cmd.run. This is the most unsafe thing to run, and we try to avoid it, because it gives a lot of control back to the operator or the person who's running the command: it actually runs as root, and that has severe implications if somebody executes the wrong command through it. So the key part here is that we configure our environment such that nobody can execute a harmful command through Salt, even though it gives you a lot of flexibility. So this is a standard Salt master config.
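The command examples from the slide aren't in the transcript; commands along these lines would match the description (the targeting patterns and the custom module name are made up):

```sh
# 1. Check the Trove API service status across all API nodes
salt 'trove-api*' service.status trove-api

# 2. Check MySQL status on the database nodes
salt 'db*' mysql.status

# 3. An in-house custom execution module (hypothetical name)
salt 'trove-api*' hpcloud_trove.show_config

# 4. cmd.run -- runs arbitrary commands as root, so it is avoided/blacklisted
salt 'trove-api*' cmd.run 'ps aux | grep trove-api'
```

And a minimal sketch of the kind of master config being referred to, using Salt's client ACL options (option names differ across Salt versions, and the users and patterns here are invented):

```yaml
# /etc/salt/master -- illustrative fragment
client_acl:
  # support engineers: read-only style status checks
  support_.*:
    - service.status
    - mysql.status
  # operators: a wider but still enumerated set of modules
  ops_.*:
    - service.*
    - mysql.*
    - pip.*

client_acl_blacklist:
  users:
    - root                 # root is not allowed to publish commands at all
  modules:
    - cmd.run              # arbitrary root command execution
    - file.*               # can read and write arbitrary files
    - cp.*                 # can copy files off the minions
    - config.*             # can dump config, including secrets
```

With an ACL along these lines, a support user can run the status checks shown above but is refused cmd.run and the blacklisted modules.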
And what it allows us to do is choose which command is allowed, or which user is allowed to execute which set of commands, and that way we can control access to the infrastructure on a per-user basis. It also allows us to configure some blacklisting, so that, say, the root user is not allowed to execute any commands, or particular modules are not supported at all in the environment. We usually do not allow su, and we let everybody execute commands from their own account, so that Salt has a record of who executed what and we can go back and look at it if something goes wrong. Root is disabled, so even if somebody can do su, they cannot really run any command from there. Apart from cmd.run, we disable a lot of modules like file, cp, and config, which are harmful because they either give out a lot of information about the system or they give some control over the system which is not considered very safe. As I mentioned earlier, we have written a lot of custom modules around Salt to basically turn a lot of operational tasks into simple Salt commands. The modules are easy to write, and you can actually call Salt's built-in modules from your custom module. What that allows us to do is define what command you want to run, what parameters that command should accept, and even filter the output that the command gives, to suppress information. For example, if I want to show the configuration file, I will try to remove all the passwords, or hash them out, so that the sensitive information is protected. The next thing that I would like to describe is how we do upgrades. Today we use SaltStack heavily for the whole upgrade process, because Trove, in its current form, does not provide any API or any flexibility in terms of doing automated upgrades. So we have written our own scripts around SaltStack which allow us to do the upgrade of the infrastructure. While we do the upgrade, we make sure that the MySQL process running on the guest instances, the customer instances, keeps running, because that means the customers' applications don't have any downtime in terms of database access. The upgrade usually involves downtime or a maintenance window, because in its current form Trove does not allow multiple versions of RPC messages in the same environment. So right from the API, task manager, and conductor to the guest agent, everybody should be on the same RPC version. While doing the upgrade, we follow the same sequence, wherein we first upgrade all the customer instances, which are the Trove guest agent instances, then upgrade the task manager and conductor, and then upgrade all the API servers. The reason for this sequence is that when an API server or task manager sends a message to the guest agent, that guest agent should already be aware of the new version of the RPC. So that's why we follow this sequence (a rough sketch of that ordering, expressed as Salt commands, follows at the end of this section). Trove is right now implementing an API which will allow the operator to execute it as an admin user, and that API will go and upgrade all the guest agents, and then upgrade all the control plane elements after that. Security considerations for the Trove service: by the nature of the service, there isn't much that we need to do, because we do not store any credentials or any user information in our service. We rely on Keystone to do the authentication.
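To make the upgrade ordering concrete, here is a sketch of what the Salt-driven sequence might look like. The grain-based targeting, service names, and version pin are assumptions rather than HP's actual scripts, and in practice this runs inside a maintenance window.

```sh
# 1. Guest agents first, so they understand the new RPC version
#    before any control plane service talks to them.
salt -G 'trove_role:guest' pip.install 'trove==2014.1.1'   # version is a placeholder
salt -G 'trove_role:guest' service.restart trove-guestagent

# 2. Then the task manager and conductor (assumed co-located here).
salt -G 'trove_role:taskmanager' pip.install 'trove==2014.1.1'
salt -G 'trove_role:taskmanager' service.restart trove-taskmanager
salt -G 'trove_role:taskmanager' service.restart trove-conductor

# 3. Finally the API servers.
salt -G 'trove_role:api' pip.install 'trove==2014.1.1'
salt -G 'trove_role:api' service.restart trove-api
```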
We let MySQL handle all the credentials for the MySQL users, so we are not really storing anything from the user that we need to be really worried about. But we do use SSL wherever possible: on the API servers and on the RabbitMQ connections. The database instance which is part of the Trove control plane only needs to be accessible from the control plane elements, and the guest agents don't need access to that database. That was an improvement that Trove made a couple of months back; in earlier versions, every database instance needed access to RabbitMQ as well as the service database. Another thing is that we use separate DB credentials and RabbitMQ credentials across all the services, including the guest agent. That way, if a particular piece is compromised, we can still keep the rest of the service up and running. The next thing I would like to talk about is monitoring. Monitoring forms a very critical piece of keeping this service up and running. Over time it has matured a lot, and at least in the recent past we didn't have a lot of issues like the number of queues on RabbitMQ growing, or the guest agents dying out. But from the time we started doing this, we have seen that good monitoring has helped us catch issues well ahead of time. So apart from standard monitoring on RabbitMQ and the database, which is monitoring the number of queues, socket connections, and established connections for RabbitMQ, and monitoring cluster status and slow query logs for the database, we actually monitor the API endpoints with Nagios. We have upstart scripts for all the control plane services, like the API, task manager, conductor, and guest agents, so that if the process dies it gets respawned, and we monitor that. By the nature of our control plane, the database and RabbitMQ are spread across all the AZs, and obviously the instances can get spun up in any AZ. So we try to monitor the connectivity to RabbitMQ and the DB from all those instances, because they are part of different AZs. Apart from this, we have some custom monitoring that we have set up around the service. One of those includes catching all the instance failures. We have seen that this has helped us, because what happens is, in Trove, if an instance fails, the user can go and delete it, and then there is no trace of that instance having failed anywhere in the whole system. So for that, what we do is, as soon as the instance fails, we try to catch that and grab all the information about the instance, about its corresponding Nova instance, and the logs from the instance. That has helped us identify issues ahead of time. This is another piece: the Trove guest agent process, which is running on all the customer instances, sends a heartbeat every few minutes back to the service, saying that the process is alive. That's an indication that the next message that gets delivered, or the next command that the user executes on that instance, will probably get executed successfully. Monitoring this is another thing that we have seen has helped us, because sometimes either the process dies or there are connectivity issues from the guest agent to RabbitMQ, and we were able to catch these failures ahead of time and not have the user tell us that they're not able to run commands or things like that. So this is a simple query which will tell us if a particular guest agent has not sent any heartbeats over a period of time.
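The query itself isn't in the transcript; a minimal sketch against the Trove service database, assuming the agent_heartbeats table and a five-minute threshold (table and column names can differ between releases):

```sql
-- Guest agents that have not reported a heartbeat in the last 5 minutes
SELECT instance_id, updated_at
FROM agent_heartbeats
WHERE updated_at < NOW() - INTERVAL 5 MINUTE;
```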
This is another piece that we saw for quite a while, which is fixed for the most part in upstream Trove, but there are still issues here. Usually the user executes a backup command, and that runs an xtrabackup process to do the backup, which streams the backup to Swift. What we have seen is that if the databases are huge, maybe hundreds of GBs, the process sometimes just abruptly dies, maybe because it is not able to connect to Swift or something like that. Then that process is left behind, and it keeps on eating CPU and memory. And if this happens multiple times, then there are a lot of these xtrabackup processes running there, which also sometimes hogs the MySQL process. So monitoring this is another thing that you should consider, and the way we monitor it is we look at the database and at the processes on the customer instances (a rough sketch of those checks appears at the end of this section). We have seen that, over the last year, most of the issues while running the system were around RabbitMQ. For a long period of time there were issues in the Trove service itself wherein we were not dealing with RabbitMQ correctly, such that we were not deleting the queues when the instances were deleted, and those queues just kept piling up on RabbitMQ. We have dealt with a lot of issues, and we have fixed a lot of things upstream. But when you are running RabbitMQ, you should raise the file descriptor limit, maybe into the tens of thousands depending on the kind of setup that you have, and you should keep an eye on the growing number of sockets and queues so you are alerted well ahead of time. If the setup is huge and there are a lot of instances, then you should consider RabbitMQ heartbeats, because even though we fixed an issue upstream wherein we are able to delete the queues, the way RabbitMQ deals with sockets is that even if the instance is deleted, the socket count keeps growing, and if you don't use heartbeats, that number will never go down. So using heartbeats is one of the things that we improved in our setup recently. We have seen that monitoring is really important for dealing with a clustered RabbitMQ setup. And whenever we are using clustered RabbitMQ with mirrored queues, it's very susceptible to any kind of small network glitch, and it just falls apart. This is slightly improved with the newer RabbitMQ versions, but we still face these issues. As I mentioned earlier, the service health check via the guest agent heartbeat is something that we need to monitor. Another thing that we have seen is when we are doing upgrades: on the flavors which are really small, like maybe the smallest flavor with 1 gig of memory, it's very difficult to do an upgrade, because if the actual data in MySQL is larger, the MySQL process itself is consuming a lot of memory and CPU, and then for pip to download and install stuff, that's another big chunk of memory that's needed there. So we have seen issues while doing upgrades on the small flavor sizes. Another very, very weird thing that happens between Trove and Nova is that Trove maintains its own quota for instances, and Nova maintains its own quotas for instances and security groups per user.
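As an aside before the quota discussion continues: checks along these lines would match what is described above for leftover backup processes and RabbitMQ growth. The Salt targeting pattern is invented, and any alert thresholds would need tuning for the specific deployment.

```sh
# Leftover xtrabackup processes on customer instances (run from the Salt master)
salt 'trove-guest*' ps.pgrep xtrabackup

# RabbitMQ growth: file descriptors / sockets in use, plus queue and
# connection counts, so you can alert before limits are hit
rabbitmqctl status | grep -A4 file_descriptors
rabbitmqctl list_queues name messages consumers | wc -l
rabbitmqctl list_connections | wc -l
```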
And we have seen that sometimes, when Trove issues a command and Nova is not able to successfully clean up those resources, there is some mismatch between Trove's quota and Nova's quota. We have seen a lot of failures due to that, so we try to monitor it and keep these things in sync, and whenever there are issues we alert and somebody can go and fix it. This really should be handled in upstream Trove, but it's not there yet. Eventually it will be, and then nobody will have to deal with this. So just to recap: we have been running this for more than a year, and what we have seen is that from the time we started running it to where it is right now, it has matured a lot. We were able to identify and fix a lot of these issues upstream, so somebody who now deploys the service in the same environment, or at the same scale that we do, probably will not have to deal with these issues. Maybe there will be other issues, but at least many of the things that we have encountered, we have fixed upstream. We use SaltStack, and it has really helped us maintain, monitor, and control the whole environment and optimize our operations. Monitoring, again, is really a key part of keeping this service up and running, and keeping it running reliably over time. We have open sourced some of our tools. Not everything that we use is out on GitHub, but we will try to push as much as we can. There are links posted here, which will point you at some of the SaltStack stuff that we use. The first one provides the SaltStack setup scripts for Trove, the next one is for actually installing OpenStack, and then there are some links for Trove. So that's pretty much all I had. Let me know if you have any questions. So you said monitoring is key. So in your Trove images, are you installing some monitoring agents as part of that image? No. So basically, as I said, on the guest agent, the way we make sure that the guest agent is up and running is we make sure that we have heartbeats coming from the agents, because that's part of Trove. That's how we try to monitor the guest agent, and then we make sure that there are no left-behind xtrabackup processes. Because from Salt, we can actually look at anything that's going on on that machine. But we do not add any monitoring agents as such on the guest, which is the customer instance, because we have seen that even with what we have already, if the flavor is small, there is a constraint on the resources. So we tend not to put anything more on the customer instances. But we do a lot of things on the control plane to get a good understanding of the guest agents, or the customer instances. And within Rabbit, you talked about two important aspects of high availability: making sure the cluster is using heartbeats. And what was the other thing? Increasing the socket limit, I didn't catch what that was. Increasing the file descriptor limit on the RabbitMQ. So what happens is, if you are not using heartbeats, and instances are getting created and deleted, when an instance is deleted the connections don't get closed, and there may be multiple connections from every guest agent. So the key is just increasing that file descriptor limit if your cluster doesn't support heartbeats? Yes. But then you'll have to keep on monitoring it, and at some point you'll have to restart RabbitMQ so that it forgets about all those sockets. So yeah, you can increase it.
You'll have some runway, but at some point you want to have heartbeats, because as you boot and delete Nova instances, that number is going to keep growing. With heartbeats, at least, RabbitMQ will start closing those sockets. Last question is, does Trove support, instead of just database instances, the concept of deploying database clusters? It will. So actually, today is our design summit; we have four or five sessions. The thing that we're agreeing on, I think it's at two or something, is going to be what the clustering API looks like. So as part of Juno, there's going to be at least one implementation of clusters, and the API will be in place so that you could do Cassandra or any other data store if you want to. Thanks, guys. Thank you for the presentation. So my question is more around the HA of the actual database instances and services that Trove is providing. So far, you've covered the high-level services that are needed to run Trove and the HA around that. But for example, if you're providing databases as a service, one of the things that your customers are going to want to know is, how is my database HA'd? And I didn't hear anything about that. And also, maybe this is out of scope for this discussion, but how does the customer scale? So actually, that comes back to the clustering point, right? But the original question is, does the API today cover HA for the database? Yeah, so today Trove only supports single-instance databases. So your database is not going to be HA if you take Icehouse and try to run Trove. We're adding replication and clustering as part of Juno, so we can probably answer that question a little bit better at the next summit. But yeah, that's just where Trove is today; it doesn't have HA support at the actual instance level. And the databases supported today, from the original slide deck, were MySQL, Postgres, Cassandra. Are there any plans to add more, or are you just going to concentrate on those four or five, or whatever it was up there? Yeah, so which database were you asking if we had plans for? I just didn't see; the slide was too quick for me. Oh, OK, yeah. So there's, I could be wrong, but there's MongoDB, Postgres, Cassandra, Redis, and MySQL today. And then the plan, as you said, the plan is to provide clustering for some, not all. But it's really dependent on what the operators want to support. So like HP, when clustering is available, we're likely going to support MySQL clusters as well as master-slave type replication topologies. At some point, we may make MongoDB available to our customers as well, and at that point we'd go and implement that. So we're kind of building a framework and an API, and not all data stores will be clusterable. It will depend on whether there's an interested party that wants to add that support and run it in production. Thank you very much. OK, I'm sorry, has Trove, or whatever? Yes. So I think it was added two months ago. We have something called configuration groups. You can basically get a list of all the options, as well as modify certain options that an operator allows you to modify. So that's in place already. It will. We haven't deployed that to production yet; we need to test it out a little bit and then it will be available. So essentially, for HP Public Cloud, we take upstream, we run it in a staging environment for a while, and if everything looks good, we take it and run it in production.
We don't really have very much internal stuff; we don't run an internal fork either. We do have some extensions to do things like RabbitMQ heartbeats, but it's really what's upstream that we run. Thank you. You made my day. Yep. Thank you. All right. Hey, what's happening? Trevor from RMS. I had a question about the database size. You had mentioned that there were some issues that you guys ran into when your databases got a little larger, I think you said in the hundreds of gigabytes. Could you talk about what component in the process was slowing you down? Was it writing it out to Swift, or was it something like the agent that was on the box doing the backup, or whatever it might have been? So, on flavors which are relatively very small, let's consider one gig: if you're running MySQL with a very small amount of data, then you won't have any issues, because for a small database MySQL won't consume a lot of memory, and even if you're taking a backup or doing anything on it, that won't consume much memory or CPU as such. But if the database size is large, then MySQL will eat up memory, and if you're running a backup, that will eat up memory. And if you have any other processes, one of them being the Trove guest agent and the Salt minion, and those processes want to do any work, they will consume memory. So basically the problem really happens when the size of the flavor is very small and the size of the data is relatively large. We do support, I think, 60 gig on the smallest flavor, so technically MySQL can run with one gig of memory and support 60 gigs of data. But when we are doing maintenance on that node, we have to be really very careful, because we have seen failures in that case. So it depends on the size of the data mostly. Go ahead. Good morning, gentlemen. Thank you very much for your presentation. I appreciate the fact that you had clear architecture and technical information and some meat to it. It's wonderful to see. You guys did fabulous. My question for you is on the SaltStack manifest, or not manifest, but formula you put together: did you bake in any of those best practices you talked about, the tweaking and tuning and fixing of some of the elements that will catch you running it in production? In the thing that we have published on GitHub? Yes. So what we have published on GitHub is very standard Salt scripts that will just help you set up the whole infrastructure. But we have tried to basically describe what you need to monitor, and most of that monitoring uses Nagios. Those pieces are not part of the public GitHub right now, but if you're interested in a particular piece, we would be willing to put that on GitHub at some point. Okay, cool. Thank you for coming out, guys. Thanks.