The Databases for Machine Learning and Machine Learning for Databases seminar series at Carnegie Mellon University is recorded in front of a live studio audience. Funding for this program is made possible by Google and by contributions from viewers like you. Thank you.

Hi guys, welcome to another session of our seminar series this semester. We're excited today to have a distinguished Carnegie Mellon Database Group alumna, Dr. Dana Van Aken. She finished her PhD at CMU in 2021, and since then she has gone off to be a co-founder with me and the CTO of OtterTune, which in my opinion is the premier ML or AI database tuning service. She's here to talk about some of the things we've learned and some of the things she's been doing at OtterTune. As always, if you have questions for Dana, please unmute yourself, say who you are, and fire your question off at any time. If you can't unmute yourself, post in the chat and I will interrupt Dana and ask your question. So Dana, the floor is yours. Thank you so much for coming back.

All right. Thank you, Andy, for having me. Again, my name is Dana. I'm the co-founder and CTO of OtterTune along with Andy. Today I'm here to talk about OtterTune, and Andy gave a great intro. I have been working on OtterTune for a long time now, since 2015, so about eight years, and roughly two to three of those years have been on OtterTune the commercial product. So with that... I've lost my mouse. Okay, here we are. With that, let's get started with the talk. I'm going to start by discussing some background information: I'll explain what a tuning service is and introduce OtterTune at a high level. Then I'm going to spend some time reviewing the academic version of OtterTune, what the algorithms were and what that looked like back then.
Then, to keep it real, we'll go over some of the assumptions and incentives that didn't quite pan out from the academic version to the commercial version, and some of the things we had to go back and really rethink once we were launching OtterTune as a real product. And after that, we will conclude.

All right, so we'll get started with some basic background about database tuning. I feel like I've said this sentence so many times: databases are just notoriously hard to tune. Finding the right knob configuration, or the most efficient queries, or the right partitioning scheme really depends on the workload, the hardware, the data, and other factors. Traditionally this has been solved manually: people hire human experts, often in the form of DBAs, to come help tune their databases, and they might employ them full-time. But ultimately, with the increasing complexity of today's systems (Postgres and MySQL are always adding more knobs), these human experts are harder to find these days, and they're also quite expensive to employ. So, at a high level, what OtterTune brings to the table is an automated service that uses machine learning to automatically configure and optimize your database.

All right, so again, OtterTune is an automated database tuning and resource optimization service. As we already covered, it is based on research out of Carnegie Mellon University that I worked on with Andy during my PhD. At its core, it uses machine learning to help optimize the configuration of the DBMS. One thing to know is that the academic project was only focused on configuration knobs.
In the commercial product, we've expanded the offering and the types of tuning that we provide to include knobs, indexes, queries, and cloud configuration. We also provide some table-level recommendations when statistics are stale. So we provide a whole lot more than we used to. As far as what we support right now, we currently support Amazon RDS and Aurora, specifically MySQL and Postgres. As to why: not supporting on-prem databases was more of a strategic decision, in that Amazon provides a lot of normalization and standardization around how you can apply knobs and how you can collect data. That actually simplifies things quite a bit, so we're not writing one-off clients to go collect data from this place and that.

Dana, somebody is asking whether you support Aurora Serverless.

We do, yes. We also support Aurora Serverless, for both MySQL and Postgres. Any follow-ups? Okay. Good question.

So next I'm going to hop into reviewing the academic version of OtterTune. Oh, okay, I wasn't expecting all of that; I thought it was a little more staggered, but that's fine. So, do you see the little hat? There's a little academic hat next to OtterTune on the slide. I have that hat there to indicate that we're talking about the academic version of OtterTune, so hopefully that will help distinguish which version we're talking about. So, to give an overview of how OtterTune works: in the very first step, the user selects the target objective that they wish to optimize for. In commercial OtterTune, we highly recommend query latency, and that's what a lot of our customers choose, but it could be the throughput, it could be the CPU utilization. It's really up to the user.
And with that, we provide an agent that users can deploy in their own environment. What it does is connect to the database and collect various forms of telemetry, including the configuration knobs and the database runtime metrics; commercial OtterTune also collects query information. So that is the purpose of the client-side agent. The agent periodically collects this information from the database system and sends it back to the tuning manager, which stores it in the data repository. And there's something that we refer to in the paper as an observation period: a period of time during which you're collecting these metrics. I believe we collect metric information once a minute, or maybe once every five minutes, but the observation period is the period of time over which you want to observe a given configuration before crunching on the data and deciding which configuration to suggest next. So, again, we include the most recent observation in the tuning data, we crunch on the data, and then we spit out the next configuration that OtterTune believes will improve your system's performance over the baseline configuration and the configurations it has suggested so far.

As far as the differences between OtterTune the academic project and commercial OtterTune: again, we support AWS, so in the first step the user must grant us a number of IAM role permissions. This is to collect CloudWatch data, RDS data, and Performance Insights data. Optionally, they can provide us with write permissions so that we can apply configurations directly to their database on behalf of the user. In addition, there is additional data that we collect down in step three.
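The observation-period mechanic described above is easy to picture in code. Here is a minimal sketch; the metric names, the per-minute sampling, and the aggregation by mean are all illustrative assumptions, not OtterTune's actual implementation. The per-sample telemetry collected during one observation period collapses into a single training point of knobs, metrics, and the target objective:

```python
import statistics

def summarize_observation(samples, knobs, target_objective):
    """Collapse per-minute metric samples from one observation period
    into a single (knobs, metrics, objective) training point."""
    metrics = {}
    for name in samples[0]:
        values = [s[name] for s in samples]
        metrics[name] = statistics.mean(values)
    return {"knobs": dict(knobs), "metrics": metrics,
            "objective": metrics[target_objective]}

# e.g. five one-minute samples observed under one configuration
samples = [{"tps": 100 + i, "cpu": 50.0} for i in range(5)]
obs = summarize_observation(samples, {"shared_buffers_mb": 4096}, "tps")
```

Each completed period yields one such record for the data repository, which is why the length of the observation period directly controls how fast training data accumulates.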
And then finally, when we generate new configurations, the commercial version of OtterTune will also generate health checks, index recommendations, query recommendations, and so on.

Dana, quick question. Do you ever end up in a situation where the telemetry data becomes much bigger than the actual database?

Well, I don't think we're quite there yet, but I was just talking with our engineering team about how our storage is starting to explode. I guess we collect about eight gigabytes of telemetry a day, so we're starting to have to think about that. But am I answering your question correctly?

Yeah, and the reason I ask is that the Microsoft folks always complain that if they started collecting all telemetry, it would be more than the customer data in less than a year. And if you're starting to see some version of that problem, especially on the smaller databases that are sometimes very valuable, then many valuable enterprise databases are not that big. They might be sub-100 gigs, and you could easily get there.

We're just starting to get there. And we are probably going to have to put a customer policy in place soon, alerting them of how much history we can provide, because right now we provide the whole history. So yes, that is starting to become a problem.

So, what I'm going to go over now is the motivation for OtterTune way back in 2015, when Andy and I started working on this problem. OtterTune is an external tuning service, meaning it's not built into the code base of MySQL or Postgres. You can hook it onto any database, and the goal was to make it pretty extensible: you might have to provide it with the knob names and some basic metadata about the knobs, but you don't have to provide it with any domain knowledge.
So you can pretty much take the same algorithm and use it on Postgres, on MySQL, I don't know, on CockroachDB, and so on. To make that happen, a lot of the techniques that we came up with are automatic; everything in that machine learning pipeline was pretty much automated. The main idea behind OtterTune is that tuning takes a lot of time and uses a lot of resources, so the idea was to leverage the training data that we've already collected from previous tuning sessions to help tune new DBMS deployments more efficiently. The way we do this is we train models from all of the historical data in our data repository, and we use that data to help us figure out which are the most important knobs, and also to try to find which of the workloads that we've seen previously is most similar to our current workload. The idea is that we can then reuse the data from the most similar workload to help bootstrap and jumpstart the tuning process while there's no data for the new deployment.

All right, a little outline of the algorithm. At a high level, this is the OtterTune machine learning pipeline. Two of the parts, workload characterization and knob identification, run as background processes. The automated tuning component is the algorithm that interacts with the database and generates configurations after each observation period. Workload characterization is a data reduction method.
Basically, what we're doing is feeding in all of the metric data (as somebody pointed out, there are a ton of metrics) and trying to prune it down to a subset of metrics that capture the distinguishing characteristics of different workloads. For this we used factor analysis and clustering, but that was back in 2017, and there have been a lot of great papers published since that use other techniques. The high level is that it's just a data reduction technique. Similarly, for knob identification, we analyze all of the data in our repository and use the Lasso algorithm, which is a feature selection algorithm, to try to determine an order of importance for the configuration knobs. This is very helpful for deciding which are the most important knobs to tune because, as I'll go over next, the automated tuning component uses a form of Bayesian optimization, and that algorithm is much more efficient with fewer configuration knobs. So that's the real motivation there.

All right, the automated tuner component has two parts. The first part does the workload mapping. Using the set of metrics output by the workload characterization component, it compares those metrics across the different workloads that we tuned in the past with the current target workload, which is the workload that we're trying to tune, and basically figures out which previous workload is most similar to the current one. That's when we start part two. Once we've figured that out, we train a Gaussian process model on the data we've collected so far from the target workload, along with the data from the previous workload that we deemed most similar to the target.
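Stepping back to the knob-identification component for a moment, the idea can be illustrated with a small Lasso fit by plain coordinate descent. This is a from-scratch stand-in with invented toy data (the real system can rely on a proper solver): knobs whose weights the L1 penalty drives exactly to zero drop out, and the surviving weights give an importance ordering.

```python
import numpy as np

def lasso_knob_ranking(X, y, alpha=0.1, iters=200):
    """Rank configuration knobs by importance via Lasso (L1) regression,
    fit with simple coordinate descent. X: (n_samples, n_knobs) of
    standardized knob settings; y: observed target objective."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iters):
        for j in range(d):
            # partial residual with knob j's current contribution removed
            r = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ r
            z = X[:, j] @ X[:, j]
            # soft-thresholding: weak knobs get exactly zero weight
            w[j] = np.sign(rho) * max(abs(rho) - alpha * n, 0.0) / z
    order = np.argsort(-np.abs(w))  # most important knob first
    return order, w

# toy data: knob 0 strongly drives the objective, knob 1 is pure noise
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
y = 5.0 * X[:, 0] + 0.5 * X[:, 2] + rng.standard_normal(200) * 0.1
order, w = lasso_knob_ranking(X, y)
```

Here the ranking recovers knob 0 as most important and zeroes out the noise knob, which is exactly the pruning that makes the downstream Bayesian optimizer's job tractable.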
So we train that model in order to generate the next config. The way the algorithm chooses the next configuration to recommend is that it trades off exploration and exploitation throughout the entire tuning session. Sometimes it will choose exploration, which means it's going to try data points that it knows very little about, and other times it will choose exploitation, where it tries data points it does know about, typically ones that performed really well in the past.

All right. Next, we'll go over some of the assumptions that we made in the paper. So our machine learning research, along with many other papers, makes certain assumptions that don't quite pan out in the real world. One of these is that the optimization process is a one-time offline activity. We also kind of assume that the DBA has the tools needed to deploy a copy of the database on similar hardware and recreate the workload separately from the production machine. For the workloads, because it's really hard to get access to any sort of real data, we inevitably use benchmarks. And because of that, we only really see the efficacy of these algorithms on benchmark data, which is static, fixed. We can get away with a five-minute observation period, just looking at five minutes' worth of data to determine what the next configuration should be, because the workload is just static. And finally, we sort of take for granted how difficult it is when we assume that there's a knowledge base of high-quality data available, or one that's readily accessible. So these are just a few of the items where there's a little bit of a disconnect between research and industry.
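Before digging into the challenges, the recommendation step from the algorithm overview (a Gaussian process model plus an exploration/exploitation trade-off) can be sketched in a few lines. This is a toy: one normalized knob, an RBF kernel with made-up hyperparameters, and an upper-confidence-bound acquisition standing in for whatever acquisition function the real system uses.

```python
import numpy as np

def rbf(A, B, ls=0.3):
    # squared-exponential kernel between two sets of configurations
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls ** 2)

def gp_posterior(X, y, Xq, noise=1e-4):
    # standard GP regression posterior mean/stddev at query points Xq
    m = y.mean()
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(Xq, X)
    Kinv = np.linalg.inv(K)
    mu = Ks @ Kinv @ (y - m) + m
    var = np.clip(1.0 - np.einsum('ij,jk,ik->i', Ks, Kinv, Ks), 1e-12, None)
    return mu, np.sqrt(var)

def suggest_next_config(X_obs, y_obs, candidates, beta=2.0):
    """UCB acquisition: high mu exploits configs predicted to do well,
    high sigma explores configs the model knows little about."""
    mu, sigma = gp_posterior(X_obs, y_obs, candidates)
    return candidates[np.argmax(mu + beta * sigma)]

# two observed configs (one knob, normalized to [0, 1]) and their throughputs
X_obs = np.array([[0.1], [0.5]])
y_obs = np.array([0.64, 0.96])
candidates = np.linspace(0.0, 1.0, 101).reshape(-1, 1)
nxt = suggest_next_config(X_obs, y_obs, candidates)
# with both observations on the left, uncertainty pulls the next
# trial toward the unexplored right side of the knob range
```

Each observation period contributes one more (configuration, objective) pair, the model is refit, and the acquisition function decides whether the next period is spent exploring or exploiting.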
And it certainly doesn't mean that the research is not impressive; it certainly is. But Andy and I, well, especially Andy, but myself as well, I still have a foot in the door in academia, and I would love to see more problems being solved in research that are a little more relevant to the problem I'm working on now. Yeah, I think that brings us to the next section. So yes, in this section I'm going to give a few examples. I just named the assumptions; now let's dig a little deeper.

All right, so challenge number one: it is not practical to tune every database offline in a repeatable environment. There are a lot of reasons for this, but at a high level, the two setups that we looked at while I was a PhD student were either tuning the staging database and then taking the optimal configuration for staging and applying it to the production database, or doing a sort of workload replay where you're copying the workload. We learned very quickly with one of our initial customers that staging databases, honestly, for most of our customers and for ourselves, staging and dev databases don't look anything like the production database. So what happened is we tuned the customer's staging database and they got a 15% reduction in IOPS, which is cost savings because they're on Aurora. But once we applied the same configuration to their production database, we only saw a 1% improvement. In retrospect, I'm happy that it didn't make things worse, but it didn't help. So, yeah, the other setup I kind of touched on: users can't capture workloads and replay them, not easily.
Oracle has this database replay tool that's just amazing, honestly: sophisticated tooling for replaying workloads, capturing the traces, and making sure you have a repeatable environment and are getting stable results. Those tools don't exist, to the best of my knowledge, for MySQL and Postgres to the degree that Oracle has them. So this is another thing that didn't pan out.

So what ended up happening, then? Well, interestingly, a bunch of our customers tuned their production databases directly. We've kind of earned that trust, and we have various safeguards in place to make sure that we don't harm or cause any degradation to their database; I'll go into that later. But this graph is showing that actually 55% of our customers (I collected this data on Friday; these are active customers who have applied configurations within the past week or so) are tuning their prod database directly. As for "unknown" here: the way that I determine this is I look at the name of the database or the AWS tags in the environment, and a lot of them will just say prod. Unknown just means there weren't any hints about what kind of database it was.

All right. Can you say how you got the hints? How do you know what the type is?

Oh, sure. Hold on, sorry. Okay, let me go back here. There I am. Basically, the database identifier is the name that Amazon lets you assign to a database, and a lot of people will just name it something like my-prod-db, or eu-something-prod. So that's one way: if they have prod in the name of the database, or, if it's an Aurora cluster, in the name of the Aurora cluster, that's one of the hints.
Another is that Amazon lets you assign tags to various resources, which can be helpful for billing or for classifying which of your resources are for production versus staging versus dev. So a lot of people will use exactly those tags, prod, staging, dev, to distinguish their resources. So that's the other hint.

Thanks.

Yeah. All right. So, collecting a large amount of high-quality training data is hard. Yes, it's very hard. Oh, I should have switched these points, but that's okay. One problem is that workloads are not static at all. Honestly, I just popped in and looked for a few minutes for a customer example, and immediately this image of CPU utilization came up; this is over the course of a month. As you can see, there's a period where the workload demand increased over the course of the month and then tapered off. What I'm trying to say is that it's not a repeatable baseline; it's not necessarily a repetitive pattern. So this is one way in which the data itself is just not as high quality. And so how do we solve this as a company? Well, luckily, that was a pretty noisy example. We support MySQL and Postgres, so it's no surprise that a lot of our customers have transactional workloads, and those tend to be quite cyclic and exhibit day/night diurnal patterns. So, therefore, whereas we used an observation period of five minutes in the research paper, we now use an observation period of 24 hours, because that usually captures the full cycle of the workload. But what this means is that whereas we could capture 288 data points a day in the research setting, we're now capturing one a day.
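The shift from five-minute to 24-hour observation periods is essentially a bucketing decision, which could be sketched like this (UTC calendar-day buckets and a plain mean are simplifying assumptions; real aggregation would be per-metric and timezone-aware):

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from statistics import mean

def daily_observations(samples):
    """Group (timestamp, value) telemetry into 24-hour observation
    periods so each training point spans a full day/night cycle."""
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[ts.date()].append(value)
    # one aggregated data point per day instead of one per sample
    return {day: mean(vals) for day, vals in buckets.items()}

# two days of hourly "CPU" samples with a diurnal swing
base = datetime(2023, 3, 6, tzinfo=timezone.utc)
samples = [(base + timedelta(hours=h),
            40 + 30 * (12 - abs(h % 24 - 12)) / 12)
           for h in range(48)]
obs = daily_observations(samples)
```

Going from 288 records a day to one per day is exactly the data-scarcity trade-off described above: each point now summarizes a full cycle, but the tuner sees far fewer of them.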
So even though we're a company now with real workloads, we still don't have the data that we need (actually, next bullet) to build generalizable models. For that we need a diverse set of training data across workloads, hardware, configurations, and probably more. What that means is that for a lot of our work on workload characterization, building those generalizable models, and trying to reuse training data, we just don't have the data available to be able to do that. So we have to try different strategies, and I'll be going into those a little later in this presentation. And so, last point, we don't have the training data to use such techniques effectively.

The other major difference, of course, is that we have real customers, and the number one rule of customers is: you do not crash their databases. I would say we've spent a lot of time and put a lot of thought and effort into trying to build trust with our customers. Interestingly, when Andy and Bohan and I incorporated the startup, there weren't really any products like ours on the market. I think the distinguishing factor is that we can actually take a configuration and apply it on behalf of the user, so we really do automate the full loop. And I don't believe there were any companies we could really look to for how they handled that. So, overall, we have to be very careful with what we're applying to the database. We spend a lot of time thinking about it. I would never just take the output of the feature importance algorithm for knobs and hand it to a customer.
We really do have to look at what we're recommending and make sure that it's safe. Customers care a lot more about stability than peak performance, and peak performance is inevitably what we measure in research papers. It makes sense, but at the end of the day, if the customer's workload is stable and performance is up a bit, customers are happy. The other major thing that was hard for us was time to value. How do we show customers the benefit of using OtterTune in days? Ideally it would be minutes. But like I said, once we changed the tuning interval to 24 hours, it could be weeks before the algorithm converges on a near-optimal configuration. So we've had to think a lot about this; a lot of the incentive behind offering health checks, along with index and query recommendations, is that we need to provide value immediately. And the other thing that goes along with this is that we also need to set customer expectations about the length of time these things will take. A lot of customers want something where they just click a button and everything works. But some of them will want to add every single knob, the whole entire selection of knobs that we offer, and I've seen maybe two customers in our time add the full selection. And of course, if you do that, the time it would take to converge is just astronomical. So we've been trying to think about the best way to educate customers about what's behind the scenes of the machine learning without divulging too much, I guess. So that brings us to keeping it real.
So, to overcome some of these challenges, we've put... oh, it looks like there are some questions in the chat. Sorry. Akshay, do you want to unmute and ask your question?

Yeah, I was just wondering: what happens if OtterTune itself crashes? Do you persist the information somewhere, or is the customer or the database responsible for persisting the configurations that OtterTune recommends?

Okay, so the question is what happens if OtterTune crashes. Luckily, we haven't had to deal with that, but what does happen sometimes is that the client-side agent will crash. So is the question how we recover those configurations, or how we guarantee that they're applied?

How do you recover the configuration?

Okay, well, the configurations are stored in our backend database. And I would say we can easily recover a similar configuration by just rerunning the algorithm on the same data. Of course, if we lost all the data, then we would be in trouble.

Okay, thank you.

Amazon does versioning for the configurations as well. So Amazon is storing this stuff, and that's persistent for us as well.

Okay, thanks. And then, have you looked at evaluating the contribution or usefulness of a particular data point? Can you extrapolate on that?

Your volume is very low; I can't hear him.

Yeah, so he's asking: for a particular data point, or across multiple data points in a time series, is one more valuable than another? From a few slides ago, looking at five minutes of data versus one day of data: why do you choose one day versus five minutes?

Oh, sure.
So the reason that we choose one day (I probably just sped right through this) is that most of our customers have periodic workload patterns where there's really no activity at night, it starts picking up in the morning, there's high demand in the afternoon, and then it goes down in the evening. So we can sort of generalize and try to pick good default values based on the characteristics of most of our customers' workloads. That's why we picked the 24-hour tuning period specifically. You can imagine, and we do see this sometimes and are able to guard against it a bit by ignoring weekend data, that a lot of times the data we collect over the weekend will not be representative of the actual workload, because, I don't know, maybe the business isn't open; it depends on what it is, right? So those data points are misleading for the algorithm. That's a hard problem. Like I said, we can ignore weekend data, but that's about the best we can do; beyond that it's just a matter of collecting more and more data. And the section I'm about to go into will touch on this a little more: it becomes less of a problem the smaller the search space is. Any other questions, Andy?

I have a question. I was wondering how OtterTune is able to balance tuning parameters across optimization objectives. Maybe there's some optimization related to latency, or to how the query runs; there might be different optimization objectives that, as I understand it, might conflict. And the tuning parameters, I'm assuming, are a limited set; maybe you have 10 or 20 parameters that you might choose to optimize. So with such a small set, and objectives that can vary, how do you balance which values to choose for any given parameter?

Great question.
So the way that we do this, which I'm actually going to go over in this section, is we constrain knob ranges. This is most important for knobs, and many of them are resource-related knobs: knobs that control memory buffers, knobs related to the number of processes. We know the customer's hardware, and we purposefully choose conservative knob ranges, the values that these knobs are allowed to take during the tuning session. They can usually take any value within those ranges, but not outside of them. Does that answer your question?

So I understand that the ranges exist. But do you somehow prioritize the performance objectives, or do you allow the user to specify which is more important to them?

Yes, that is correct. When the user first starts using OtterTune, we have a tuning options panel where they go in and choose from a number of predefined target objectives that we've added over the past few years. That includes query latency, throughput, and CPU utilization, and for our customers, minimizing read/write IOPS is important. We haven't actually gotten any requests for additional target objectives, but yes, we do let the user define that.

Thank you.

Yeah, so let's go ahead and jump in. Oh, and I'm running out of time; okay, I'll try to speed through a little bit. All right, so, sorry, kind of rushing through this. The way that we solve a lot of these problems, which is kind of a no-no in research, is that we incorporate a lot of domain knowledge. The reason I say it's a no-no in research is that, as I went over with our goals for the first OtterTune paper, we wanted to automate everything so that somebody who has no knowledge of the database can go in and use the algorithm and it spits out configurations.
They can use it without any domain knowledge. But realistically, in the commercial product, incorporating domain knowledge has been very, very helpful on two fronts. First, it's how we implement a lot of the safeguards. I already mentioned the knob constraints, which I think I'll go over on the next slide. But it also allows us to reduce the search space of configurations, which is incredibly helpful, because right now in research I frequently look at papers where the convergence time can be dozens or hundreds of iterations. We don't need to reach convergence at OtterTune, but we do need to start seeing a benefit. And domain knowledge is how we do this: basically using what we know about MySQL, what we know about Postgres, what we know about databases and workloads, to really constrain the search space to the values that might be good. So, yeah, in general, I think the research is spot on that for most databases and workloads, there are probably 20 knobs that are really going to matter, and which knobs those are will vary by workload. But being able to narrow down the search space from hundreds of knobs to just 10 to 20 is very big. We actually did this in the academic project too. Some of the ways we do this: we remove knobs that really aren't tunable or don't make sense to tune, like the port, or that require human intervention and could cause consistency issues in the database. And, oh, actually, this is an interesting tidbit: in the initial version of OtterTune, we offered database restarts, so customers could actually tune parameters like shared_buffers that require restarting the database. But in the history of like two years, nobody ever enabled it.
So the functionality is still available in our backend, but we just don't expose it to users anymore, until we can think of a little more clever way to provide it. We also use automated feature-selection techniques. We have a data analyst who looks at the data, and we do some automated analysis to determine, for example, the most important knobs. But at the end of the day, a lot of it is still manual, and somebody is looking at all the knobs before they go into production. Right now AutoTune recommends between 10 and 20 knobs per database system for customers that are just getting started with tuning. RDS Postgres and MySQL have the most; Aurora removes some of the storage parameters, so that's why they decrease a little, and then Serverless, I think, removes a couple more. So there are fewer knobs to tune with Aurora and Aurora Serverless. Question from the audience: Matthew, you want to ask your question? Sure. You're talking about configuration; I was wondering if you've done any machine learning on the output of EXPLAIN, for maybe high-frequency queries, or any queries. I have not. No. Andy, have you? So right now in AutoTune, the commercial version, we don't get the output of EXPLAIN from customers, because you have to enable that; for Postgres, there's an extension to enable. For the CMU database group research, which is separate from AutoTune, yes, we do ML on the output of EXPLAIN ANALYZE. So you can't generate your own query plans. Is that in the product? We can't generate query plans in the product. I do have a student working in research on injecting plan hints into Postgres using pg_hint_plan, based on tuning optimizations, which is completely separate from the commercial product. Thank you. Right.
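The automated analysis for picking the most important knobs could look something like the following. This is an illustrative sketch, not AutoTune's actual pipeline: it ranks knobs by Pearson correlation with the target metric, whereas real systems often use Lasso or similar regularized regression; the knob names and toy data are placeholders.

```python
import statistics

# Illustrative sketch: rank knobs by how strongly each one correlates
# with the target metric across past samples, then keep the top few.

def pearson(xs, ys):
    """Pearson correlation; 0.0 for a constant (untunable-looking) column."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx ** 0.5 * vy ** 0.5)

def rank_knobs(columns: dict, metric: list) -> list:
    """Sort knob names by |correlation| with the metric, strongest first."""
    return sorted(columns, key=lambda k: -abs(pearson(columns[k], metric)))

# Toy data: latency is driven by buffer_size; log_verbosity is constant.
cols = {
    "buffer_size":   [1, 2, 3, 4, 5, 6, 7, 8],
    "io_threads":    [2, 1, 2, 1, 2, 1, 2, 1],
    "log_verbosity": [0, 0, 0, 0, 0, 0, 0, 0],
}
latency = [80, 72, 66, 60, 52, 47, 40, 33]
print(rank_knobs(cols, latency)[0])  # → buffer_size
```

The output of a ranking like this would still go through the manual review Dana mentions before any knob reaches production.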
Yeah, so just going over constraining knob ranges. This is a couple of screenshots of our UI for the tuning options, where you can select which knobs you want to tune, and we provide between 10 and 20 recommended knobs. We come up with the bounds for users, and honestly, it's a lot of work to come up with safe bounds, so it's pretty valuable. If users click the show-other-knobs button at the bottom, they can go in and tune whatever knobs they want, but then they have to pick their own knob constraints. Like I mentioned, we do a lot of testing before adding new knobs, and we really think about the range of allowed values. This is especially important for knobs like work_mem, which are known to be a little risky to tune. We obviously have to use domain knowledge: we know that work_mem is not just a single memory buffer in the database that gets applied once; it gets applied per connection, or it's more complicated than that. These are important things for us to take into account when choosing the bounds. Customers are allowed to go in and change them, but we know with high probability that the values we provide are safe. So how else do we incorporate domain knowledge? Another way that we can help speed up convergence, speed up the tuning process, is to help the machine learning algorithm discover good regions in the search space. One example of doing this with domain knowledge: we can take the recommendation from PGTune and use it when we're bootstrapping, when we're first starting a tuning session, to try to land in a good search region.
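The work_mem point above is worth making concrete: because work_mem can be allocated several times per query (once per sort or hash step) across many connections, a safe upper bound has to divide available memory by expected concurrency. This is an illustrative sketch of that reasoning, not AutoTune's formula; the safety factor and operations-per-query values are made-up placeholders.

```python
# Illustrative sketch: a conservative upper bound for work_mem (in MB)
# given instance RAM, shared_buffers, and expected concurrency.
# safety_factor and ops_per_query are placeholder assumptions.

def safe_work_mem_mb(ram_mb: int, shared_buffers_mb: int,
                     max_connections: int, ops_per_query: int = 3,
                     safety_factor: float = 0.8) -> int:
    # memory left after the shared buffer pool, with some headroom
    available = (ram_mb - shared_buffers_mb) * safety_factor
    # worst case: every connection runs ops_per_query work_mem allocations
    per_backend = available / max_connections
    return max(1, int(per_backend / ops_per_query))

# 64 GB instance, 16 GB shared_buffers, 200 connections
print(safe_work_mem_mb(65536, 16384, 200))  # → 65
```

A naive bound that ignores concurrency (say, RAM divided by four) would be an order of magnitude too high here, which is exactly why tuning work_mem blindly is risky.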
And then even after that, it's helpful to try settings that are generally known to be good for most databases, kind of best-practice guideline settings. It's helpful to try those at the beginning of the tuning session, and you can also try them later on. We periodically, kind of randomly, will include some heuristic-based recommendations when we send off the full recommendation to the user. All right. So, manual controls and feedback; we're done with the domain knowledge bit for now. These are part of the tuning options panel: some of the safeguards and configurable things that we provide customers. The first would be the tuning mode up here. We have three tuning modes. At the very bottom you'll see self-directed tuning. This means we can't touch their parameter group, which is how we apply the configurations. We provide the command-line command that they can copy, paste into their own terminal, and use with the AWS CLI to apply the knobs. So that's self-directed: we don't apply anything on behalf of the user. The other two are where we do automate the tuning loop. The first one is auto-tuning. That's full-fledged: we don't ask for the user's permission when applying knobs. Of course the knobs are constrained to the values that we showed on the previous slide, but there's no human in the loop there. Whereas the second option, the currently selected one in the screenshot, is called manual review. It's basically: you as the user, or somebody on your team, needs to review and approve the configuration, and then we'll go ahead and apply it to the database. Oh, nice. Okay.
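The "periodically, kind of randomly, include some heuristic-based recommendations" idea can be sketched as a simple epsilon-greedy mix between the model's suggestion and a known-good baseline. This is a hypothetical sketch, not AutoTune's implementation; the 10% mixing rate and the example configuration are placeholders.

```python
import random

# Illustrative sketch: occasionally substitute a known-good heuristic
# configuration (e.g. from a tool like PGTune) for the model's
# suggestion, so the tuner keeps revisiting a safe, sensible region.

HEURISTIC_CONFIG = {"shared_buffers": "4GB", "work_mem": "16MB"}

def next_recommendation(model_suggest, mix_rate=0.1, rng=random):
    """Return the model's config most of the time, a heuristic sometimes."""
    if rng.random() < mix_rate:
        return dict(HEURISTIC_CONFIG)  # seed a known-good region
    return model_suggest()

rng = random.Random(42)
picks = [next_recommendation(lambda: {"src": "model"}, rng=rng)
         for _ in range(1000)]
heuristic_count = sum(1 for p in picks if "shared_buffers" in p)
print(0 < heuristic_count < 200)  # roughly mix_rate of the picks
```

The same hook also covers the bootstrapping case Dana mentions: at session start the mix rate could simply be set to 1.0 for the first few recommendations.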
So I was curious who's choosing what. It looks like only 12% of our active users that are actively tuning their databases are choosing the self-directed tuning mode, which is pretty good if you look at it from a trust perspective. Surprisingly, manual review is quite a bit more popular. But about 30% of our customers just kind of let us go at it, so that's pretty impressive. And let's see, what's the second thing? Oh, yes. What we heard from some customers for a while was that we just provide a bunch of knob configurations, and they don't really know what they mean; they don't really know why this value is better than that value. So what we've been trying to do is incorporate more explanations, and graphs in some cases, to explain why we're choosing a setting or why we're making a recommendation. Oh, performance improvements. So the big question is always: given all the noisy data and all of the challenges that we discussed in the second part of the talk, are we even providing users any benefit? And the answer is yes, and actually quite a bit of benefit. For RDS we're seeing on average about a 35 to 40% improvement, whereas for Aurora we're seeing maybe closer to a 50% improvement. And this actually covers a number of target objectives; it's just whatever target objective they chose. And Andy had mentioned, hey, you should try to figure out why customers are getting more benefit with Aurora than RDS.
And of course I didn't actually have time to really look into it, but I was thinking about it, and I'm curious if the difference is in the knob counts. I think that RDS is closer to 20 knobs and Aurora is closer to maybe 12, I would have to double-check, but I'm wondering if we're just tuning fewer knobs, so people are seeing more benefit more quickly. I really don't know. So that's an open question that we're going to try to answer, but very interesting. In some cases, some people are seeing a lot of benefit. These are typically people that, I'm guessing, haven't really done any tuning on their database. The number of people that have done zero tuning is around 40%. Quite a few people have done some tuning, which is a bit surprising. And so presumably the low values here might be people who not only did some tuning, but tuned it well. Hard to say concretely. Does that graph combine the stats of whether I'm trying to optimize for CPU or, you know, IOPS, and just kind of combine them all together? That's correct, yes. So is this graph corresponding to the initial optimization phase, when AutoTune converges initially? I think the answer is yes. So basically we consider the baseline configuration, the configuration we're comparing against, to be the first configuration that we ever received from the user. Bellhan is the one that prepared this data, and he probably looked at the time-series data points that were relevant to the tuning, so times where we were applying configurations. If somebody had their agent connected for three years, I don't think he was looking at all of that data. But yes, it's basically the best configuration that we recommended, as a percent improvement over the baseline configuration.
I guess what I'm trying to get at is: once the model does converge initially and you have a good set of tuning parameters for the workload, and I guess this depends on how the workload changes over time, what's the marginal benefit over time as you continue to re-optimize? What has your experience been with AutoTune customers? That's funny, I actually took Andy's slide that kind of answers this; it's just at the end of the slide deck. I can go over it quickly if I have time, but that's actually one of the open problems that we're trying to solve now. Most of our users, because AutoTune, once you get it set up, is pretty seamless, just kind of let it run for a long time, and it's really easy to turn on again once you've turned it off. So that's not exactly answering your question. I showed earlier, let me go back to it, a workload where... I don't even see it in my slide deck. Anyway, sorry. It really depends on how much the workload is changing, how much it's shifting, and that really varies between our customers. It's not like I can look at the workload patterns and tell you that a majority of our customers follow one pattern. As far as analyzing how workloads shift and when customers need to start tuning again, we don't have any solid analysis on that at the moment. But I'll just chime in and say that it's very rare that we come across databases that are static, meaning the application isn't adding new features, which means schema changes, query changes, or the volume of the data increasing. It's very rare that the databases that come to us at AutoTune are static, where you don't need constant reconfiguration. Now, you can get most of the benefit of this sort of knob tuning in the beginning, because the defaults on Amazon are so terrible for both RDS and Aurora.
But this is why we expanded beyond just knob tuning, looking at query tuning, index tuning, and identifying incorrect indexes, because that handles all the use cases where the application is evolving. And then people want to know whether their databases are still following best practices. This is Andy's graph that sort of covers it a little bit. It covers two different tuning sessions. Initially the customer applied configurations, they went through a tuning session, and we improved their query latency by 88%. And then, what is it, a few months later, they tuned again, and we further improved the target objective. However, in this case, you could make the argument that the workload didn't really change; they just didn't tune long enough the first time. So it kind of shows it. But the customer workloads are pretty noisy, pretty dynamic, and frequently changing, and we can see that in the slow queries that we collect too. Okay. So, as part of trying to demonstrate our value to customers, we've also come up with a health score that encapsulates more than just the target objective that they're tuning and the knob configurations. This incorporates their health with respect to how we're scoring them on indexes, queries, and database resources. So if they have super high memory usage or something, they get docked points. In this way it's kind of like a credit score; it's a little bit of peace of mind. I don't find credit scores give that much peace of mind, but it works in the same way. All right, so with that, we shall conclude. Oh yes, a few open problems. I would love to see more sample-efficient tuning techniques out there.
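A credit-score-style health score like the one described could be sketched as a weighted combination of per-category sub-scores, with points docked for issues such as sustained high memory usage. This is purely illustrative: AutoTune's actual categories, weights, and thresholds are not public, so everything below is a made-up placeholder.

```python
# Illustrative sketch of a health score: weighted sub-scores for knobs,
# indexes, queries, and resources, combined into a single 0-100 number.
# All weights, categories, and thresholds are hypothetical.

WEIGHTS = {"knobs": 0.3, "indexes": 0.3, "queries": 0.2, "resources": 0.2}

def health_score(subscores: dict) -> int:
    """Combine per-category sub-scores (each 0-100) into one 0-100 score."""
    total = sum(WEIGHTS[c] * subscores.get(c, 0) for c in WEIGHTS)
    return round(total)

def resource_subscore(mem_used_pct: float) -> float:
    """Dock points as memory usage climbs past a comfortable level."""
    if mem_used_pct <= 80:
        return 100.0
    return max(0.0, 100.0 - 5 * (mem_used_pct - 80))

print(health_score({"knobs": 90, "indexes": 70, "queries": 80,
                    "resources": resource_subscore(95)}))  # → 69
```

The appeal of this structure is exactly the credit-score analogy: a single number a customer can watch, backed by sub-scores that say where the points went.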
There's a tuning paper that was published back in 2022 called LlamaTune that I just love, and their research and techniques are pretty relevant to stuff that we could actually apply to the AutoTune commercial version. Workload synthesis: all of this kind of goes back to dealing with training data, dealing with sample efficiency, so trying to come up with benchmarks and workload synthesis that can simulate workloads that are much more realistic than what we're seeing with TPC-C, TPC-H, etc. Another interesting one is the starting and stopping criteria, which is sort of related to the previous question about when you stop tuning and when the workload has changed enough that you want to pick tuning back up, because there could be a benefit from enabling it again. And these are just a couple of questions of the many. So that brings us to the conclusion. Oh, interesting, it looks like the conclusion slide didn't quite finish, but that's okay, because it's the conclusion. So, in conclusion: AutoTune started out as a research project. It was a really fun research project to work on, actually. We came up with a bunch of interesting techniques ourselves, and we follow some of the other research out there. But a lot of the assumptions and goals, you know, I see a lot of tuning papers come out focusing on different algorithms, and that's maybe not the most important thing for us to solve on the industry side. So this talk has been about some of the modifications that we've made to turn AutoTune into a commercial product. And with that, any questions? Awesome.
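The starting/stopping criterion open problem can be made concrete with a minimal drift check: compare a recent window of some workload metric against the window observed when tuning last converged, and restart tuning when the mean shifts by more than a threshold. This is a hypothetical sketch of one possible criterion, not something the talk says AutoTune does; the 20% threshold is a placeholder.

```python
import statistics

# Illustrative sketch of a restart criterion: flag workload drift when
# the mean of a recent metric window moves more than `threshold`
# (relative) away from the window captured at convergence.

def workload_drifted(baseline: list, recent: list, threshold=0.2) -> bool:
    base_mean = statistics.fmean(baseline)
    if base_mean == 0:
        return True  # no meaningful baseline: be conservative, retune
    shift = abs(statistics.fmean(recent) - base_mean) / base_mean
    return shift > threshold

qps_at_convergence = [950, 1000, 1020, 980, 1050]  # mean ≈ 1000
qps_now = [1400, 1500, 1450, 1480, 1520]           # mean ≈ 1470
print(workload_drifted(qps_at_convergence, qps_now))  # → True
```

A real criterion would likely look at several metrics and the query mix, not one mean, but even this simple form answers the "when is it worth enabling tuning again" question mechanically.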
I will applaud on behalf of everyone here. So we have a few minutes for questions for Dana; if you have any questions, please fire away. Yeah, I've got a couple. I'll start with one. Dana, you mentioned earlier in the talk that one of the crucial elements, if I got that right, was for you to be able to look at a workload and compare it against a database of workloads and say which workload is similar to mine, so that I can better understand whether that's a good starting point for the parameters. Just categorizing the workload is complicated, and you don't get the EXPLAIN output, so you're basically getting the queries and perhaps some stats from them. Are there some things you could disclose openly about how you compare two workloads and compute similarity between them? I'm sorry for the confusion in advance; a little caveat: that workload-mapping algorithm is only being used in the research version of AutoTune. In the research version, all our different workloads were just different variations of TPC-C, YCSB, and a whole bunch of other benchmarks that were in, like, OLTP-Bench. We haven't applied that algorithm to the commercial version of AutoTune, for the reasons you're mentioning. I think a lot more work is needed. Ultimately, we just need more data in order to be able to build any sort of generalizable model that can find similar previous data.
And if you ever get interested in that, Andy probably knows this, there's a ton of really interesting work on transfer learning, for instance for optimization, that will take some of those workloads and cast them into these gigantic vectors, the kind that get used to encode things in large language models, and they seem to work reasonably well, at least in some cases. I've got a bunch of questions, but let's see if others want to go; I'll pop back into the queue. It's obviously also hard because the hardware varies so much, right? There are different instance sizes, and then you have to deal with provisioned IOPS and the other options you can add, so that's part of the reason as well. Yeah, the data divergence is the other thing that screws things up. Other questions? I had a question about the kind of compute resources that you need to run AutoTune, like how much compute do you use? Honestly, nothing special. The algorithm can be computationally expensive if you include a lot of samples. If you have a lot of data points, that kind of controls the complexity of the algorithm, so we limit it. I forget the exact number, but we don't look any further back than maybe a few months; we only use the data from the previous few months rather than the whole amount of data that we've collected, and that helps quite a bit. And then I had a follow-up. As database systems change, you maybe get more tuning parameters, though I'm assuming most tuning parameters probably stay the same over the years. But as you do get new parameters, what kind of a process do you have to deliver AutoTune updates? Well, like I kind of alluded to, it's pretty manual.
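The sample-limiting trick mentioned above is worth sketching: many model families used for tuning (a Gaussian process fit is roughly O(n^3) in the number of observations) get expensive fast, so training only on a recent window bounds both cost and staleness. This is an illustrative sketch; the 90-day window is a made-up placeholder, not AutoTune's actual setting.

```python
from datetime import datetime, timedelta

# Illustrative sketch: keep only tuning observations from a recent
# window, so model training cost stays bounded and stale workload
# behavior ages out of the training set.

def recent_samples(samples, now, window_days=90):
    """Keep only (timestamp, config, metric) tuples inside the window."""
    cutoff = now - timedelta(days=window_days)
    return [s for s in samples if s[0] >= cutoff]

now = datetime(2023, 6, 1)
history = [
    (datetime(2022, 1, 15), {"work_mem": "4MB"}, 120.0),   # too old
    (datetime(2023, 4, 10), {"work_mem": "16MB"}, 95.0),
    (datetime(2023, 5, 20), {"work_mem": "32MB"}, 88.0),
]
print(len(recent_samples(history, now)))  # → 2
```

Windowing also quietly handles workload drift: observations taken under an old workload stop influencing recommendations once they fall out of the window.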
For example, I'm thinking there was some parameter, maybe in one of the later releases of Aurora... and this is a common problem: if parameter X is set to zero, then parameter Y has no impact, and customers were wondering, well, why are you telling me to tune parameter Y if X is set such that Y has no impact? Honestly, that's another open problem; it's hard to keep up with all of those dependencies. Luckily, MySQL hasn't been releasing too many new knobs lately, so they've been easier to keep up with. Postgres does release a lot of versions, but in general there have been only a few cases. Sorry, it's ultimately just pretty manual at this point. So I guess the team needs to constantly be aware of updates and the development of each database that's supported? That's correct. Yeah, so luckily, for whoever's working on it, this sort of exercise typically happens when Postgres releases a new version, because they release much more frequently than MySQL; whenever we need to go in and start supporting a new version, that's when we do that review of the knobs. Thanks for answering my questions. Thank you. We have time for one more question. Gnash, you want to go for it? Yeah, I think it's two sides of the same coin. Have you thought of optimizing things other than the main database, like a replica? Or would you even consider looking at the workload and saying, you know what, you're on Postgres but you'd be better off, even within that same ecosystem of Aurora and RDS, moving to MySQL, or vice versa? So how do you apply it outside that main box of just optimizing the master database? So... yes and no. I'll actually split this into two parts.
So the latter thing you mentioned is where we really want to be. What we really want to be able to do is make instance-type recommendations, and you can imagine that that would generalize to potentially choosing between, oh, when is Aurora better suited for your workload versus RDS. I think that was one of your questions; we are looking into the very early stages of that right now with our data analyst. But honestly, if you see that feature come out from us, great, it means we got funding, because that's such a hard problem. I'd love to solve it. On the other side, I think you also asked whether we've tried tuning replicas. I didn't really go into Aurora support, but it's actually kind of interesting. We could do a lot there, and the reason we don't is honestly for business and pricing reasons; it gets a little tricky. AWS allows you to tune at the instance level and at the cluster level, for example, so you can imagine some combined tuning there, but typically what we tell customers is: just tune your read replica, then tune your write replica, then tune your other read replicas. Because in order to do more sophisticated things like that, we need to be collecting metrics, like the database metric data, and we're having a hard time getting that information for Aurora for all of the replicas to be able to do something interesting there. So it just hasn't been prioritized. I don't know if that makes sense. Yeah, that makes sense; excellent answers, and an awesome talk. Thank you.