Carnegie Mellon Vaccination Database Tech Talks are made possible by OtterTune. Learn how to automatically optimize your MySQL and Postgres configurations at ottertune.com. And by the Steven Moy Foundation for keeping it real. Find out how best to keep it real at stevenmoyfoundation.org. All right, let's get started. It's another talk today in our Vaccination Database Tech Talk series here at Carnegie Mellon. Today we're excited to have Lin Ma, who is a PhD candidate in my group at CMU, and his focus has been on self-driving database systems. One of the big controversies about Lin is that he was voted the most congenial PhD student in the Carnegie Mellon database group in 2017, 2018, and 2020, but not 2019. So as always, as Lin gives the talk, if you have any questions, please unmute yourself, say who you are and where you're coming from, and feel free to interrupt at any time; we want this to be a conversation. With that, Lin, the floor is yours. Go for it. Yeah, thanks, Andy. And I want to say that I was not voted the most congenial PhD student in 2019 only because I was TAing for Andy's class, but that's his fault. All right, anyway, like Andy said, thanks for the nice introduction. Being from this very group, the Carnegie Mellon database group, I'm pretty happy to be here today talking about our system, a self-driving database system named NoisePage. All right, so in the first part, I'll discuss a little bit about why we are doing this. Basically, database systems have already become an essential piece of many modern data-driven applications. However, they are also becoming more and more complex and very difficult to manage. According to these reports, personnel is already estimated to be almost 50% of the total ownership cost of a database system, and more than 70% of database administrators, or DBAs, think that performance tuning occupies most of their time.
And this database administration process is not only laborious and costly, but also pretty difficult to scale. According to the same set of reports, more than 70% of DBAs are also managing an increasing number of databases over the years. And for large corporations or cloud vendors, they actually may need to host thousands or even millions of databases at the same time. So we think that this manual database administration process has really become a bigger and bigger impediment for modern data-driven applications. So what are the existing solutions for this problem? Well, there are many existing tuning tools developed by vendors and researchers over the years. But the problem is that most of these tools still require extensive human guidance. In a typical scenario to apply these tools, a DBA will first need to prepare a sample workload of the application, then prepare some spare hardware and fork a copy of the system. Then they will run those tuning tools to get a set of change recommendations. Next, the DBA needs to examine these change recommendations, pick the best one using their domain knowledge, decide when to apply those changes, and finally apply those changes manually. And oftentimes, they actually have to carefully apply those changes when the workload volume is low, such as 3 a.m. or 4 a.m. in the morning, which is not really a pleasant task. Furthermore, most of these existing tuning tools only focus on a single aspect of the database at a time, such as indexing, partitioning, or knob tuning. So essentially, what this means is that the DBA needs to repeat this onerous tuning process over and over again for each of these aspects of the database. Now, recently, there's actually been a push from cloud vendors to provide more automated cloud database services.
So I think the two most prominent examples of this are the Oracle Autonomous Database, as well as the Automatic Index Management for Microsoft Azure SQL Database. A majority of these services will actually just run their tuning tools, just like what I discussed, in a loop, leveraging their cloud infrastructure. The issue is that this still requires pretty expensive exploratory testing of those recommendations. The vendors will first need to fork the database, forward some workload traffic to that fork, and then apply and test all the different recommendations from the tools to see which change is best, or if any change is good at all. Also, most of these services still focus on a single aspect of the database at a time, without a holistic view of the whole system. And lastly, most of these services are reactionary to shifts in the workload patterns, which means that they only address the database administration problem after the problems occur. So all of these motivate our exploration of a new generation of self-driving database management systems. We define a self-driving database as a database that can configure, tune, and optimize all system aspects without any human intervention. These aspects would include, for example, physical design, such as building indexes; data placement, such as designating hot or cold storage; query tuning and knob tuning; or even scaling resources up and down. However, there are a few aspects of the database that we think just fundamentally require human judgment, such as security or access control. So we think a self-driving database can't really automate those aspects. And then we think there are a few reasons that make it possible to build a self-driving database right now, today, instead of 20 or 30 years ago.
The first is that in this data-driven era, not only can the database system store more data, but it can also collect more metrics and more stats about the system itself, so that we can extract knowledge and patterns out of that data to help the system control itself. Second, with the improvements in hardware technologies, not only does the database store more data for the user, but it can also process data much faster, and thus retrieve information and perform the calculations for the self-driving operation faster. And lastly, the recent advances in artificial intelligence and machine learning also provide more convenient tools and better algorithms to help us achieve such a self-driving goal. So now let me introduce at a high level the self-driving architecture that we are developing. But before I get into the details of self-driving databases, I actually want to make an analogy to self-driving cars. I should clarify that I'm just explaining a very simplified view of self-driving cars. In actuality, self-driving cars are much more complex, but I think this will be a good analogy for understanding purposes here. So first, a self-driving car needs a perception component that uses its cameras and radars to observe other vehicles and pedestrians on the road and then predict where they are moving. Second, a self-driving car has models about the effects of its potential actions. For example, if you turn the steering wheel by 20 degrees, the car needs to know what will happen. For cars, usually these physical or mechanical models are directly embedded when the car is built. However, it's a little bit different for our self-driving databases, which I'll explain later. And lastly, there is a planning component that uses the road perception as well as the action model estimations to plan a sequence of actions to get to where the car wants to go.
Our design of a self-driving database actually shares analogies with the self-driving car architecture I just discussed. Our architecture first contains a workload forecasting component that can observe and predict the workload that the database is going to execute in the future. I think this is necessary essentially because you need to know what you are optimizing for before you can optimize for it. Second, there's a behavior modeling component that builds the behavior models to predict the cost and benefit of different self-driving actions. For example, what will the effect of building an index or changing certain knobs be? And lastly, an action planning component applies actions to optimize the system performance given the forecasted database workload as well as the estimated action behavior. So in this talk, I will first briefly discuss one of our previous works on a workload forecasting framework for self-driving databases. Then I will spend most of my time on a framework that we just developed to build database behavior models. Next, I will introduce at a very high level our ongoing work on the last planning component, if we have time for that. Lastly, I will mention how we are integrating all these components into the new self-driving database that we are building, called NoisePage. So let me start with the first component, workload forecasting. We contend that this is the first step towards building a self-driving database, because it is very important to know what workload is going to execute in the near or longer-term future. This is important because there are certain actions that a self-driving database needs to apply, such as building indexes, partitioning the data, or scaling the resources up and down, that may take a long time to finish. For example, if you want to build an index on a table with 10 billion rows, finishing that action may take hours.
So it might be too late if you only apply these expensive actions after you observe that the workload already requires them. Furthermore, you may also want to apply these actions when the workload volume is low to avoid resource contention. So now we define the workload forecasting problem for a self-driving database as the ability to predict when, how many, and what queries will arrive at a database at a given future time point. Beyond this, there are also two important concepts. The first is how long into the future we are predicting. For example, this could be one hour or one week. That's what we call the prediction horizon. The second is the granularity of such a prediction. For example, this could be at the per-minute level or the per-hour level. That's what we call the prediction interval. So a good workload forecasting framework for a self-driving database first needs to achieve good prediction accuracy for different combinations of prediction horizons and intervals. It also needs to capture the major database workload patterns, for which I will give some examples. And lastly, it should achieve a good balance between cost and accuracy. But there are a few challenges here. The first is that in order to apply a self-driving database in production, the workload forecasting needs to capture and predict the workload patterns online and also handle changes in the workload patterns dynamically. Second, modern database workloads can have a high volume. For the purposes of this project, we collected the traces of three medium-sized real-world database workloads, and all of them execute at least millions of queries per day. Next, there can also be different patterns in different database workloads. For example, in the first workload trace we collected, which is from a Pittsburgh local bus tracking service, there is a diurnal workload pattern that follows the human living cycle.
Essentially, there are more queries arriving during the daytime and on weekdays, compared to weekends or nighttime. In the second trace, which is from the CMU admissions website, there's a different growth-and-spike pattern with a peak around the admissions deadline. So not only may different database applications have different workload patterns, but different subsets of queries within a single database workload may also have different patterns simultaneously. A good workload forecasting framework needs to capture all of these. To address those challenges, we developed a framework called QueryBot 5000 to build a workload forecasting component for self-driving databases. At a high level, it works this way. The application sends SQL queries to the database, and the database forwards these queries to QueryBot 5000. These are historical queries that arrive in large volume. Then these queries go through the three steps of QueryBot 5000, namely pre-processing, clustering, and forecasting, which I will briefly talk about very soon. After these steps, QueryBot 5000 generates compact predictions of the future SQL workload that are used by the database's self-driving components. So first, as I mentioned before, there can be millions of queries arriving at a database system per day, even just for medium-sized applications. It would be too costly to capture the patterns for all of these queries and also build forecasting models for each single one of those queries. There has to be some compression. The first compression we did is just to extract the constant parameters out of those queries. Then we only record the patterns and build the forecasting models at the query template level. In database terminology, essentially we just convert those queries to prepared statements. And of course, the distribution of these parameters would matter.
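As a rough illustration of that pre-processing step, stripping constants down to templates can be sketched with a couple of regular expressions. This is a minimal sketch only; the real framework works on parsed queries and handles quoting, casing, and semantic equivalence far more carefully.

```python
import re

# Simplified, illustrative patterns: single-quoted strings and bare numeric literals.
_STR = re.compile(r"'[^']*'")
_NUM = re.compile(r"\b\d+(?:\.\d+)?\b")

def templatize(sql: str) -> str:
    """Replace constant parameters with '?' so structurally identical
    queries map to one query template (i.e., a prepared statement)."""
    sql = _STR.sub("?", sql)   # strings first, so digits inside them vanish too
    sql = _NUM.sub("?", sql)
    return re.sub(r"\s+", " ", sql).strip()
```

With this, both `SELECT * FROM bus WHERE id = 42` and `SELECT * FROM bus WHERE id = 7` collapse to `SELECT * FROM bus WHERE id = ?`, so their arrival patterns are tracked under one template.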
So we keep a set of parameter samples for each query template using reservoir sampling. Additionally, we also group semantically equivalent queries into the same query template. This simple step already allows us to reduce the millions of queries that we need to forecast per day to thousands of query templates. So this is good, but even with that step, it's still too costly to build forecasting models for thousands of query templates, especially if you want to use some advanced machine learning techniques such as neural networks. So next, we have a clustering step that further groups similar query templates together. For clustering, probably the most important question to ask is just what similarity metric you are going to use to group similar query templates together. There are a few options. The first is what we call physical features, which are the runtime metrics that the system can record after the query is executed. For example, this could be the tuples read, tuples written, and the query latency. However, the problem is that if a self-driving database applies certain actions, such as building an index, then those physical features, such as the number of tuples read, would actually change. In that case, the clustering results would be invalid, and we would need to redo the clustering and then build the forecasting models all over again. So this is not really suitable here. Another option is what we call logical features, which are the information you can extract from the query's logical composition. In other words, this is the query's abstract syntax tree. For example, what is the type of the query, which columns are referenced, or the number of joins, et cetera. This feature is certainly independent of the self-driving actions.
However, in our experiments, we just found that there's not enough information in those kinds of logical features to generate good clustering results. In other words, the clustering quality is not very good, and we couldn't really select good self-driving actions based on those clusters. So it's not really suitable either. What we eventually ended up using is what we call the arrival rate feature. Our observation is that if we are generating models to predict the query arrival rate in the future, then why not just directly group query templates based on their arrival rate patterns, so that we only need to build one model for each arrival rate pattern? The reason we can do this is that for many database applications, queries are actually generated in batches using tools like ORMs or stored procedures. So many subsets of queries may actually arrive at the database system around the same time, and thus they may have similar arrival rate features. To illustrate this, assuming that this is the arrival rate history of a single query template, what we're going to do is just sample a few arrival rate values at a few timestamps to form the arrival rate feature, and then do clustering based on that. We found in our evaluation that this is actually a pretty effective clustering approach. In fact, if we only consider the largest five clusters based on such arrival rate feature clustering, those five clusters already cover more than 95% of the total workload volume for all three of our real-world workloads. So essentially we only need to build forecasting models for these largest clusters that cover the majority of the workload, which significantly reduces our forecasting overhead.
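The arrival-rate-feature idea can be sketched as a simple greedy clustering over sampled arrival-rate vectors. This is an illustrative stand-in: QueryBot 5000's actual online clustering algorithm and similarity measure differ, and the 0.9 cosine threshold here is an assumption.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sampled arrival-rate vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def cluster_by_arrival_rate(features, threshold=0.9):
    """Greedily assign each template's arrival-rate feature vector to the
    most similar existing cluster center, or start a new cluster.
    Templates in one cluster then share a single forecasting model."""
    centers, assignment = [], []
    for feat in features:
        best, best_sim = None, threshold
        for i, center in enumerate(centers):
            sim = cosine(feat, center)
            if sim >= best_sim:
                best, best_sim = i, sim
        if best is None:
            centers.append(list(feat))
            assignment.append(len(centers) - 1)
        else:
            assignment.append(best)
    return assignment
```

Because cosine similarity ignores magnitude, two templates whose arrival rates rise and fall together (e.g., generated by the same ORM call) land in the same cluster even if their absolute volumes differ.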
Lastly, to build the forecasting models for each of these query template clusters, we investigated a number of popular time series forecasting methods with various properties, such as whether they are linear, whether they retain memory, or whether they use kernel methods, et cetera. In summary, what we eventually found is that there's no single method that performs best in all scenarios to capture all of those different workload patterns. In machine learning, one approach to address this is called an ensemble, where we combine multiple models with different properties together to acquire better predictive power. In this case, we found that the combination of linear regression, a recurrent neural network, and kernel regression gives us the best empirical accuracy. There are also some other tricks that we apply here to improve the prediction accuracy, such as normalizing the input features (also called whitening) and predicting the arrival rates of different clusters together, so the model can use the arrival rate patterns of all the clusters to predict the future arrival rates. All right. Now I just want to show you one example of our forecasting results, which is predicting the query arrival rates for the bus tracking application that I mentioned earlier. I want to clarify that for demonstration purposes, I'm only showing you the forecast of the total workload in the future, but our framework actually predicts the future arrival rates for each individual query template, which are used by the database's self-driving components. Here I'm showing you the results for two different prediction horizons. On the top is predicting the workload one hour from now, and on the bottom is predicting the workload one week from now. From these results, we can first tell that predicting the workload one hour from now is certainly easier than predicting the workload one week from now.
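To make the ensemble idea concrete, here is a toy average-of-models forecaster. A least-squares trend line and a naive seasonal predictor stand in for the linear regression, RNN, and kernel regression the framework actually uses; this is a sketch under those simplifying assumptions, not the real models.

```python
def linear_forecast(history, horizon):
    """Fit an ordinary least-squares trend line to the arrival-rate history
    and extrapolate it `horizon` steps past the last observation."""
    n = len(history)
    mean_x = (n - 1) / 2
    mean_y = sum(history) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(history))
    var = sum((x - mean_x) ** 2 for x in range(n))
    slope = cov / var if var else 0.0
    return mean_y + slope * (n - 1 + horizon - mean_x)

def seasonal_forecast(history, horizon, period):
    """Naively repeat the value observed one period earlier."""
    return history[(len(history) + horizon - period) % len(history)]

def ensemble_forecast(history, horizon, period):
    """Average the two predictors; their disagreeing errors partially cancel,
    which is the point of ensembling models with different properties."""
    predictions = [linear_forecast(history, horizon),
                   seasonal_forecast(history, horizon, period)]
    return sum(predictions) / len(predictions)
```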
This is good for self-driving databases, because the system does need to prioritize its optimizations for the workload in the near future. For the workload far ahead in time, there's always the opportunity to optimize it later. And second, whether for the one-hour horizon or for the one-week horizon, we can see that our forecasting framework generates reasonably accurate predictions for both. So actually, before I get to the next component, I'm wondering if there are any questions from the audience? I'd rather people interrupt and ask clarification questions, et cetera, instead of being confused about something and left confused for half an hour. I'm wondering if there are any questions? Thank you. Good. Let him keep going. Just keep going. Now, if you remember the self-driving architecture that I mentioned earlier, I just described our framework for the first component, workload forecasting. We think this forecasting ability provides the premise for the later modeling and planning components to complete the self-driving operation. Next, I'm going to talk about work that we recently did to build the behavior models for self-driving databases. The task of behavior modeling is to generate behavior models that can predict the cost and benefit of different self-driving actions. To be slightly more specific, the inputs to these models are the forecasted workload in the future, as well as some candidate actions that a self-driving database may apply. The outputs are the cost and benefit estimations for that action on the specific workload. This could include how long the action takes, how the action may help improve certain query latencies, as well as the changes in resource consumption, et cetera.
To further motivate the necessity of such behavior models, here I'm showing you an example experiment where we remove an important secondary index from the TPC-C workload and then create this index back as a self-driving action, with two different choices of actions that use a different number of create index threads. We start the workload with no secondary index, and as you can see at this stage, the query latency is relatively high. Then we apply the create index action at around 50 seconds. We can see that right after the action starts, the queries actually become slower, because the index creation is competing for resources with the regular database queries. Using eight create index threads causes the queries to be slower than using four create index threads because of the higher resource consumption, but the index creation finishes much faster with eight threads, so the queries become faster earlier. We think this ability to accurately predict the effects of these self-driving actions is really the foundation for robust and effective control of our self-driving databases. But there are quite a few challenges. The first is that a database system is actually a pretty complex piece of software, and if we want, for example, to use a single monolithic machine learning model to capture all aspects of the system, including all types of workloads as well as all kinds of possible actions, the model can be really high-dimensional, easily with hundreds of dimensions, if not more. This will subsequently require lots of training data to train such a model, which will also be pretty difficult to debug. Secondly, in modern multi-core environments, actions and queries may actually run concurrently in a single system. So the models also need to capture all kinds of interference between the concurrent operations, which can further increase the number of possible inputs to the models exponentially.
Next, while many of the previous works on database modeling focus on the modeling algorithms, there's actually a lack of a principled framework to generate appropriate and sufficient training data to train these models. Especially for certain database operations, it can be very expensive to collect the training data. As I mentioned earlier, for example, if you want to know how much time it takes to create an index on a table with 10 billion rows, getting that single label for that single action may already take hours. Very expensive. And lastly, for the practical application of a self-driving database, these models should also have good interpretability, debuggability, and adaptability. To address these challenges, we present an offline framework that generates behavior models for self-driving databases. We call it ModelBot2, or MB2. So let me give an overview of the entire MB2 process. MB2 first uses a set of specialized runners to fully exercise the different system components of a database system. These runners use a lightweight metrics system to collect the training data and then send it to a programmatic training framework that uses the data to generate two types of behavior models for the system: namely, a set of operating unit models, and then a separate interference model. Now I'm going into the details of this framework. The core idea of MB2 is to decompose the complex database functionality into small and independent operating units, or what we call OUs, to model separately. The main benefit of this approach is that each model is low-dimensional, so it does not require a huge amount of training data to train, and it is also friendly to interpret and debug. In this case, it's also easier to adapt these models after a software update.
For example, if we change a few specific aspects of the database system, then we only need to change the models for those specific components; we don't need to change all the models. To give you an example, we decompose our NoisePage system into around 20 operating units, or OUs, including building a hash table or creating an index. I want to note that if you want to apply MB2 to other systems, then the developers of the system are actually responsible for decomposing the entire system into small tasks, and the difficulty of this depends on how modularized the system architecture is, as well as how familiar the developers are with the system. That's actually part of the reason why we are building a self-driving database from scratch, so that we can have a clean design and also a good understanding of the system. Then each of these OU models has a specific set of input features that represent how much work the corresponding OU is going to perform. These features can be different among different OUs, and they may also include the database knobs that impact the corresponding OU's behavior. For example, the OU model for serializing the log records may contain the knob for the serialization interval. But all of these OUs share the same set of output labels, which include the OU completion time and also different resource consumption metrics such as CPU, I/O, and memory. Note that for one type of resource, there may actually be multiple fields in the output. For example, the CPU metrics may have CPU time, cache references, cache misses, et cetera. Then at inference time, or prediction time, the self-driving database just sums up the predictions of all the OUs to estimate the behavior of the entire system. To be a little bit more specific here, we classify all the OUs in a database system into three high-level categories.
The first is what we call singular OUs, for which the OU model just predicts the behavior of a single OU invocation. For example, what the completion time or the memory consumption of building a hash table will be. Most of the OUs in our system belong to this category. The second category is what we call batching OUs, where the OU model actually predicts the behavior of several OU invocations together in a fixed time window. This is mostly for the database maintenance tasks that are performed periodically, such as serializing the log records. For those kinds of tasks, the amount of work to perform in each invocation depends on how much work is left over from the last invocation, so it's kind of difficult to predict the behavior of just one single invocation of that OU. Lastly, there's also a category of contending OUs, where the internal synchronization mechanism of the operating unit may affect the OU's behavior. For those OUs, we just include the contention information in their model input features. For example, we would include the degree of parallelism for the OU of creating an index in parallel, because there could be an internal synchronization mechanism, such as latch crabbing or waiting on a compare-and-swap to finish, that affects the OU's behavior if you create the index in parallel. Again, to give a specific example here, the build hash table OU model has the number of rows, the number of columns, the column sizes, the estimated cardinality, and lastly a related knob that impacts hash table creation in our system, as its input features. Next, to collect the training data for these OUs, we use a set of specialized runners that we built ourselves that sufficiently exercise each OU through some SQL-based synthetic benchmarks. So, yes, please. Fast, potentially. Yeah, I'll keep going. So, sorry. I don't think that was a question.
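The per-OU modeling scheme just described can be sketched as follows: one small model per OU maps that OU's input features to the shared output labels, and a whole-plan estimate comes from summing per-OU predictions. A 1-nearest-neighbor lookup stands in for the gradient boosting machines MB2 actually trains, and the feature and label layouts below are illustrative assumptions.

```python
class OUModel:
    """One low-dimensional model per operating unit (OU)."""
    def __init__(self):
        self.training_data = []  # (feature vector, label tuple) pairs

    def train(self, features, labels):
        self.training_data.append((list(features), tuple(labels)))

    def predict(self, features):
        # Nearest neighbor in feature space as a trivial stand-in regressor.
        def sq_dist(row):
            return sum((a - b) ** 2 for a, b in zip(row[0], features))
        return min(self.training_data, key=sq_dist)[1]


class BehaviorModel:
    """Registry of OU models; sums per-OU predictions for a whole pipeline,
    mirroring MB2's inference step (before any interference adjustment)."""
    def __init__(self, num_labels=3):  # e.g., (elapsed_us, cpu_us, memory_bytes)
        self.num_labels = num_labels
        self.ou_models = {}

    def train(self, ou_name, features, labels):
        self.ou_models.setdefault(ou_name, OUModel()).train(features, labels)

    def predict_pipeline(self, ou_invocations):
        """ou_invocations: list of (ou_name, feature vector) to be executed."""
        totals = [0.0] * self.num_labels
        for ou_name, features in ou_invocations:
            for i, value in enumerate(self.ou_models[ou_name].predict(features)):
                totals[i] += value
        return tuple(totals)
```

The decomposition is what keeps each model's input dimensionality small: every OU model only sees the handful of features relevant to its own work, never the whole system state.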
Okay. All right, got it. Then let me illustrate how this works. Essentially, in our system, each OU is paired with an OU runner that enumerates various possible inputs for that OU. For example, the hash table OU runner uses a set of customized workloads to exercise the hash table creation with a different number of rows, different types of columns, different cardinalities, et cetera, to get the labels for the hash table OU model. Then beneath all these runners, we use a decentralized metrics system that leverages thread-local storage to collect the features and labels for these different OUs with low overhead. After that, a robust training framework searches over a wide range of canonical and state-of-the-art machine learning algorithms to find the best machine learning model for each OU. We found in our experiments that for the scale of our OUs and the amount of training data we have, a gradient boosting machine typically performs the best. But the most important thing I want to emphasize here is actually that all these OU models are workload- and dataset-independent, which means that they are generic models that we only train once offline, and then the self-driving database can apply them to any dataset or workload in production. Another important thing I mentioned earlier is that it can be pretty expensive to collect labels for certain database operations. For example, if you want to build a hash table, sort some data, or create an index, all of that may process billions of rows, and it could be very expensive to collect those labels. To address this, we leverage one observation inspired by previous work on query execution modeling, which is that many database operations, or OUs, actually have known complexity based on the number of tuples processed, okay?
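That complexity observation turns into a simple normalization: divide each raw label by the OU's asymptotic complexity in the tuple count, collect data only until the per-unit value converges, and scale back up at prediction time. Here is a sketch; the complexity table below is an assumption for illustration, not NoisePage's actual OU list.

```python
import math

# Hypothetical asymptotic complexities per OU, in the number of tuples n.
COMPLEXITY = {
    "SEQ_SCAN": lambda n: float(n),                         # O(n)
    "BUILD_HASH_TABLE": lambda n: float(n),                 # O(n)
    "SORT": lambda n: n * math.log2(n) if n > 1 else 1.0,   # O(n log n)
}

def normalized_label(ou_name, num_tuples, raw_label):
    """Per-unit cost: converges to a constant as num_tuples grows, so the
    runners only need training data up to the convergence point."""
    return raw_label / COMPLEXITY[ou_name](num_tuples)

def extrapolate(ou_name, per_unit_cost, num_tuples):
    """Scale the converged per-unit cost back up to an input size that was
    never seen during training (e.g., billions of rows)."""
    return per_unit_cost * COMPLEXITY[ou_name](num_tuples)
```

For instance, if a sequential scan of 1,000 tuples takes 2,000 microseconds, the per-tuple cost is 2 microseconds, and a billion-row scan can be estimated without ever running one during training.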
So what we can do is this: if we fix all the other input features apart from the number of tuples, and then divide the output labels for these OUs by their corresponding complexity in the tuple count, we get per-tuple output labels that converge to a constant when the number of tuples gets large enough, because that's their asymptotic complexity. So we only need to collect labels for these OUs with the number of tuples up to that convergence point. We found that this convergence point is typically below one million rows for all the OUs in our NoisePage system. With this approach, it takes roughly 10 hours to collect the training data that exercises all the OUs in our system, which we think is an acceptable overhead if we want to build a self-driving database. So now I'll present our model generalization results in a single-threaded setting first. We haven't gotten to the concurrent setting yet, so right now it's single-threaded. Here we evaluate both OLAP and OLTP workloads, and again, we always use the same set of OU models generated offline by MB2 for this prediction. As a baseline, we compare MB2 against a state-of-the-art modeling technique for query execution called QPPNet. However, as mentioned earlier, QPPNet, like many previous methods on query execution modeling, does not have a training data generation framework. So to simulate a production setup, we train QPPNet on one of these workloads and then evaluate it on the other workloads. So here are the results. In these figures, the lower the bar is, the smaller the errors are, and thus the better the predictions. From this, you can see that if we evaluate the predictions of the two methods on the workload that QPPNet was trained on, then QPPNet actually performs similar to or better than MB2, because that's the workload that QPPNet was trained on.
And also QPPNet has a specialized model structure that can capture the workload pattern. But on the workloads that QPPNet is not trained with, MB2 actually achieves significantly better predictions. This is due to MB2's decomposed framework that can generate sufficient training data for each of these small OUs to build accurate models, as well as our output normalization technique that allows MB2 to generalize to datasets an order of magnitude larger than what's in the training data. Next, let's look at how MB2 performs in an end-to-end setup, where there is some concurrency. In this case, we execute the system with certain canonical workloads while applying some self-driving actions, and I will save the specific workload and action details for later. The thing I want to emphasize here is that database systems in reality often execute in multi-threaded environments. During this experiment, the database on average uses 10 to 20 concurrent threads. In this case, despite the great single-threaded prediction accuracy of MB2's models, they actually consistently underpredict, because they do not account for the resource competition among concurrent operations, for example CPU contention. And especially during this time period, when an expensive self-driving action is applied, the models underpredict significantly. So to address this problem, we introduce another interference model that captures this concurrency impact. What the interference model leverages is exactly the resource consumption labels predicted by the OU models. To be specific, the input features of this model are summary statistics of the OU model outputs for the concurrent OUs running in the same interval, such as their sum, their mean, their variance, et cetera. And the output labels are the adjustment ratios between the actual OU metrics and the OU model predictions due to that interference.
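To make the interference model's inputs and labels concrete, here is a hypothetical sketch (the function names and numbers are illustrative, not NoisePage's API): the features are summary statistics over the OU models' predicted resource consumption for the OUs running in the same interval, and the training label is the ratio between the actual metric and the isolated prediction.

```python
import statistics

def interference_features(predicted_cpu_costs):
    """Summary statistics over the OU-model predictions for all OUs
    running concurrently in one interval (the interference model's input)."""
    return {
        "sum": sum(predicted_cpu_costs),
        "mean": statistics.mean(predicted_cpu_costs),
        "variance": statistics.pvariance(predicted_cpu_costs),
    }

def adjustment_label(actual, predicted):
    """Training label: how much interference inflated the OU's metric."""
    return actual / predicted

feats = interference_features([4.0, 6.0, 2.0])
# e.g. an OU predicted to take 6.0 units actually took 9.0 under contention:
label = adjustment_label(actual=9.0, predicted=6.0)  # ratio of 1.5
```

Note that the interference model never sees raw workload details, only aggregated resource estimates, which keeps it workload-independent like the OU models themselves.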
So similar to the OU runners, we also devised a set of concurrent runners that exercise various kinds of interference among those operating units to generate the training data. To wrap everything up, here I'm illustrating the full inference procedure of MB2. For a set of OUs that are going to execute, based on the workload forecast and some candidate self-driving action, the OU models first predict the behavior of each OU as if it ran in an isolated environment. Then MB2 computes the summary statistics of the resource consumption among all these concurrent OUs. Next, the shared interference model takes the summary statistics as input and outputs the adjustment factors based on the interference impact. Lastly, MB2 applies these adjustments back to the original OU model predictions and then sums everything up as the inference result. So finally, coming back to the end-to-end experiment that I mentioned earlier, let's see how good MB2's predictive power is. In this case, we simulate a daily transactional/analytical workload cycle where we alternate the workload patterns between TPC-C and TPC-H. We shorten the entire workload duration to two minutes to accelerate our evaluation, but in actuality, of course, database workload patterns have much longer durations than that. In this experiment, we also assume a perfect workload forecast with a 10-second forecasting interval. This is because we want to isolate the prediction error of MB2 from the forecasting error. And lastly, since we don't really have a planning component yet, we use an oracle planner to change a knob for TPC-H and also build an index for TPC-C as the self-driving actions. In this case, we just want to see how good MB2's predictions are at estimating the cost and benefit of the actions, not picking the actions yet. So now here are the results.
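The four-step inference procedure just described might be sketched like this (a toy sketch: the per-OU cost model and the adjustment function are invented stand-ins for the trained OU and interference models):

```python
import statistics

def ou_model_predict(ou):
    """Stand-in for a per-OU model: isolated runtime estimate."""
    return ou["tuples"] * ou["cost_per_tuple"]

def interference_model_predict(summary):
    """Stand-in for the shared interference model: an adjustment
    factor >= 1 that grows with total concurrent resource demand."""
    return 1.0 + 0.1 * summary["sum"] / 10.0

def predict_interval(ous):
    isolated = [ou_model_predict(ou) for ou in ous]   # 1: isolated OU predictions
    summary = {"sum": sum(isolated),
               "mean": statistics.mean(isolated)}     # 2: summary statistics
    factor = interference_model_predict(summary)      # 3: interference adjustment
    return sum(p * factor for p in isolated)          # 4: adjusted, summed result

ous = [{"tuples": 1000, "cost_per_tuple": 0.001},
       {"tuples": 2000, "cost_per_tuple": 0.002}]
total = predict_interval(ous)  # (1.0 + 4.0) * 1.05 = 5.25
```

The key design point is that the interference model is shared across all OUs, so adding a new OU type requires a new OU model but no retraining of the interference model.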
So in this case, we first start the TPC-C workload, followed by the TPC-H workload. We can see that during this stage, the query latencies for both workloads are relatively high because they have a suboptimal configuration. Then the oracle planner decides to change an execution knob for TPC-H to improve the TPC-H query latency, and we can see that MB2 accurately predicts that improvement. Later on, the oracle planner builds an index for TPC-C with eight threads. And again, MB2 successfully predicts that the query latency becomes worse while the index is being built, but also becomes much better after the index creation finishes. Not only that, this decomposed framework also provides detailed insights with the predictions, such as how long an action will take, how many resources the action will need, and which queries are improved or impacted by the action. We believe that all of this information is the foundation for a self-driving database to choose appropriate actions automatically. So now I have already introduced the first two components of our self-driving architecture. Lastly, I'm just going to briefly touch upon the last component, action planning for self-driving databases, which we are actively working on these days. For now, we call this action planning component the pilot of the database. The goal of this pilot is to choose the best self-driving actions, with the forecasted workload as well as the behavior model estimations as input. Users of the self-driving database will actually need to specify the self-driving objective. For example, this can be minimizing the average query latency or the 99th percentile query latency. This is essentially similar to a self-driving car, where you need to tell the car where you want to go; the car doesn't know where you want to go. Similarly, users of the self-driving database tell the database what their optimization objective is.
And then for this pilot, there are three additional responsibilities, I think. The first is that the pilot needs to optimize for both the current and the future workload, especially under system constraints such as the maximum memory consumption. The second is that the pilot needs to decide when to apply these actions and also apply them automatically. And lastly, the pilot should provide explanations of the past and future planned actions for debugging or auditing purposes. To achieve this, we plan to leverage a framework from control theory called receding horizon control to build such a pilot. Under this framework, the pilot divides the forecasted workload into a few time intervals. Then at the beginning of each interval, the pilot only plans actions for a fixed number of future intervals, which is called the planning horizon under this framework. During this planning horizon, the pilot plans one action for each interval, leveraging the behavior model predictions. But then, the pilot only applies the first action for the first interval and discards all the rest of the actions. As time advances, at the beginning of the next time interval, the pilot repeats this process again. This receding horizon planning framework gives the pilot the ability to optimize for the immediate workload while taking future workloads into account. The intervals also tell the pilot when to apply which actions. This framework is also friendly to incorporating constraints, but I will skip the details today. I just want to mention that one challenge of this framework is that it still needs to solve a pretty complex mixed continuous-and-discrete constrained optimization problem, so it is still a very expensive process to finish this planning procedure.
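The receding-horizon loop described above can be sketched in miniature as follows. This is a toy illustration only (the actions, the per-interval cost function, and the brute-force inner planner are all invented; the real pilot would use the behavior model estimates and a smarter search): plan a full horizon, apply only the first action, then replan.

```python
import itertools

ACTIONS = ["no-op", "build-index"]

def interval_cost(action, state):
    """Hypothetical per-interval cost (stand-in for behavior-model
    estimates): an index helps every later interval but costs extra
    to build once."""
    cost = 4.0 if state["has_index"] else 10.0
    if action == "build-index" and not state["has_index"]:
        cost += 3.0  # one-time build cost
    return cost

def apply_action(action, state):
    new_state = dict(state)
    if action == "build-index":
        new_state["has_index"] = True
    return new_state

def plan_horizon(state, horizon):
    """Brute-force search over action sequences for the next `horizon` intervals."""
    best_seq, best_cost = None, float("inf")
    for seq in itertools.product(ACTIONS, repeat=horizon):
        s, total = state, 0.0
        for action in seq:
            total += interval_cost(action, s)
            s = apply_action(action, s)
        if total < best_cost:
            best_seq, best_cost = seq, total
    return best_seq

def receding_horizon(num_intervals, horizon):
    state, applied = {"has_index": False}, []
    for _ in range(num_intervals):
        seq = plan_horizon(state, horizon)
        first = seq[0]            # apply only the first planned action...
        applied.append(first)
        state = apply_action(first, state)  # ...then replan next interval
    return applied

applied = receding_horizon(num_intervals=4, horizon=3)
```

With a horizon of 3, the pilot sees that paying the build cost up front is amortized by the cheaper later intervals, so it builds the index in the first interval and does nothing afterwards.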
For this, we plan to leverage another planning technique with many recent successes, such as in the AlphaGo AI, called Monte Carlo tree search. Essentially, what this method does is that instead of enumerating the entire exponential search space, it exercises a few series of randomized actions until some planning budget is exhausted and then picks the best series of actions found so far. Of course, you want to be a little bit intelligent about this process so that you can bias the search towards more promising actions; it's not purely random. But anyway, this gives you a trade-off between the action quality and the planning budget, and you have the option to control how much planning budget you give to the system. We are actively working on this component and hope to discuss it more next time. So lastly, I just want to mention how we are integrating all of these self-driving components into our NoisePage system. There are two sides to this architecture. On the left-hand side, there is the C++ side, illustrated inside the core DBMS. And then there is a Python side that holds all the models. Inside the C++ side, there is the pilot and then a model server manager that can communicate with the Python side using ZeroMQ, which is a fast messaging library. On the Python side, there are the forecasting models that are trained online, as well as the behavior models that are trained offline. When the self-driving database executes, it first sends the workload trace to the forecasting models to get the workload forecast back. Then the pilot searches for candidate actions that may improve the forecasted workload. While it is doing that, it sends inference requests to the behavior models to get the cost and benefit estimations for the different self-driving actions.
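The budgeted randomized-rollout idea behind this planning technique can be sketched as below. Note this is deliberately simplified: it samples uniformly at random rather than maintaining the tree statistics that real Monte Carlo tree search uses to bias sampling toward promising branches, and the rollout cost function is an invented stand-in for the behavior-model estimates.

```python
import random

def rollout_cost(actions):
    """Stand-in for evaluating one action sequence with the behavior models."""
    cost, has_index = 0.0, False
    for action in actions:
        cost += (4.0 if has_index else 10.0)
        cost += 3.0 if action == "build-index" else 0.0
        has_index = has_index or action == "build-index"
    return cost

def budgeted_random_search(action_space, seq_len, budget, seed=0):
    """Sample random action sequences until the planning budget is
    exhausted and keep the best sequence seen so far. A larger budget
    trades planning time for better action quality."""
    rng = random.Random(seed)
    best_seq, best_cost = None, float("inf")
    for _ in range(budget):
        seq = [rng.choice(action_space) for _ in range(seq_len)]
        cost = rollout_cost(seq)
        if cost < best_cost:
            best_seq, best_cost = seq, cost
    return best_seq, best_cost

seq, cost = budgeted_random_search(["no-op", "build-index"],
                                   seq_len=4, budget=500)
```

The budget knob is the point: the pilot can stop anytime with the best plan found so far, which fits a system that must keep planning cheap relative to the workload it is managing.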
And then finally, using all this information, the pilot decides the best action to apply, right? I also want to note that the forecasted workload as well as the planned actions are stored inside the database system as tables, so that we can examine them directly with SQL. Lastly, I just want to mention that NoisePage is an open-source system. So if you're interested, you can check it out on this webpage, and you may even join our journey. And that's all I have today. Thanks for listening, and I'm happy to answer any questions you may have. Okay, possibly not. I will applaud on behalf of everyone else. So we have time for a few questions. Feel free to unmute yourself and fire away. Let's go for it. Yeah, so I'm not too familiar with receding horizon control, but database tuning is typically a control problem, right? And reinforcement learning has been created for that purpose, and a lot of good papers have been published, et cetera, et cetera. Can you compare what you have done here with receding horizon control against a deep reinforcement learning approach? Yeah, definitely. Thanks for the great question, Bethel. So I don't have empirical results to compare with, because essentially that's a very different methodology, and there's lots and lots of work you would have to do to build an end-to-end deep reinforcement learning system for a database system to achieve self-driving. So we don't have an empirical comparison against that kind of methodology. But at a high level, we actually had quite some internal discussions about reinforcement learning in the early stage of the project. We spent quite some time investigating related topics and thinking about whether we should do that. At a high level, there are a few concerns: reinforcement learning techniques typically require a lot of training data, and reinforcement learning techniques may have difficulty generating explanations, right?
Explainability; unlike this, receding horizon control is very easy to understand, right? Every interval has its action. And lastly, based on my understanding of the machine learning literature, reinforcement learning approaches are typically not great at data efficiency either. Especially considering that in a database system, some actions may be very expensive, and there can be lots of different actions; essentially, the action space is very, very large. So, yeah: the data efficiency, the complexity of getting labels for reinforcement learning, as well as the explainability. And lastly, the adaptability part, right? How do you change the learning framework when you update the software? All those concerns make us feel that for the immediate future, our kind of modularized approach may be more practical. But as reinforcement learning techniques improve, maybe they become more and more practical; that's also possible. This is just the methodology that we chose. Okay, thank you. Yeah, thank you. Okay, anybody else? So then I'll ask you a broad question. You've been working on this for several years now; what's been the most surprising challenge that you've had to overcome in building the self-driving architecture for this system? What's the one thing that you didn't think was going to be a big problem that turned out to be a bigger problem than you originally thought? Sorry, could you repeat that? I heard the first part, but I did not hear the last sentence clearly. Like, what is something that, as you were building your self-driving architecture, surprised you by being a difficult problem that you didn't originally anticipate? Ah, interesting. Yeah, thanks for the question.
I would say maybe, for many database operations, the intrinsic uncertainty of those operations is the challenge I encountered. I think that's probably the biggest one in building the self-driving database. What I mean by this is that for certain actions, for example building an index, it's probably easier: especially in an in-memory setting, it's not that difficult to predict how much time it will take. But there are certain aspects of the system, for example concurrency control, where in order to really fine-tune them, it's not that easy to estimate the impact of a certain change. For example, how will the transaction abort rate actually change if you change a concurrency control knob, right? So it's this kind of intrinsic uncertainty in certain database operations that I think is pretty challenging. Yeah. I also want to ask you a question, but an easy question. Are self-driving databases ready for prime time? And if not, when do you think they will be? Sorry, I don't understand the phrase. What do you mean by ready for prime time? Are self-driving databases ready to be used today? Like, do you think the technology is ready now? And if not, roughly predict how long you think it will be, given what you've seen in your own research. Yeah, thanks for the question, but it's a little bit difficult to give a one-sentence answer, because self-driving databases can have different levels, right? As we have discussed, which others may or may not know. So yeah, if you are talking about the ultimate level, in the sense that, you know, it's like a science fiction novel, right, where you don't need to worry about anything ever, and the system just always controls itself really, really well, with no need for any human whatsoever?
Then that's not ready yet. I think that would be years to come. How long? I don't know; maybe it could still be five years, ten years. I don't know how long. But there are intermediate levels that I think are actually achievable. With this project, for example, if we finish it, I think for many application scenarios, if the requirements are not very stringent, right? For example, if the customers can sometimes tolerate a little bit of mistakes made by the system, right? Sometimes the self-driving database may not choose the best configuration, but if the system is good, say, 95% of the time, the customers can accept that. Yeah, then maybe that's possible in the very near future, right? It depends on the requirements. All right, awesome. Lin, thank you so much. I will applaud again on behalf of everyone else. Thanks everyone for coming. Thank you.