The Databases for Machine Learning and Machine Learning for Databases seminar series at Carnegie Mellon University is recorded in front of a live studio audience. Funding for this program is made possible by Google and by contributions from viewers like you. Thank you.

Hi guys, welcome. Another DB seminar talk here at Carnegie Mellon. We're excited today to have Jian Tan. He's a research scientist and director of the Intelligent Database team at Alibaba. He's here to talk about all the interesting ML stuff they've been doing inside of the various database products that Alibaba is building. So as always, if you have questions for Jian as he's giving the talk, please unmute yourself, say who you are, and fire away at any time; that way he's not talking to himself for an hour on Zoom. So, Jian, thank you so much for being here. The floor is yours. Go for it.

Okay, hello everyone. It's exciting for me to be here. Thanks, Andy, for inviting me. So now let's imagine: okay, Andy gave me a title, and I decided to insert a modifier in front of the title. Let's call this an intervention. Now the title is "Handcrafted Domain Knowledge Augmented AI for Databases." Almost immediately some red alarms are triggered in the back of the brain. Isn't domain-knowledge AI from the first wave of AI? We have now almost passed the second wave, using statistical learning, and are in the third wave, using large and deep neural networks. Are you anti-trend, or a reverse anti-pattern?

So in the first wave, people did use principles and summarized best common practice. Recently, due to breakthroughs in neural networks, especially transformer-based architectures, people realized that a general model with less hand-engineering, when pumped up with large data sets, can have some amazing emergent capabilities. For example, the same model can be used for completely different tasks and almost always gives the best result. For instance, in the Spider leaderboard challenge, which is to translate natural language questions into SQL statements, currently almost all the top-ranked submissions or solutions are based on ChatGPT, using either some kind of good retrieval method for augmentation or some very well-defined prompts. Prompts are great. The only issue is that sometimes the model may not always follow your instructions. And for NL-to-SQL, I personally think that prompts are a little bit too high-level, and some low-level control will be needed.

So here comes today's key message: I think it is possible to insert handcrafted interventions precisely, at a very low level, into the system, such that it becomes far more efficient and sometimes even more effective; even small models can be mighty. AI for DB is a big topic, and today I'm going to use two real systems that have been developed by our team to explain why introducing interventions can be helpful. The first system is SQLBridge, on NL-to-SQL; the second is ShapleyIQ, about DevOps root cause analysis.

Let's start with NL-to-SQL. The input definitely includes the original question, but that's not enough. We also need the schema information, including table names, column names, and some other constraints such as type information, join relationships, primary/foreign key information, etc. Sometimes you even need to consider SQL dialects, for example TOP and LIMIT, which are used by SQL Server and MySQL respectively. Here is one real example. On the right-hand side is the SQL statement generated by SQLBridge. I use this example to illustrate the challenges in deriving SQL statements.
So the question is: what's the average life expectancy in the countries where English is not the official language? There are at least three challenges. First, in the WHERE clause we have a sub-query. Second, in the sub-query we have a join relationship. Third, we need to map one phrase from the question, "official language," to one of the columns, which is called IsOfficial. Notice that IsOfficial is of type Boolean, so we can only assign the value true or false, or maybe null as well. But here, in order to make the query logically correct, we have to assign the value true to IsOfficial and, in the outer query, use a NOT IN operator.

When we developed SQLBridge, one of the most commonly asked questions from my colleagues was: since almost all the top-ranked solutions on the Spider leaderboard are based on ChatGPT, why not just fine-tune a large language model and optimize your prompts? To answer this, let me try to compare GPT with SQLBridge using some examples. Here I want to emphasize that the following examples were actually selected in favor of SQLBridge, so don't read too much into them; they may be biased. My purpose is to show that even the amazing GPT-4, including the newly released GPT-4 Turbo, can still make mistakes.

Okay, the first question is from Spider: find the name of the makers that produced some cars in the year 1970. The prompt is pretty long; it is actually from the number-one-ranked solution on Spider before November 2nd. The prompt consists of two parts. First, you need to provide the database schema information. Then the solution uses some additional algorithm to find similar problems from some repository. These similar problems include the natural language question as well as the corresponding SQL statement, so you can view it as a retrieval-augmented solution. However, here GPT-4 made a mistake by generating a wrong column; this column doesn't even exist in the table. The correct one should be ModelId. This seems minor. However, even after you correct this mistake, there's another, even trickier problem: Model and MakeId cannot be joined; there's no join relationship between them.

SQLBridge always guarantees consistency. Actually, this is not a new idea in the literature; a lot of work uses a similar approach. Our core engine is a transformer-based encoder-decoder; it generates some hidden states, and it uses a dedicated module to measure how similar a hidden state is with respect to the columns from a table. In other words, we always select a column that actually exists in the table. The novelty comes from how we make this principled, which will be explained later.

On the schema graph, under the assumption that our core engine, which is a transformer-based network, gives the right columns, we just compute a minimum spanning tree on the schema graph, where the schema graph is a graph in which each table is represented as a node connected with its associated columns, and if two columns have a join relationship, they are also connected. By searching for a minimum spanning tree, you can see that we always return a valid join path, which is also the shortest. This is further illustrated by the second example.
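To make the join-path idea concrete, here is a minimal sketch of selecting a valid, shortest join path on a schema graph. Everything here is an assumption for illustration: the graph layout, the node names, and the use of networkx's approximate Steiner tree (the talk says minimum spanning tree; connecting exactly the predicted columns is Steiner-tree-style, as a questioner notes later). This is not SQLBridge's actual code.

```python
# Sketch: derive the tables that must be joined, given the columns the
# core engine predicted, by connecting them with a minimal tree on the
# schema graph. Requires networkx.
import networkx as nx
from networkx.algorithms.approximation import steiner_tree

# Schema graph: tables and columns are nodes; a table is connected to its
# columns, and two columns with a join (PK/FK) relationship are connected.
G = nx.Graph()
G.add_edges_from([
    ("country", "country.code"), ("country", "country.life_expectancy"),
    ("countrylanguage", "countrylanguage.countrycode"),
    ("countrylanguage", "countrylanguage.isofficial"),
    # join relationship (primary key / foreign key)
    ("country.code", "countrylanguage.countrycode"),
])

def join_tables(predicted_columns):
    """Connect the predicted columns with a minimal tree; the tables on
    that tree are exactly the relations a valid FROM/JOIN clause needs."""
    tree = steiner_tree(G, predicted_columns)
    return sorted(n for n in tree.nodes if "." not in n)  # table nodes

print(join_tables(["country.life_expectancy", "countrylanguage.isofficial"]))
# -> ['country', 'countrylanguage']
```

Because the result is always a connected subtree of the schema graph, a column pair with no join relationship (like Model and MakeId above) can never be emitted.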
Here we have only one join, but GPT-4 produced three joins. There's an even trickier problem here: GPT-4 selected two columns, and the first column is prefixed by an aggregation function. In other words, this MAX aggregator will only return a single row, a single element, but the second column can potentially return multiple rows, which is a set. Now we have a rule, right? So we need to insert this rule into the SQL generation. The only issue is where and when. The engine is based on a transformer; it has its own running mechanism, and we don't want our insertion to blow up our engine.

Talking about rules, here's another one: suppose we have a GROUP BY; then we can only select columns that have already appeared in the GROUP BY, or columns with an aggregation function in front of them. The key point of this observation is to realize there is only a finite number of such rules for us to consider.

Okay, here's another very important example. GPT-4 generates COUNT(Population) for the phrase "how many people," which sounds plausible but is actually wrong. The correct answer should be SUM. Why? Because Population has an integer type; it makes more sense to apply SUM to this integer-typed column to compute the total number of people, right? The reason SQLBridge happened to give the correct answer in this case is that we actually use a dedicated classification module to handle all the operators. What's more important, we can handpick some hand-engineered features; for example, here we can use the type information to enhance this very simple classification module.

This methodology of using dedicated modules to handle abstract concepts is very powerful. Whenever we have concepts that are very difficult to generate, we have two options: either you fine-tune the model using some similar samples, which requires some manual labeling work, or we escalate the concept to be part of the grammar, handled by a dedicated classification module.

[Question] Can you please explain what SQLBridge is here?

Oh, "bridge" is just a name, an internal name for our product. Probably you're thinking of another, similar work which uses "bridge," but SQLBridge is just the name for our internal usage.

[Andy] I guess maybe he's asking what it actually is. Is it just a custom transformer that you built?

Oh yeah, actually the intention is to bridge, because we want it to be in-DB, to be in-DB ML: it will be part of the database core engine, and it serves as a bridge to provide the interface for the customers. The customer can directly write something like SELECT bridge(...) and, in the parentheses, just provide the natural language question; then it automatically generates a SQL statement and gives you the answer.

[Question] So is this like an NLP-based transformation, not deep learning, or does it use a neural network?

Okay, yeah. Our core engine is actually based on a transformer architecture, but it's augmented by grammar structure; we use a grammar to induce some structure into the SQL generation process. The only issue is how to make this principled. As I mentioned, we don't want these interventions to blow up our engine, right?
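To illustrate the kind of finite, low-level rule just described, here is a hedged sketch of a validity check for mixing aggregates, bare columns, and GROUP BY. The function and names are hypothetical, not SQLBridge's actual module; it just encodes the two rules from the slides.

```python
# Rule: a bare (non-aggregated) column may only appear in SELECT if it is
# listed in GROUP BY, whenever anything else is aggregated or a GROUP BY
# clause exists at all. This catches the MAX-plus-bare-column mistake.
AGGREGATES = {"MIN", "MAX", "SUM", "AVG", "COUNT"}

def select_list_ok(select_items, group_by_cols):
    """select_items: list of (aggregate_or_None, column_name) pairs."""
    has_agg = any(agg in AGGREGATES for agg, _ in select_items)
    for agg, col in select_items:
        if agg in AGGREGATES:
            continue  # aggregated items are always fine
        if (has_agg or group_by_cols) and col not in group_by_cols:
            return False  # bare column returns a set next to a scalar
    return True

# GPT-4's second mistake above: SELECT MAX(height), name ... , no GROUP BY
print(select_list_ok([("MAX", "height"), (None, "name")], set()))     # False
print(select_list_ok([("MAX", "height"), (None, "dept")], {"dept"}))  # True
```

Because the rule set is finite, each rule can be attached to the grammar once and is then enforced on every generation, instead of being re-learned from data.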
[Question] Because a transformer has its own running mechanism, and this is a hybrid: is the underlying DNN something LLaMA-based that you fine-tuned, or is it something else?

Okay, great question. We will see all the details later, but in short, all the novelty comes from the decoder; the encoder is a pre-trained model. We have designed a completely new decoder, which will be explained later. But let's first talk about the motivations. I think these examples are very informative; they also motivate our direction and explain why we chose it, which will be elaborated later.

Okay, here is another example. As I mentioned, based on feedback from our enterprise customers, they told me the chain index is very important. It seems that the chain index is very easy, right? It's simple, it has a very nice structure: it's just the value in the current period, minus the value in the immediately preceding period, divided by the latter. However, it's super hard to generate it correctly in a complex sentence. Why? Because it may be modified by complex prepositional phrases. Here's one question: what's the chain index of the total number of stocks dealer Weiming Zhang had in January of last year? Actually, GPT-4 made a mistake in deriving the correct time. This morning I checked with GPT-4 Turbo; it turns out GPT-4 Turbo is much better and gives the right answer.

Then I tested another, relatively more complex example. Pardon me for the small font, because this SQL is pretty long. The question is: what was the chain index of my attendance rate for the last week? Even though I myself have never asked such a question throughout my whole career, I know this is a real question from our enterprise customer. And because it's from an enterprise customer, we need some additional information to recognize the meaning of "my attendance": for example, we need to provide the employee ID and also the corporation ID. For ChatGPT those are provided in the prompt; for us, they are provided as a configuration JSON file.

If you look at the bottom, interestingly, GPT-4 Turbo even lists steps in a chain of thought, and those steps are actually correct: the high-level logic flow is correct, GPT-4 Turbo knows what it should do. However, it still makes low-level mistakes in deriving the correct time, and it even lost one critical piece of information, about the corp split. SQLBridge, which again is just the name used internally for our product, actually always generates a structured answer. Notice that here we didn't forget the corp split information, and if you look really carefully, it even generates the correct time; internally we use a dedicated module to generate the time. So far we have tested a number of different cases, from simple ones to complex ones, and it seems that SQLBridge, so far, always generates the correct answer for anything related to the chain index.

Okay, so far I think I have at least convinced you that using interventions is a feasible solution, even though it's not a must. Different people may have different opinions; maybe next year, who knows, GPT-5 may solve these problems completely.
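For concreteness, here is a minimal sketch of the chain-index definition quoted above: (current − previous) / previous, computed period over period. This is pure illustration, not the dedicated module SQLBridge uses; in SQL the same quantity is typically written with LAG(...) OVER (ORDER BY period).

```python
# Chain index: relative change between each period and the one before it.
def chain_index(values):
    """values: measurements ordered by period; returns one index per
    consecutive pair of periods, as defined in the talk."""
    return [(cur - prev) / prev for prev, cur in zip(values, values[1:])]

# e.g. monthly stock counts: +50% then -20% period over period
print(chain_index([200, 300, 240]))  # [0.5, -0.2]
```

The formula itself is trivial; the hard part the talk points at is resolving which period "January of last year" or "the last week" refers to, which is why a dedicated time-derivation module pays off.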
However, I can still argue: our model has fewer than 400 million parameters, which is ideal for a low-resource setting. But then you can rebut: I don't care about low resource, I'm rich. So can you really give me an explanation? Okay, let me make an attempt, by drawing an analogy. Do you think professional car racing is difficult? Even though I'm not rich enough to play this game, I think it must be difficult. But interestingly enough, I found that based on my experience playing video games, I can almost always make the right judgment on whether I should turn left or right, even though I still constantly make mistakes at the low level. And these two levels seem natural when we put them into the perspective of the Chomsky grammar hierarchy, because natural language is unrestricted, but SQL, at least for the most part, is context-free.

Here's one big assumption that I make, only for NL-to-SQL tasks; it may not hold for other tasks: it's relatively easy to understand the high-level meaning, because of the engine. I mentioned that we use a transformer-based encoder-decoder architecture, and it has a very strong capability to understand the high-level meaning and the long-range dependencies. And since we are querying the database for the factual data, it seems there's no need for us to memorize too much world knowledge. So, theoretically speaking, it should be okay to use a relatively small model without too many parameters. In other words, a small, or relatively small, model should at least work. The only issue is how to precisely insert this low-level control without blowing up our engine.

To this end, we introduce a new context-free grammar, which makes the whole generation process structured. On a high level, we only generate essential information, which is then assembled by the grammar. On a low level, since we know which grammar production rule we are using, we can attach associated actions. For example, here is a question: list the contracts that are about to expire. When the contract column is selected, we can automatically attach some other columns to it. Our system is accurate and runs very fast, because it has fewer than 400 million parameters, with a very low memory footprint.

[Andy] What do you mean by "attaching"? Like, it's in the output, but...

Oh, okay. It means that in the JSON file we can configure: whenever this column is selected from this table, we automatically select the other two columns. This is just some action that we can easily configure.

[Andy] Because it's one thing if it's in the output and the customer doesn't ask for it, then they just get it. But are you saying, for like a bunch of nested CTEs or nested queries that end up...

No, no, it has nothing to do with nested queries. It's just, for example, the question only asks about listing the contracts, so it only lists the contracts; it doesn't ask about the end date or committer information. But in some of the enterprise applications...
...we have some default settings, meaning whenever the customer asks about the contract, we will always attach the end date information as well as who the committers are. So those are related to the business rules; it has nothing to do with the sub-query. (I'll show a sketch of this configuration after this exchange.)

[Question] Is it fair to characterize this as saying, you know, there's this body of work, this is now eight years ago, when people were trying to make this work. They were looking at natural language grammars and trying to map them to SQL grammars, with a semantic layer in between. They would look up ontologies and stuff like that to try to find matchings. And there are two things they did: they did the Steiner-tree stuff for the join parts, but for all the other column selections and such they'd use natural language semantic information. Most of that kind of work was largely for English. You are doing something similar, except that semantic layer is replaced by this transformer. This probably allows you to go multilingual pretty easily, but you still have the grammar-to-grammar mapping. Is that approximately right?

Approximately correct, but grammar-wise, we realized a few things. First, for most of the traditional work, it seems their accuracy is not high; there must be a reason. Even though there is a lot of related literature that generates based on grammar, they try to generate all the tokens in the abstract syntax tree. I think that's, first, not efficient, and second, maybe not the right representation. Therefore we changed the grammar; that's why we introduce a new grammar, which is relatively more high-level. As you will see later, it also allows us to do the parsing in parallel. One key observation is that currently almost all the work generates only one single token from one hidden state.

[Question] I think I'm understanding it. Then, when you train the transformer, do you train it on SQL? Or do you train it on the grammar that you have internally?

Right. Actually, we need to carefully design the training set, which requires human work to label the data, but during training we don't really need the grammar.

[Question] I see. So you still train on SQL. Got it.

Yeah. However, I think it's implicit. Why? Because our decoder will only generate segmented information, which has some implicit meaning that will be automatically matched to the grammar. So here is a key difference.

Okay, let's see. Well, there is a lot of related work, too much for us to list on this single slide, and since time is limited, I will just go directly to the fourth category. In this category, there are many, many related works based on grammar. We learned a lot from them. I want to emphasize that we have some unique features that we think are advantageous. For example, instead of considering a subset of the SQL grammar, we actually consider a superset and rely on post-processing to further reduce it. And, more importantly, this grammar allows us to do parallel multi-task learning. Here is a very critical point: usually one hidden state will only generate one single token, but here a single hidden state is passed into a multi-task learning module, and in our case it generates multiple tokens.
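Going back to the "attach columns" action promised above: here is a hypothetical sketch of how such a configured business rule could look. The table and column names (contracts, end_date, committer) are invented, and the real system configures this in a JSON file attached to grammar actions; this Python dict stands in for that config.

```python
# Configured action: whenever a given column is selected from a table,
# some business-mandated columns ride along in the output.
ATTACH_ACTIONS = {
    ("contracts", "contract_name"): ["end_date", "committer"],
}

def apply_attach_actions(table, selected_cols):
    out = list(selected_cols)
    for col in selected_cols:
        for extra in ATTACH_ACTIONS.get((table, col), []):
            if extra not in out:
                out.append(extra)  # attach the default business columns
    return out

# "List the contracts that are about to expire" selects only contract_name,
# but the configured action attaches the defaults the enterprise expects.
print(apply_attach_actions("contracts", ["contract_name"]))
# -> ['contract_name', 'end_date', 'committer']
```

The point of hanging this off a grammar production rule is that the action fires deterministically, every time, instead of hoping a prompt instruction is obeyed.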
Therefore, our parser is essentially working in parallel. And we also introduce a way to create long-range dependency across multiple sub-queries. Let me explain on this slide. This is the high-level flow of what I call a parallel recursive descent parser. The whole process is divided into multiple rounds, and each round consists of three stages. The first stage is pre-processing: it selects all the columns that have already been selected by the previous sub-queries. This is our method to introduce long-range dependency. Then it goes through the core engine. The core engine relies on a transformer encoder-decoder; the encoder is just a pre-trained model, specifically GraPPa. All the novelty comes from the decoder, which I will explain on the next slide. This decoder will only generate segmented information; it's up to the grammar to organize the information together. And since the post-processing module can check the currently used production rule, it can attach associated actions to insert interventions.

Okay, so let's first take a look at the grammar. But wait a minute, doesn't SQL already have a very well-defined context-free grammar? Why introduce a new one? Because this new grammar is a simpler version; it is relatively more high-level, and it gives a superset, so we rely on the post-processing to reduce it. This grammar allows us to do multi-task learning during the training phase, and in the inference phase we can run multiple inference tasks in parallel, which is much faster.

So we start from a query, and this query consists of a unit operation, a unit query, where a unit query is just a sub-query that doesn't contain a nested query. Then we use a placeholder; the placeholder triggers another round to generate another unit query. Critically, and this is very important, we introduce a production rule called CAT, which is short for Column Action Template. This CAT is used, in sequence, by SELECT, WHERE, HAVING, GROUP BY, and ORDER BY, but not by FROM. Why? Because CAT only selects columns, and FROM needs tables. So, essentially, for SQLBridge the whole SQL generation process can be viewed as a sequence of CATs.

Why introduce multi-task learning? The CAT production rule has seven slots, belonging to two categories. The first category includes the aggregation, the DISTINCT keyword, the operator, and the sorting order. The key is to observe that these have only a fixed number of keywords; therefore we use dedicated, very simple classification modules on top of the transformer layers to handle them. But the column and the value have a variable number of items, so we use a ranking algorithm, specifically a pointer network, to rank them and find the best match. The novelty here is that all these multi-task learning tasks are conducted on the same hidden state. So the CAT decoder network is a four-layer transformer, followed by this multi-task learning module. The CAT decoder sequentially generates hidden states, but each of the hidden states is shared by this multi-task learning module.
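As a rough sketch of that multi-task idea, under stated assumptions (PyTorch, invented dimensions and vocabulary sizes, and only five of the seven CAT slots shown): small classifiers handle the fixed-vocabulary slots, and a pointer-style scorer ranks schema columns, with every slot reading the same hidden state. This is not the real CAT decoder, just the shape of it.

```python
import torch
import torch.nn as nn

class CatHeads(nn.Module):
    """Multi-task heads over one shared decoder hidden state."""
    def __init__(self, d=512, n_agg=6, n_op=10):
        super().__init__()
        self.agg = nn.Linear(d, n_agg)      # NONE/MIN/MAX/SUM/AVG/COUNT
        self.distinct = nn.Linear(d, 2)     # DISTINCT keyword present or not
        self.op = nn.Linear(d, n_op)        # =, >, <, IN, NOT IN, ...
        self.order = nn.Linear(d, 3)        # none / ASC / DESC
        self.ptr_q = nn.Linear(d, d)        # pointer-network query projection

    def forward(self, h, column_states):
        """h: (d,) one decoder hidden state; column_states: (n_cols, d)
        encoder states of the schema columns. Every slot reads the SAME h."""
        col_scores = column_states @ self.ptr_q(h)  # rank candidate columns
        return {
            "agg": self.agg(h).argmax().item(),
            "distinct": self.distinct(h).argmax().item(),
            "op": self.op(h).argmax().item(),
            "order": self.order(h).argmax().item(),
            "column": col_scores.argmax().item(),
        }

heads = CatHeads()
slots = heads(torch.randn(512), torch.randn(20, 512))
print(slots)  # all slot tokens decoded in parallel from one hidden state
```

This is where the "one hidden state, multiple tokens" claim cashes out: one forward pass fills a whole CAT, rather than one token per decoding step.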
This module simultaneously generates multiple tokens, and it's up to the grammar to organize them together. I hope that answers your question about how the grammar is implicit.

[Question] The follow-up question to that is: you have to feed this a large amount of training data, right? GPT-4 picks up every SQL it can get, a vast amount of training data.

Let me answer that question on the next slide. But the quick answer is that for us, to fine-tune, we just need hundreds, or at most thousands, of samples; for Spider, there are about 6,000 samples, and that's what we need. That's because we have a pre-trained BERT-style model, which is supposed to have learned more knowledge from other samples.

Okay, here is one simple example that illustrates the process of generating such a predicate in three rounds, where within each round the tasks share the same hidden state. And I hope this picture can answer your question; this is the global picture of the architecture. The encoder is just a 24-layer GraPPa, where GraPPa is a fine-tuned version of BERT with labeled SQL data. Even though we did some modifications, for example we changed the positional embeddings and added some linking embeddings, that's not important; essentially the encoder is just GraPPa. So all the novelty comes from the design of the decoder. On the left side we have the CAT decoder, which only selects columns; then we have the FROM decoder, which selects the tables; and in the middle we have the conjunction network, which is just a simple feed-forward layer followed by a softmax, to select set operators.

Okay, I think this example can further illustrate the whole process and clear up the confusing parts. The input definitely includes the original question as well as the database schema; they are concatenated together and go through the tokenizer, and the pre-trained encoder generates a bunch of hidden states as the reference. Then the decoder consumes these reference hidden states. Specifically, for CAT, it generates a number of hidden states sequentially. For example, the first hidden state, after it is generated, is fed into our multi-task learning module, which automatically generates AVG(LifeExpectancy). The AVG is just an aggregation; if you still remember, it comes from the classification module. The LifeExpectancy comes from the ranking module. Then from the next hidden state we generate an EOS, end of sentence, which indicates that the SELECT clause is completely finished. From the next state we are going to generate the WHERE; in this case the WHERE actually generates a NESTED token, which means it triggers another whole round of the process to generate the sub-query. Notice that here the GROUP BY, HAVING, and ORDER BY clauses each contain only an EOS, which means all these clauses are empty. Now we have the outer query; we can collect these two columns and concatenate them to the input, repeat the whole process, generate the sub-query, and then use this sub-query to replace the placeholder in the outer query. Now we get the complete query.
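A toy sketch of the round structure just walked through, with decode_round as a hard-coded stand-in for the real engine (the SQL strings are taken from the life-expectancy example above; everything else is invented): the outer query carries a NESTED placeholder, a second round produces the sub-query, and the result is spliced back in. Feeding previously selected columns into the next round is what introduces the long-range dependency.

```python
NESTED = "<nested>"

def decode_round(question, schema, prior_columns):
    """Stand-in for the real engine: one round emits one unit query,
    possibly containing a NESTED placeholder."""
    if prior_columns:  # second round: generate the sub-query
        return ("SELECT T1.Name FROM Country AS T1 JOIN CountryLanguage AS T2 "
                "ON T1.Code = T2.CountryCode "
                "WHERE T2.Language = 'English' AND T2.IsOfficial = 'T'")
    return f"SELECT AVG(LifeExpectancy) FROM Country WHERE Name NOT IN ({NESTED})"

def generate(question, schema):
    query = decode_round(question, schema, prior_columns=[])
    while NESTED in query:
        # pre-processing stage of the next round: pass along the columns the
        # outer query already used (long-range dependency across sub-queries)
        prior = ["Country.Name", "Country.LifeExpectancy"]
        query = query.replace(NESTED, decode_round(question, schema, prior), 1)
    return query

print(generate("average life expectancy where English is not official", {}))
```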
I hope this flow, this example, is self-explanatory.

[Question] Yeah, this makes sense. So effectively you have an explicit chain-of-thought processing style happening inside your network there. If you could get similar results, and maybe that's part of what you might be exploring next: if you ask GPT with a chain-of-thought prompt, where it has to think about each of the clauses one at a time, does it do better?

Yeah, that's a very good point, and it relates to the point I made earlier. Especially if you have many, many detailed instructions, at least right now I don't think GPT-4 or GPT-4 Turbo can completely follow all of them. Maybe GPT-5 can completely solve it, I don't know. So far, if you make the prompt even more complicated, maybe it will work, but it may miss some of the necessary information. That's one point I'd make. But using a grammar, you never miss these rules, because they're part of your grammar, part of your procedure; they're never missed. Essentially, we are conducting a top-down BFS, breadth-first-search, traversal in this algorithm.

Okay, the time is limited, so let me jump directly to the experiments. We conducted experiments on the public test set from Spider; note that this table doesn't contain the results on the private test set. For hard and extra-hard questions, which means they contain joins and nested queries, you can see SQLBridge has a clear improvement. Beyond the public set, we also asked our collaborators to test on their private data set, and here are the results.

[Question] Was that private data set in English, or some other language? Did that matter?

Yes, actually it is in Chinese; our collaborators tested on Chinese. GPT-4 seems to be doing well on Chinese, but probably not as well as on English. You may also wonder why we only report GPT-3.5 here. The reason is that our collaborators did test GPT-4, not me, but somehow it never finished, so they only reported this result to us.

Okay, now let's jump to the second topic, root cause analysis. How many minutes do I have?

[Question] We'll jump to that, but just on that first part, I know you're running short of time: have you tried BIRD-bench, or are you planning to evaluate this on BIRD-bench?

Oh, sorry, which bench?

[Question] BIRD-bench. It's a new benchmark.

Oh, I see. BIRD-bench would be more comprehensive. I see.

[Question] I ask because I know we are playing around with it at DataChat, so happy to exchange notes if you're interested.

Oh, great, great. Frankly, I think that for ChatGPT this is maybe not their main focus, and for SQLBridge, one challenge is that we don't have a very big training set so far. The training sets we have are doing very well; for example, Spider is very helpful, there's a lot of guidance. But if we had an even bigger one that covers different aspects of the grammar, across different domains, that would probably be even better. We do observe a phenomenon: one year ago, we trained on Spider and some other publicly available data sets, and there were distributional shifts to the private data sets. So more data sets would be very helpful. Okay.
The reason why I picked the second topic is because it uses so-called causal analysis, where interventions are widely used. Cloud databases usually use a so-called microservice architecture, and for a microservice architecture, the root cause analysis is triggered by an entity and a KPI indicator; for example, here it is just a delay. Under the hood, we have some kind of anomaly detection algorithm that triggers the root cause analysis. Usually, root cause analysis requires some domain knowledge to design a causal graph, and the goal is to quantify the influence of all the factors on this causal graph and find which factor most likely led to the abnormal KPI. For example, in this case, we have this red box. Pardon the Chinese characters; this is actually copied from a real system running internally.

Why do we care about root cause analysis? Once upon a time, there was a real failure on our cloud platform system. All of a sudden, a lot of the operations stopped responding, and the system generated thousands of abnormal traces per minute. Each trace contains hundreds of calls, let alone that we have thousands of machines, and each machine can have a number of metrics, including CPU, memory, and the number of waiting threads. So it's important to automate the diagnosis.

What's the causal graph for a microservice system? To answer that, first we need to introduce a working definition of context propagation, which is just to weave together the measurements, including metrics and logs, from individual nodes, and to collect these traces by attaching a unique ID to a request as it traverses the microservice system. Here is one real example; I think it is very intuitive and self-explanatory. The critical part is to observe that the solid line represents a synchronous call and the dashed line represents an asynchronous call. For example, D1 calls the database, followed by operation P1 for post-processing; the delay equals the summation of these two periods. However, for O1, which is the parent span of D1 and D2, the two children run in parallel, so the delay is determined by the critical path, which is defined to be the one with the maximum delay. This is what we call the max-plus calculus.

It turns out that for a microservice system this part is easy: we can directly derive the causal graph from the RPC call/query relationships. However, it's not complete. Here we list six types of issues we observed on our cloud database. In addition to delay, we also have resource utilization and some additional events, for example software upgrades. So it requires the knowledge of experts to add additional factors into the causal graph, because different causal graphs can lead to different root causes. Suppose we decide to add CPU utilization into this causal graph; we need a function to characterize its relationship with the delay. One approach is to use a domain-specific model, for example a queueing model; the Pollaczek-Khinchine formula is one from queueing theory.

Okay, here's the framework of our ShapleyIQ. It consists of a forward pass and a backward pass. The forward pass evaluates all kinds of counterfactuals, where a counterfactual is just a what-if question.
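A minimal sketch of the max-plus calculus described above, assuming a toy span representation (the field names "self", "sync", and "async" are invented): synchronous segments add up, while parallel children contribute only their critical path.

```python
def span_delay(span):
    """Max-plus: synchronous children add up; asynchronous children run
    concurrently, so only the critical path (maximum delay) matters."""
    sync = span.get("self", 0.0) + sum(span_delay(c) for c in span.get("sync", []))
    parallel = max((span_delay(c) for c in span.get("async", [])), default=0.0)
    return max(sync, parallel)

# O1 is the parent span of D1 and D2, which run in parallel; P1 follows D1.
o1 = {"self": 0.0, "async": [
    {"self": 2.0, "sync": [{"self": 1.0}]},  # D1 (2s) then P1 (1s): 2 + 1 = 3
    {"self": 2.5},                           # D2 runs in parallel: 2.5
]}
print(span_delay(o1))  # 3.0 -- the critical path D1 -> P1
```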
What if you change a subset of the factors from normal to abnormal? What do you observe? That's the counterfactual. Then, in the backward pass, we collect all these evaluations and summarize them to compute an influence score for each factor. Here we are using a Shapley value with a new splitting-invariance axiom.

Okay, so for the forward pass, we need to define a value function for each subset of the factors. The definition is as follows: the value function is the change of the KPI when all the factors in the set change from normal to abnormal, and all the other factors remain normal. Here's one example. Suppose P1 changes from two seconds to three seconds; it increases by one second. Using the max-plus calculus, we can compute its impact on the end-to-end delay; it is also equal to one second. That's why the value function v({P1}) equals one. Similarly, we can derive the value functions for {P1, D1, D2} and for {D1, D2}.

Next we go to the backward pass. In this step, we collect all the evaluations from the forward pass and summarize them. Suppose we are computing the influence of P1. We need to enumerate all the permutations of the factors; here we have four factors, so in total we have 24 permutations. Now, for each permutation, suppose I am P1 and we are standing in a line. All the guys standing in front of me, let's name them the set S. The marginal change is defined to be the difference in the value function before and after adding me to this set. In the first line, P1 is the first guy; nobody stands in front of him, so the marginal change equals v({P1}) minus v(∅). v({P1}), if you still remember, equals one from the previous slide, and v(∅) is naturally defined to be zero; that's why the marginal change for the first line equals one. We repeat the whole process for each line and then take the average, which gives the influence of P1. Then we repeat the whole process for each factor. Here I want to emphasize that this is only a conceptual illustration, because it has exponential computational complexity. We actually have a far more efficient algorithm that exploits sparsity, with complexity O(n log n), but let's stay with the conceptual illustration for easy understanding.

The procedure I described on the last slide gives exactly the Shapley value, which is unique under the three famous axioms proposed by Shapley. However, that's not enough. Why? Because the traditional Shapley value only characterizes discrete factors, but here we have spans: each span represents a continuous time period, and a time period can be further divided into sub-intervals. So we introduce a property called splitting invariance. In addition, we also need to take the causal relationships into consideration. In this line of research, SHAP was the first to introduce the Shapley value for explainable ML, but it doesn't consider causal relationships. The top work is called the asymmetric Shapley value; even though it considers causal effects, it's not invariant under splitting. Here's one example that illustrates splitting invariance: suppose operation C calls D in a for loop. On the right-hand side you see the influence keeps decreasing as you increase the number of loop iterations.
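Here is the conceptual (exponential) Shapley computation just described, for illustration only. The value-function numbers are invented, though they reproduce the v({P1}) = 1 example via the max-plus structure; taking O1 as the fourth factor is an assumption, and the real algorithm exploits sparsity to run in O(n log n) instead of enumerating all permutations.

```python
from itertools import permutations

def shapley(factors, v):
    """v maps a set of abnormal factors to the KPI change; average each
    factor's marginal change over every permutation (standing order)."""
    influence = dict.fromkeys(factors, 0.0)
    perms = list(permutations(factors))          # 4 factors -> 24 lines
    for order in perms:
        seated = frozenset()                     # the ones "in front of me"
        for f in order:
            influence[f] += v(seated | {f}) - v(seated)  # marginal change
            seated = seated | {f}
    return {f: s / len(perms) for f, s in influence.items()}

NORMAL = {"P1": 2.0, "D1": 2.0, "D2": 3.0, "O1": 0.0}
ABNORMAL = {"P1": 3.0, "D1": 2.5, "D2": 3.0, "O1": 0.5}  # invented numbers

def end_to_end(d):
    # Max-plus structure from the earlier example: O1 is the parent of
    # D1 and D2 (parallel), and P1 follows D1 (sequential).
    return d["O1"] + max(d["D1"] + d["P1"], d["D2"])

def v(abnormal_set):
    d = {f: (ABNORMAL[f] if f in abnormal_set else NORMAL[f]) for f in NORMAL}
    return end_to_end(d) - end_to_end(NORMAL)    # v(empty set) = 0

print(shapley(list(NORMAL), v))  # influence score per factor
```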
But intuitively, if the total length of the whole span remains constant, then it should give the same influence, so the asymmetric Shapley value is not consistent with our intuition. Okay, here are the results: ShapleyIQ is accurate, and it runs very fast, usually orders of magnitude faster than neural-network-based approaches.

Okay, now let me finish my talk with the concluding remarks. AI for DB indeed is a very big topic. Our team primarily focuses on in-DB ML, including NL-to-SQL and a lot of functions for in-DB inference, as well as AIOps, including scheduling, anomaly detection, root cause analysis, and knob tuning. Although all the details from today's talk will be forgotten, I hope the spirit can still come through. One of the key messages is that it is possible, even though it's not a must, to insert low-level controls precisely, at the right place and the right moment, to make general AI far more efficient and sometimes even more effective; small models can be mighty. I personally believe this design philosophy has a lot of practical value, especially for low-resource settings with distributional shifts. Okay, this is the end of my talk. Thank you very much.

[Andy] I will applaud on behalf of everyone else. It looks like you have questions; you want to go for it right away?

[Question] Yeah, why not. If no one else has a hand up, I'll just go for it. I think this is super interesting. You focused on a certain class of SQL complexity. The questions that people usually ask are SQL queries that don't have deep levels of nesting, single-block queries. Have you tried how this works when the resulting query is deeply nested, and especially when the natural form might be a correlated sub-query?

So first, we do generate sub-queries, but not deeply nested ones. I think it depends a lot on your training data. The training data usually is maybe two or three or four levels deep; it is very seldom for the trained network, even augmented by the grammar, to generate a deeper one. But you do mention one thing that I think is a very good observation. For example, here we have one example that shows that, after the correction using beam search, we can attach some very well-tested rules for rewriting SQL, to turn nested SQL into join SQL. There are many such rules that have been proven to be correct, and they can automatically be incorporated into our system to make the output not only correct but also easy to read, and it runs much more efficiently.

[Question] Great. And one last question, then I'll hand the baton to someone else. What part of SQL does the grammar not cover, or is your grammar a superset of SQL?

Very good question. This actually involves more work. As I mentioned, our grammar is a superset. So far our main focus is to satisfy our customers, our enterprise customers, who are our collaborators. For example, we are supporting Hologres; frankly, Hologres supports a subset of the grammar, it doesn't support many of the keywords, so that's the priority we are going to support. But because ours is a superset, it gives us a lot of flexibility to insert new rules; it can gradually be made fully fledged. That's also one of the key points that I want to make.
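As an aside on the nested-to-join rewriting mentioned in that answer, here is one illustration of the kind of rule that can be proven correct. The query is adapted from the Spider car_1 schema discussed earlier, not taken from the talk's slides, and the Python wrapper is just a container for the SQL strings.

```python
# Rewrite rule (illustrative): an uncorrelated IN sub-query is equivalent
# to a join, which is usually easier to read and cheaper to execute.
nested = """
SELECT Maker FROM car_makers
WHERE Id IN (SELECT Maker FROM model_list)
"""

rewritten = """
SELECT DISTINCT car_makers.Maker
FROM car_makers JOIN model_list ON car_makers.Id = model_list.Maker
"""
# Both return makers that have at least one model; DISTINCT restores the
# set semantics of IN, since the join may duplicate maker rows.
```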
For example, here, this is from GPT-4 Turbo, for the question that I showed earlier: what's my attendance rate in the past two weeks? This example is wrong, the generated answer is not correct, but there's a more severe problem with these generations. Actually, it also shows the power of GPT-4 Turbo: it uses a lot of knowledge it obtained or learned from many other samples. But it doesn't know that our Hologres system doesn't support many of these functions. Right, Hologres doesn't support them. So even though we do provide few-shot learning examples in the prompt, somehow GPT-4 didn't only learn from the few-shot examples but also from its own previous experience. This is something that we don't want GPT-4, or at least a solution, to over-generalize on, I think.

[Question] Great. I have one more question, if no one has anything else.

[Andy] Go for it.

[Question] I know we struggle with nested CASE statements. We don't generate SQL directly in DataChat, but when we use it for stuff where that's needed, it is challenging. Do you hit that case? It is super hard. There are examples, but it's just that mix of declarative and procedural that gets in the way, and it's super hard to make that work right. I don't know if you guys hit that problem, or if you do something to make it work.

Yes, actually you are guiding the direction of our development. Indeed, you are talking about exactly the point, I mean a key problem, that our collaborators are urging us to solve. So far, the solution we provide is only single-shot; it is based on a single question. There's an easy fix: using a dedicated, separate algorithm, we can combine multiple questions together and concatenate them into a single sentence. By doing so, we can still utilize our tool. But this is one simple solution. A better solution, which is basically what you're talking about, is that we have some long-range dependency across multiple questions. So how do we incorporate all this information together so that we can provide the best guess of what the user is asking for, and generate the SQL for the latest question? So far, we have a demo; it's not an official product, but a demo internally used by our enterprise customer, I mean collaborator, and it is based on large language models. In other words, the short answer is that we depend on a different algorithm to make a summary.

[Andy] Yeah, thank you. Alright, I guess my last question would be: I think you said your accuracy was roughly 80% for the tests you've done. What's the remaining 20%? Is it SQL features, or just weird phrases in the natural language that it can't map to SQL itself? What's the next big thing for you guys to figure out?

Yeah, this is a very, very good question. It requires some scrutiny on what exactly the mistakes are that we made. Interestingly, we actually found some mistakes in the Spider training data set. Even though such a query can still run, logically it is the same mistake that I mentioned earlier: it pairs a column prefixed with an aggregation function with a bare column. Some of the engines do run it and return a randomly selected row.
So that's one of the findings. For some of the other mistakes, we are still trying to see how to improve, using two methods: either we provide more similar samples to enhance the parts where we are weak, or we escalate some of the concepts, which are widely used but very difficult to generate, to be part of our classifier. I think this escalating of concepts into a simple and robust classification module is a very powerful method for us.