Our speaker today is Professor Hao Zhu from the University of Texas at Austin. She's going to talk about machine learning for power system operations. Before I start introducing her, let me remind you that there is a special seminar next Tuesday. The time is 10 o'clock to 11 o'clock. So, if you can make it, I recommend you attend. Our last seminar for this quarter is next Thursday, at the same time. The speaker is from the City of Palo Alto Utilities. Next week, we'll go back to our usual room, Y2E2 236. Our speaker today is Professor Hao Zhu. She's currently an assistant professor of electrical and computer engineering at the University of Texas at Austin. She received her bachelor's degree from Tsinghua University, and her master's and PhD degrees from the University of Minnesota, all in electrical engineering. She was a postdoctoral research associate and later became an assistant professor at the University of Illinois at Urbana-Champaign. Her research focuses on developing algorithms for learning and optimization problems in energy systems. Her current research interests include physics-aware and risk-aware machine learning for power system operations and the design of energy management systems. She is the recipient of an NSF CAREER Award, and she is also the faculty advisor for three best student papers at the North American Power Symposium. She is a member of the IEEE PES Long-Range Planning Committee and an associate editor for IEEE Transactions on Smart Grid. Okay, let's welcome our speaker. Thank you, everybody. Thank you, Cheng Wu, for inviting me, and also for the introduction. So, yeah, I came to Stanford, I think, five years ago when I was in Illinois. And it's great to be back, unfortunately only virtually, but hopefully we will get a chance to have an in-person visit sometime. Yeah, so today I'm going to talk about some problems in power system operations, and particularly how to design machine learning tools that are physics-aware and risk-aware.
So I'd like to thank my grad students and also the National Science Foundation for the funding support. Yeah, maybe I should go fullscreen. Is this working well? Okay. So we have seen that the use of artificial intelligence and machine learning technology has driven the development of data analytics in the energy world. And it has been identified as a way to tackle several challenging problems in energy systems related to disaster resiliency, like some kinds of anomaly detection, and basically the operation of the grid as well. This is thanks to the proliferation of various types of data that we have today in energy systems, or the electric power grid in particular. But the question still remains that although AI and machine learning (ML) tools are very powerful, there are still some gaps between them and the problem-specific challenges if we want to apply them to real-time power system operation. So I'd like to show three problems that we are currently working on now, about how to incorporate domain knowledge from power systems to better design the machine learning, or neural network, tools. The first one is related to market prediction. A lot of interest has been devoted to this problem, but there is an issue of dimensionality, or a complexity issue: as we know, a large-scale power system has thousands of nodes, if not more. So the idea we're going to look at is how to incorporate the topology information of the grid to simplify the neural network model. The second one is related to coordination at the grid edge, or what we traditionally call distribution systems, where we have seen more and more distributed solar and distributed storage, all collectively called distributed energy resources, that need to be coordinated to support the operating condition of the connected distribution system.
And when we apply machine learning tools to this problem, there is a concern that when the machine learning models are deployed in the field, they can cause some adverse operating conditions, so there is a risk associated with these tools. We will see how to reduce the risk associated with these decentralized machine learning models. Last but not least, I'm just going to quickly show some recent results we have on using the same ideas of scalable learning or decentralized learning, but for more emergency-type operations during extreme weather events: when there are multiple failures, how can we use machine learning to quickly restore the operation to normal condition? I will focus on the first two problems, and the last one connects very nicely to the first two as a small extension of our current work. So the first one is on real-time market operation. The specific optimization problem for market operations is called optimal power flow, or OPF. Currently it's solved as an optimization problem using numerical solvers. You can think of the input as the grid condition: the power demand, the topology of the system, and also other parameters. And the solution, or the output, is the dispatchable resource set points, or some other types of controllable actions that we can apply to the grid resources. So there exists a multitude of optimization solvers, or OPF solvers, that you can use. One of the issues is that the solver can get stuck, because it's a non-convex problem, or there can be numerical and also computational concerns. In the daily operation, basically, we feed multiple instances of input, the current situation of the grid, into the real-time OPF solution on the input side. And then it will give us a set of outputs to send to the generators or the resources for dispatch.
So the idea here is that since we have all these instances of input and output, can we train a neural network model to predict what's going to be the ideal set point of generators or other resources based on a specific input? This neural network model can reduce the online real-time computation burden of the OPF solution, because the feed-forward computation is very fast. So this is the basic idea of using a neural network for OPF. As I mentioned, it is a quickly growing area, and there are a lot of works looking at the DC OPF, which is the linearized version, or the original AC, nonlinear power flow problem. And people have started using the neural network solution as a warm start, so that the AC OPF can converge faster. There are also other versions for stochastic OPF, and also work directly connected to the duality analysis. So our objective here is to exploit the grid topology, because all of these neural networks are generally black-box models. They don't consider or contain any kind of information about the grid topology or the grid parameter settings. Our idea is that by exploiting the topology, we can potentially reduce the complexity of these trained neural network models significantly. Okay, so I'll just be brief here on this formulation of OPF. Think about the network model as a graph with N nodes. The AC OPF, the original nonlinear power flow OPF problem, aims to determine the real power P and reactive power Q injections at every node in order to minimize the total cost of supplying that P, the real power supply cost. There are a bunch of conditions we need to satisfy. The first one is power flow balance, and this is according to Kirchhoff's laws. The second is the voltage operating limits, because we want the voltage to stay close to the rated voltage level.
The third and fourth are the limits based on the type of generation plants we have, the corresponding limits at every node for the P-Q injections. Last but not least, we also need to satisfy the network line flow limits. So f_ij is the amount of power flow on each transmission line between bus i and bus j, and there are associated thermal limits for each line as well. In this sense, at every node, there are inputs that we can stack into this vector x_i, which includes the P-Q limits and also some coefficients of the objective function for the generation cost. Usually it's a piecewise-linear or quadratic function, depending on the type of generation we have. So this is the input at every node, and we have N of these x_i's. If we want to do the regular OPF learning, we will try to predict the optimal P and Q. At every node, we also have the output of optimal P and Q, which means that the number of outputs also scales linearly with the number of nodes N. So if we have N inputs and N outputs, and we train using a typical fully connected neural network, or FCNN, we know that in each layer the number of parameters scales with the product of the number of inputs and outputs, which essentially is on the order of N squared. For a larger system, this is a lot of parameters that we need to train, and we know that the more complicated a neural network is, the more easily we can run into issues of getting stuck at suboptimal solutions, and also training time and computational issues. So the idea is to ask: can we incorporate the topology knowledge to simplify the neural network? As we will see very soon, this topology-embedded type of neural network is called a graph neural network, or GNN. People have explored using the GNN architecture to predict P and Q. However, the missing gap here is that for the GNN to be very powerful,
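For reference, the AC OPF just described can be written out as follows. This is a hedged reconstruction from the talk's verbal description, using standard notation: c_i is the per-node generation cost, Y is the bus admittance matrix, and the exact limit values are placeholders.

```latex
\begin{aligned}
\min_{\{p_i,\, q_i\}} \quad & \sum_{i=1}^{N} c_i(p_i)
  && \text{(real power supply cost, piecewise-linear or quadratic)} \\
\text{s.t.} \quad
  & p_i + j q_i \;=\; v_i \sum_{j=1}^{N} \bar{Y}_{ij}\,\bar{v}_j
  && \text{(power flow balance, Kirchhoff's laws)} \\
  & \underline{V} \;\le\; |v_i| \;\le\; \overline{V}
  && \text{(voltage operating limits)} \\
  & \underline{p}_i \le p_i \le \overline{p}_i, \qquad
    \underline{q}_i \le q_i \le \overline{q}_i
  && \text{(P-Q limits by generation plant type)} \\
  & |f_{ij}| \;\le\; \overline{f}_{ij}
  && \text{(line flow / thermal limits)}
\end{aligned}
```

The per-node input vector x_i then stacks the P-Q limits and the cost coefficients of c_i, giving N inputs and N outputs, hence the O(N²) parameter count for a fully connected layer.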
there needs to be some kind of locality property, which means that the predicted values should be very close if the nodes are in a neighboring region. Okay, so say in this network here, the P and Q at node 10 should be very close to the P and Q at node 8. That's essentially the locality property. This is not the case for the generation output, because the generation output depends on its own cost. If a generator at node 10 is producing at its upper limit, it doesn't mean that we should also fully utilize the generator at node 8. So there is a large gap in using the GNN to predict P and Q. Here comes our idea. We realized that although P-Q does not satisfy this locality property, the locational marginal price (LMP), which is an output from the OPF problem, specifically from the dualized OPF problem, does satisfy this locality property. And what is the LMP? Let's consider a simpler version of the OPF where we only have the power balance and the line flow limits, and it's only related to the real power injections P. These are simple linear constraints, and we can introduce Lagrange multipliers for the first constraint and the last constraint. The locational marginal price is essentially a linear combination of all these optimal multipliers, which helps us determine the specific p_i set point at node i based on the price π_i. Okay. This π vector is related to the multipliers associated with the line limits through the matrix S-transpose, and the matrix S is fully dependent on the graph topology. Specifically, it shares the same eigenspace as the graph Laplacian B here, which in power systems we call the B-bus admittance matrix. It is a weighted graph Laplacian, so it is a topology-dependent quantity. Because in real-time operations there are very few congested lines, this multiplier vector μ is mostly sparse.
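The LMP decomposition just described can be written compactly. This is a hedged reconstruction in standard notation; sign conventions for the line-limit multipliers vary across references:

```latex
\pi \;=\; \lambda\,\mathbf{1} \;+\; S^{\top}\mu,
\qquad
\begin{cases}
\lambda: & \text{multiplier of the system power balance constraint,}\\[2pt]
\mu: & \text{multipliers of the line flow limits } |S\,p| \le \overline{f},\\[2pt]
S: & \text{topology-dependent matrix sharing its eigenspace with}\\
   & \text{the weighted graph Laplacian (B-bus) matrix } B.
\end{cases}
```

Since only a few lines are congested in real time, μ is sparse, so π is a smooth, topology-localized signal, exactly the locality property a graph neural network can exploit.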
It's only nonzero at the congested lines, and therefore the LMP π here depends on these congested lines, fully based on the topology, or the graph Laplacian. And that is why it nicely satisfies this locality property that we are looking for when we apply a graph neural network. So, just to show you some real-world price maps: this is the Texas ERCOT real-time LMP. It's a contour plot. You can see that around this eastern area, where the load centers usually are, Dallas, Austin, Houston, the price is much higher than in the producer or more remote areas of West Texas. This price definitely shows the locality property, because it's very concentrated based on the geographic area. Similarly, this is the California ISO LMP, where you also see this locality: the price can go as high as $150 in the Bay Area, I believe, while for the rest of California the price is much lower, around $50 per megawatt-hour, I believe. So basically, the LMP allows us to explore the GNN, or graph neural network, model to do the prediction. Very quickly on what the GNN essentially is: when we look at the input, every node has this x_i, so I have N of these x_i's here. And every layer of the GNN includes a filter W, which is based on the topology. The entry W_ij is nonzero whenever there is a connecting line. The other part of the filter is H. This is like a typical neural network filter, so there's no sparsity there; it allows us to explore the high-dimensional mapping very effectively. But the benefit of using a GNN when we have a sparse graph comes with W. And this is our result here: in a typical power network, the number of lines is actually on the order of the number of buses. It's usually like two or three times the number of nodes in the system.
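A minimal sketch of the layer just described, assuming the form X' = σ(W X H) (illustrative, not the speaker's implementation): the sparse filter W mixes information along grid lines, while the dense filter H mixes features and is shared by all nodes.

```python
import numpy as np

def gnn_layer(X, W, H, activation=np.tanh):
    """One GNN layer: X_out = sigma(W @ X @ H).

    X: (N, d_in) node features, one row per bus.
    W: (N, N) topology filter; W[i, j] is nonzero only if buses i and j
       are connected by a line (plus the diagonal), so it is sparse.
    H: (d_in, d_out) dense feature filter shared across all nodes.
    """
    return activation(W @ X @ H)

# Toy 4-bus ring network: each bus connects to its two neighbors.
N, d_in, d_out = 4, 3, 2
W = np.eye(N) + np.roll(np.eye(N), 1, axis=1) + np.roll(np.eye(N), -1, axis=1)
X = np.random.default_rng(0).normal(size=(N, d_in))
H = np.random.default_rng(1).normal(size=(d_in, d_out))
Y = gnn_layer(X, W, H)  # shape (4, 2): one d_out-dim prediction per bus
```

Because W only has nonzeros on the lines (plus the diagonal), its parameter count grows with the number of edges, which in power grids is only two to three times N.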
And because of that, the number of parameters in the GNN layer, that is, the number of nonzero parameters in W, scales linearly with the number of nodes N. So in every layer of the graph neural network, the number of parameters is in the linear regime with respect to the number of nodes. And this is a big decrease from the fully connected neural network with its N-squared scaling. Because we utilize the topology, and specifically the locality property of the LMP, we're able to make the GNN very suitable for prediction in real-time OPF. As a result, we also attain this complexity reduction from N squared to N. So we can use that for predicting the prices, or LMPs. This LMP prediction problem has been considered in the past, but mostly based on statistical learning or SVM approaches, not using neural networks. So the full decision rule, or the chain of variables, for the GNN-based LMP prediction is: if I have this input matrix from every node, and I train this neural network, specified by parameters θ here, I can get the prediction of the LMP at every location, which is π-hat here. And using the LMP, because it's the dual variable, or the multipliers, I can determine the optimal primal variables, which is the dispatched real power solution. I can also compute the line flows for the whole system. In the basic version, we can just try to match the LMP values, between the actual value and the predicted value, minimizing the error between the two. If we want to introduce some kind of regularization, we can also use the line flows, because we can completely determine the line flows if I know the predicted π values. So I can introduce this regularization term to reduce the line flow violations, and this will be shown to lead to better performance later on.
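One hedged way to write the regularized training loss just described (illustrative only: the recovery of the dispatch p-hat from the predicted prices, and the shift-factor matrix S, are placeholders standing in for the dual relationship from the talk):

```python
import numpy as np

def opf_training_loss(pi_hat, pi_true, p_hat=None, d=None, S=None,
                      f_max=None, rho=0.0):
    """LMP prediction loss with optional line-flow feasibility regularization.

    pi_hat, pi_true: predicted / solver LMP vectors (length N).
    p_hat: dispatch recovered from pi_hat (any recovery rule; placeholder).
    d: nodal demand; S: shift-factor matrix; f_max: per-line flow limits.
    rho: regularization weight (rho = 0 gives the basic LMP-only loss).
    """
    loss = np.mean((pi_hat - pi_true) ** 2)
    if rho > 0.0:
        f = S @ (p_hat - d)                    # line flows implied by p_hat
        violation = np.maximum(np.abs(f) - f_max, 0.0)
        loss += rho * np.mean(violation ** 2)  # penalize limit violations
    return loss
```

With rho = 0 this reduces to the plain LMP-matching objective; with rho > 0 it adds the feasibility term that the talk credits for the lower line-flow violations.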
So this is the basic idea of our recent conference paper, linked here. In addition to that, we have considered several extensions. One idea is to do congested line classification, and the idea is very similar, because the congestion pattern also has this topology dependency. So it enables us to also use the GNN to classify whether a line is congested or not. For classification-type tasks, we have used a cross-entropy type of loss. And after a few layers of GNN, we also have a final fully connected layer to predict the zero-one value for this binary classification task. So here are some results. We have tested it on a small system, a 118-node one, and also a bigger, 2382-node system. For the small one we consider the nonlinear AC OPF, and for the big one the simplified linear DC OPF. The metric we consider here is basically the LMP prediction error. The top figure here is the LMP prediction error, the normalized L2 error. And we also compare the effects of introducing the line flow limits, because one very important operating constraint in the OPF problem is line flow limit satisfaction. So we want to see whether this accurately predicted LMP value can also avoid any possible line flow limit violation. We consider three solutions: the proposed GNN for LMP prediction, a generic fully connected neural network (FCNN), and another simplified fully connected neural network called graph-informed DNN, whose complexity order is still the same as a regular FCNN. And we consider the original version and also the regularized version, where we regularize the LMP prediction error with the line flow limits. So here are the results. On the top, for the LMP prediction, the red one here is the proposed GNN, the green one is the fully connected, and the blue one is the similar, also fully connected, neural network.
And we can see that in terms of LMP prediction performance, the proposed GNN is very close to the FCNN or the other variation of FCNN. If we use the feasibility regularization, and this is the circled one here, the error actually decreases very visibly compared to using only the LMP loss function. So definitely, having the feasibility regularization is very helpful. The bottom one here is the line flow limit violation. For the small case we didn't see much difference; the line flow limit violation is at a very small level, around 1% violation. But for the bigger case, we do see a very significant difference in the line flow limit violation. Using the feasibility regularization, we are able to reduce the line flow limit violation to a very small value. In addition, it seems that the regular fully connected neural network has some kind of overfitting issue because of its high number of parameters, and we do see that the line flow violation is much higher for the fully connected neural network than for the proposed GNN. As we mentioned, one of the features of the GNN is reducing the number of parameters, and this is also verified by comparing the number of parameters in each model. You can see a huge difference, an order of magnitude, between the fully connected neural network and the proposed GNN model. And that's why we can better fit the training data with fewer effects of over-parameterization. We have also applied this to predicting congested lines, again for the small system with AC power flow and the bigger system with DC power flow. We use the recall, which is essentially the true positive rate, and also the F1 score, which balances the true positive rate and the false positive rate, to evaluate the performance of these three neural network models.
And the GNN by and large outperforms the other two, thanks to its reduced complexity; as you can see here, it has the highest recall and F1 score. Interestingly, we again observe the effects of over-parameterization: for the fully connected neural network, or the variation of the fully connected neural network, the performance in the larger system is much worse than the reduced-complexity GNN model. So definitely, having fewer parameters to fit can help the GNN, and help with the neural network over-parameterization problem. I will just quickly go over this slide, because I think there are some questions, and after this I will stop for questions. So here is another attractive feature of the GNN. We know that in real-time operation, there can be a lot of variation in the line status; sometimes, due to weather events or some kind of unexpected fault event, a line can go out of service, and this is what we typically call a line outage. When the topology changes, a regular black-box neural network model needs to be retrained whenever there's some topology variation. But because the GNN architecture already incorporates topology, it can quickly adapt, in what we call a transfer learning paradigm; it can quickly transfer to the new topology, even if there are some differences in the system connectivity. We have tested this idea: we take the GNN trained on the original nominal topology and apply it to a changed topology, by randomly selecting a few lines and disconnecting them. And we want to see how this original nominal GNN works with the new topology. Very interestingly, the fitting error is still very close to the original nominal level, with a few exceptions where a certain line combination may have changed the system's LMP pattern significantly. There, the error is large, but for most of the contingency, or topology change, cases, the errors are very similar.
And if we take the nominal GNN and retrain it for this new topology, it turns out to be very fast. It only needs like three to five epochs, and then all of these new topology cases quickly converge to a small error. So the GNN model is very easy to adapt to a new topology connectivity condition, or to a new operating condition. This is very interesting to us, and we are currently trying to understand better why this is the case. We suspect it is because, even when there are some small topology perturbations, they do not change the subspace of the underlying graph model very significantly. And that's why we can adapt the original pretrained GNN model for the nominal case to this new topology very quickly. So with that, I think this is the first part of the talk. If there are some questions, I can answer them now. Yeah, I just had a question about the axis. I believe it was an error? Yes, the normalized L2 error. I was just trying to understand what that means in terms of percent deviation from the actual values. Yeah, so we're comparing the error of predicting π, the LMP value, and then I just normalize it by the norm of the actual vector. So a value like here is effectively like a 10% error. I see. Okay, that makes sense. Yeah. Hi, I've got a question. I'm not sure whether you can hear me. Yeah, I can hear you. Okay, nice. Yeah, so just a quick clarification. So for this locational marginal price, we're predicting it so that we can optimize the power flow. Is that how this model is working? Okay. And when we are optimizing the power flow, what are we optimizing for? Is that transmission losses? Yeah, great question. So, for conventional generation, it's the cost, depending on the type of fuel that we're using to power this generator, basically. But then if there's some kind of demand flexibility,
so this is one thing I didn't talk about very specifically: if there are demand response resources, they can also be incorporated into the cost here. It's like, what is the cost if I want to curtail this load slightly? So it's essentially an economic cost. Okay, and does this cost account for, let's say, if we have transmission over a greater distance, we have greater losses; does it account for that cost, in the AC OPF? That's a great question. In the AC OPF, the losses are embedded into the power flow equations, or the constraints here. So yes, it will account for the potential losses on the lines. Yeah. Thank you. Are current power companies using GNNs for LMP prediction? Great question. No, they don't. So far, the focus is still on developing superior optimization solvers, solving it instance by instance. Yeah. And then, how does the GNN compare to, you talked about SVMs and other ML techniques, how does the GNN compare to other techniques? And I assume that it might be difficult to get training data that is appropriate in the GNN format, or that the preprocessing would take a long time. Actually, it's not too different whether we want to apply a GNN or an SVM, but we have definitely observed that the SVM has the known issue that if we want to do a nonlinear prediction task, we run into the problem of choosing kernels and these kinds of issues. So it's not always very easy to tune an SVM for LMP prediction. But in terms of preprocessing, we actually use the actual π, or LMP, produced by the optimization solver in the offline setting. So we don't need to take additional effort, because the solver will automatically give us the LMP at every location. Okay, did I fully address your question? Yeah, that makes sense.
And I guess my follow-up question is: given that GNNs are some sort of black box, have the power companies that you've worked with expected you to provide some sort of interpretability for these models? And if so, what are the next steps in figuring out what is actually going on under the hood? Yeah, great question. I think definitely this kind of analysis, like when we go to the topology adaptivity, is something we would need for practical implementation. I guess the link here is that this W matrix, or this graph filter W, is based on the topology. So in some way it explains to us, if there is a high demand in one location, what the biggest impact on other locations' LMPs could be. There is a connection we know: usually when there is high power demand in one location, it can cause congestion, and accordingly it will increase the cost of supplying electricity in certain locations. So there is some connection because of this topology-based graph filter. But yeah, we haven't fully tried to establish that yet. Yeah, that's a great point. Okay, I think I'm moving to the second part. So the second part is mostly on the distribution grid side. Essentially, we want to use the same idea to do coordination, or co-optimization, of the grid-edge resources. This is the distribution system, or what we call the grid edge, where you have controllable devices like PV (photovoltaic) inverters that can supply what we call reactive power to the grid to improve the operation. So this is a similar type of OPF problem, but a key challenge here is that we don't have very frequent communication. Every node can quickly measure its own local quantities, but there is no very frequent communication between a centralized location and these distributed PVs. And this is the major operating challenge in distribution systems.
So people have thought about using similar ideas of learning for optimization, because this is also a special instance of the OPF problem, and there are several papers in this area. One big issue in this problem specifically is that the operating objective is related to the voltage, as we will see soon. And a lot of works have not considered what happens if the resulting solution violates the voltage limits. So our focus is on addressing the risk related to voltage limit violations: how can we incorporate a certain type of objective into the loss function to better improve the worst-case voltage violation performance? So, just to quickly go over the problem itself: you can see that there are high similarities between this problem and the real-time market problem, although it covers only a very small geographic area, like a neighborhood power grid. Because we are looking into maintaining the voltage everywhere in the system, the main interest here is to choose Q, the reactive power provided by these distributed PVs, such that the voltage is within the limits, upper and lower. And the objective here can be cast as the losses in the system, as mentioned earlier. We can use some kind of linearization tool to deal with the model here, and if we linearize the power flow model, this is actually a very nice convex quadratic program with linear constraints. The challenge here is that to solve this problem in a centralized fashion, I have to collect information from these distributed PVs everywhere, in a very fast fashion, because the PV output can change on a seconds timescale. But currently, the kind of communication that we have between the end users, or the remote PVs, and the central controller can only afford, like, one update every 15 minutes. So it's impossible to solve this problem in a centralized fashion.
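For concreteness, a commonly used linearization of this kind (e.g. a LinDistFlow-style model; a hedged reconstruction, since the talk does not name the specific tool) makes the voltage affine in the injections, which yields the convex quadratic program just mentioned:

```latex
v \;\approx\; R\,p + X\,q + v_0\,\mathbf{1},
\qquad
\min_{q}\;\; \ell(p, q)
\quad \text{s.t.} \quad
\underline{v}\,\mathbf{1} \;\le\; R\,p + X\,q + v_0\,\mathbf{1} \;\le\; \overline{v}\,\mathbf{1},
\qquad
\underline{q} \;\le\; q \;\le\; \overline{q},
```

where R and X are topology-dependent sensitivity matrices, v_0 is the substation voltage, and the objective ℓ captures the system losses.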
So people have thought about using a similar neural network idea: the control center collects the operating conditions for multiple instances and solves for the optimal solution at every PV individually. And then, instead of training one big neural network for the whole system, we train a scalable one at every node, where we only use the local data. The local measurement data is available on a very fast timescale, so we only use the local data to predict the local solution. Of course, there is some suboptimality issue, because we are not allowing for full information exchange, and this is a known issue in distributed control; there are concerns about optimality per se. But we can still do it by using the nonlinear function-approximation capability of neural networks. And similarly, the GNN architecture can be applied as well; we just use the same filter at every node. As mentioned earlier, the loss function that we would nominally use treats every sample k equally, like a mean squared error averaged over all the samples k. The issue here is that if we look at the distribution of the sample losses, it typically has this kind of shape, with most of the samples concentrated in a small region, but the tail can be really long, and these are the worst-case samples. So if we only take the average across the samples, we're minimizing at this average level; it does not help us mitigate these worst-case losses. A very famous risk measure is the conditional value at risk, or CVaR, used in robust optimization, which says that we consider the losses caused by these worst-case samples: given a significance level alpha, we average the losses across the worst alpha-fraction of samples.
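A minimal sketch of the empirical CVaR just described (illustrative, not the speaker's code): sort the sample losses and average the worst alpha-fraction of them.

```python
import numpy as np

def cvar(losses, alpha=0.1):
    """Empirical CVaR at level alpha: the mean of the worst
    ceil(alpha * K) sample losses out of K samples."""
    losses = np.sort(np.asarray(losses, dtype=float))[::-1]  # descending
    k = max(1, int(np.ceil(alpha * len(losses))))
    return losses[:k].mean()

# A long-tailed loss distribution: the plain mean is dominated by the
# bulk of small losses, while CVaR focuses on the tail.
sample_losses = [0.1, 0.2, 0.1, 0.3, 5.0, 0.2, 0.1, 0.2, 4.0, 0.1]
avg = np.mean(sample_losses)
tail_risk = cvar(sample_losses, alpha=0.2)
```

Minimizing `avg` alone leaves the tail untouched; adding `tail_risk` as a regularizer is what targets the worst-case samples.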
So our idea here is that in addition to fitting this MSE error, we can also incorporate the CVaR risk measure with a regularization parameter, and this can help us reduce the worst-case performance. We can consider the worst-case performance of predicting the Q solution, and we can also consider the CVaR for the worst-case voltage limit violation. One thing we have done is to address the computation issue. Typically, CVaR is very popular because it can preserve the convexity of the model, but we know that neural networks are typically non-convex. Although there are some recent results trying to generalize CVaR's properties to general non-convex functions that satisfy the PL condition, we might not be worried about the convexity per se; the key challenge is on the computation side. This is because when we try to compute, or train with, the CVaR-regularized objective, one issue is that it only depends on the worst-case samples. Say, the top 10%: if originally I have 1,000 samples, I only have 100 to compute the CVaR loss. And in particular, when we use modern machine learning tools like stochastic gradient or mini-batch methods, this further reduces the number of samples that we can use to compute the gradient. So there is a big concern about the statistical fidelity when we try to compute the gradient of CVaR during gradient descent. We have a small idea here to address it, from a memory standpoint, through a selection step. Basically, when we use a mini-batch type of method for learning the CVaR-regularized objective, in a typical setup we randomly generate a subset of samples in a mini-batch B_i, and in typical gradient descent, we would just use the gradient based on this mini-batch B_i.
So to tackle the statistical-significance issue with the CVaR objective, we don't use every mini-batch. We only select a mini-batch if it contains worst-case samples. What is the criterion? We set a threshold gamma_alpha: if the CVaR value for a mini-batch is higher than the threshold, we use it to compute a gradient and take one gradient descent step; if the CVaR value for the mini-batch is very small, it means it doesn't contain many worst-case samples, so we disregard that mini-batch. This is a very simple step to implement, but it turns out to be useful for saving computation time when computing the gradient of the CVaR objective. Here are some results for the six PV nodes in a 123-node system. The decision is to determine the reactive power q. As I mentioned earlier, we use the scalable neural network, so each individual neural network at each PV uses only local information, like the power output and the incident line flow, to determine the optimal q solution locally. We compare the effects of using CVaR in the regularization. The red one is the standard MSE objective; the green one uses CVaR together with our proposed mini-batch selection idea; and the blue one is CVaR with regular gradient descent, without any mini-batch selection. This is the normalized error at every node. We can see that when we incorporate the CVaR risk for predicting the q solution, it does not change the prediction performance much, and the reason is that the prediction is already pretty good for the q value. At every node we can predict the q value very accurately, so mitigating the worst-case prediction performance does not make much difference.
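The selection step just described can be sketched as a filter over randomly drawn mini-batches: keep a batch for the gradient step only if its CVaR clears the threshold gamma. The function names, the default threshold choice, and the fixed seed are my own illustration, not details from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

def batch_cvar(losses, alpha=0.1):
    """Mean of the worst alpha-fraction of losses in a batch."""
    k = max(1, int(np.ceil(alpha * len(losses))))
    return np.sort(np.asarray(losses, dtype=float))[-k:].mean()

def select_minibatches(sample_losses, batch_size=32, alpha=0.1, gamma=None):
    """Yield only mini-batches whose CVaR exceeds the threshold gamma;
    batches with few worst-case samples are skipped (no gradient step)."""
    sample_losses = np.asarray(sample_losses, dtype=float)
    n = sample_losses.size
    if gamma is None:
        # One plausible choice: use the full-data CVaR as the threshold.
        gamma = batch_cvar(sample_losses, alpha)
    idx = rng.permutation(n)
    for start in range(0, n, batch_size):
        batch = idx[start:start + batch_size]
        if batch_cvar(sample_losses[batch], alpha) >= gamma:
            yield batch  # take one gradient step with this batch
        # else: discard -- it carries little information about the tail
```

In an actual training loop, the per-sample losses would be recomputed as the network parameters change; the point of the sketch is only the thresholded accept/reject logic.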
However, we do see that using the CVaR regularization accelerates the computation in some sense. When we use the CVaR regularization together with the mini-batch selection algorithm, it can reduce the computation time by about 20 to 30%, and the selection also reduces the per-epoch computation time when we have the CVaR cost. So you may wonder what the point of the CVaR is. The risk is not in predicting the q solution; as I said, q is accurately predicted, so it does not change the q prediction error much. However, consider the CVaR of the voltage violation. Remember, these q decisions are supposed to keep the system voltage close to 1.0 per unit, the rated voltage level. If we plot the distribution of the normalized voltage deviation from the nominal 1.0, you can see that with the original MSE objective, or even with the optimal solution, the tail can go beyond 0.05, which is the threshold allowed in typical operation. Without considering the risk of the worst-case samples in terms of voltage performance, the worst-case voltage violation tends to be higher. But if we introduce the regularization for the risk of voltage deviation, it effectively reduces the worst-case voltage deviation to below 0.05. So if we have issues with worst-case performance, it is very effective to use the CVaR risk measure here. And we see a similar level of computational improvement in the table on the right; actually, the improvement is even larger than before, almost a 40% reduction in computation time. So with that, we have seen how the CVaR risk measure can help us address the worst-case voltage deviation performance.
So we have applied the same ideas to another task, which may be more relevant to both California and Texas residents, as we have seen more extreme weather events, or what we call emergency operating conditions. When there are natural disasters like winter storms, unprecedented low temperatures, or wildfires, they can damage the physical grid infrastructure. We have used similar machine learning ideas to enable a faster response using emergency response resources, like dispatchable load, or by changing the topology of the grid. The scalable neural network model can help us attain a solution in a quick and safe manner. The idea is to use a neural network, or machine learning, to determine the optimal load shedding decision without the intervention of the control center, moving from a centralized to a decentralized load shedding paradigm. I don't think I have time to talk about it, but essentially it's the same idea: we want to enable each individual node, without knowing what's happening at the global scale, to use its local measurement data and a pre-trained optimal decision rule to figure out the best corrective action in terms of load reduction. This is ongoing work; we can perhaps come back to it another time. Okay, so in summary, we have seen two applications, plus a short third one, of using machine learning and neural networks in operating the power grid. We looked specifically at how to incorporate physics knowledge about the grid, for example the topology information, to simplify the model, reduce the number of parameters in the neural network, and address the issue of over-parameterization. And we have seen how to introduce a risk measure to address the potential voltage violation issue when we deploy these decentralized neural networks in the field.
We want to make sure the resulting solutions do not cause a lot of voltage violations. We have similar ideas of using neural networks to learn fast response actions for each individual resource; this is still ongoing work, and there are a lot of potential extensions for all of these. With that, I would like to conclude the talk and just re-emphasize that there are a lot of opportunities to use machine learning as the grid transitions to a different operating paradigm, when we need to consider resilience. The fast feed-forward computation capability of neural networks is very convenient for that. And when we have dynamic resources, or new types of resources for which we don't have an exact model, there could be new opportunities for more model-free learning for these problems. So with that, thank you for your attention, and if there are more questions I'm happy to take them as well. I've got a question. This is on your second solution, the distributed energy resources. Just to clarify, you are trying to predict the optimal output of each distributed energy resource, right? I'm just wondering whether we actually have the policy and the technology to control the output of these individual distributed energy resources, because this is basically telling each house how much solar power they can, in a sense, sell to the grid, right? Yes, so it's an ancillary service, reactive power, so they may not directly sell the real power output from the PV, but they do sell some kind of power to the grid. That's a great question. Currently, without knowing the global information, they would use a simple PI or linear scaling rule, so they would just react linearly to the local condition.
A neural-network-based decision rule enables us to explore the nonlinear operating conditions and construct a nonlinear decision rule that approximates the globally optimal solution; that's essentially why we want to use a nonlinear decision rule here. Got it. And for the current solution, where you said they use the linear PI rule, that is done by the distribution grid, right? That is done by each individual resource locally, too. Okay, and who's enforcing this? Is it the distribution grid operator that's enforcing this, or is it the transmission system operator? It's indirectly set by the distribution grid. Oh, so it's kind of pre-configured into the hardware, so they can't change it? Each of these nodes reacts to the local condition in this way according to the standards, but it's not directly commanded by the distribution operator. It is pre-programmed, and in real-time operation, or real-time control, it only takes the local input to compute the corresponding q solution. Got it, thanks a lot. Any more questions? Okay, let's thank our speaker. Thank you very much.