Hi everyone. Thank you for coming to today's SmartBreeze seminar. Our speaker today is Dr. Lu Yang from NREL, the National Renewable Energy Laboratory. She is going to talk about predictive analytics for power systems. Before I introduce her, I want to remind everyone that our next seminar is next week at the same time; we have a speaker from CMU who is going to talk about machine learning and artificial intelligence. Dr. Yang is a senior research engineer in the Power Systems Engineering Center at NREL. Her areas of expertise include advanced data analytics, machine learning, and optimization in electric power systems. She currently leads several projects on developing AI solutions for power system operations at NREL. She received her PhD in electrical and computer engineering from CMU and her bachelor's degree from Chiwan University in China. Let's welcome the speaker. Thank you, Chen Wu, for the introduction and also for the invitation to speak in the SmartBreeze seminar. Today I'm going to talk about our work on predictive analytics for power systems with high penetrations of distributed energy resources. In my presentation today, I will first talk about who we are and what we do at NREL. Then I will talk about predictive analytics and why we think they are really crucial for enabling power systems with high penetrations of renewables. I will also talk about two applications of predictive analytics that our work has focused on in the past few years. After that, I will give a quick summary and also point you to a couple of resources where you can find more about our work at NREL. So, as you probably know already, NREL, the National Renewable Energy Laboratory, is a national lab of the Department of Energy. We are located in Golden, Colorado, which is about a 20-minute drive west of the Denver area.
We currently have around 3,000 people at the lab, and we have world-class facilities to conduct research in the renewable energy domain. Those world-class facilities include the Energy Systems Integration Facility, which you are seeing in the picture here. We also do a lot of our research in partnership with utilities, industry, academia, and governments, both federal and state. Another key feature of our campus is that it's actually a living laboratory. We have a lot of sensing and measurements collected on our campus: we have rooftop PV systems and EV chargers installed, and we also have a lot of visualization capability to showcase how much energy is consumed by different buildings and how much is generated by the PV systems in our lab. NREL's mission is really focused on enabling a sustainable energy future for our nation, and we have a lot of research work in the four pillars of our research portfolio. The first is renewable power, such as research on solar, wind, water, and geothermal resources. We also have work focusing on sustainable transportation, including bioenergy, vehicle technologies, and hydrogen. The third pillar in our research portfolio is energy efficiency, which includes research on buildings, advanced manufacturing, and government energy management. And the fourth pillar is really focused on energy system integration, which is also where my work is mainly focused: how you can integrate those renewable energy resources into the power systems, how you can enable hybrid systems, and how you can have secure and resilient power systems. Next, a few words about the Power Systems Engineering Center, which is the center I'm working in.
Our main goal is to conduct high-impact research and development to solve the challenges of seamlessly integrating conventional and renewable resources, flexible loads, storage, and central and distributed generation, enabling resilient, reliable, flexible, secure, sustainable, and affordable power systems at all scales. By power systems at all scales, we mean systems as small as a nanogrid or microgrid, all the way up to distribution feeders and the bulk transmission system. At our center, we focus on developing new and innovative technologies to really enable the integration of those renewable and distributed resources into our power system operation and planning schemes. So that's a very brief introduction to NREL overall and to our Power Systems Engineering Center. Now I'm going to talk about predictive analytics, especially why we are looking at predictive analytics for power systems, and also two applications where we use predictive analytics to inform power system decision making. I don't think I need to emphasize the motivations too much for why we really need predictive analytics, especially for power systems with high penetrations of DERs. In the past couple of years, we have seen rapid growth of distributed energy resources in the power systems. For example, the installed capacity of distributed PV systems in the US has been growing at around 20% annually over the past five years, and by the end of 2020 it actually reached 28 gigawatts of total installed capacity in the US. And then combine that distributed PV with the 15 million smart appliances deployed in US homes, the increasing sales of electric vehicles in the nation, and also the rapid increase in the deployment of energy storage in the power systems.
All of those distributed energy resources together would provide 200 gigawatts of flexibility potential in the US by the year 2030. That's only eight years away, and this 200 gigawatts of flexibility potential accounts for 20% of the peak load. This means those distributed energy resources have huge potential to provide much-needed flexibility and controllability to the power systems, and they can provide different types of grid services for the mutual benefit of those resources and the power systems. But a lot of the potential that could be provided by the distributed energy resources is not fully used in current power system operations. The main challenge for not using the full flexibility provided by those DERs is the lack of observability in power systems, especially in the distribution systems where those DERs are connected. Traditionally, power system operators have done a very good job of monitoring the operation status of the transmission systems. A lot of sensors have been deployed, including the supervisory control and data acquisition (SCADA) systems and phasor measurement units, and in the transmission system there are actually redundant measurements. By using those many measurements, grid operators can have a very good idea of what's happening in the power system, and based on their estimation, they can perform a lot of control and optimization to improve the reliability of the power system. However, on the distribution system side, traditionally there are not many measurements deployed. There may be just measurements deployed at the feeder head, which collect information only at the substation level.
There may be some scattered measurements deployed inside the feeders, but those measurements may not provide full visibility and full observability of the distribution feeders to the utilities or the grid operators. The recent deployment of those DERs actually adds more sensors and measurements in the distribution systems: smart meters can provide measurements at the customer level, and the inverters equipped with the DERs can also provide measurements that reflect the status of those DERs. However, there are also challenges with these newly deployed sensors in the distribution system. For one, they do not actually provide full observability of the distribution system, even with those new sensors. Also, these new types of sensors and measurements are heterogeneous in nature: they have different time resolutions, different qualities, and different synchronization. A lot of that data has not been fully utilized or fully integrated into power system operations as of right now. To make the situation even more challenging, a lot of those DERs are deployed at the customer premises, which we call behind the meter, behind the smart meter. Utilities usually do not have direct visibility of those behind-the-meter DERs; they only monitor the whole-house power consumption collected through the smart meters. So the power system doesn't really know what's actually happening behind the smart meter: it doesn't know the consumption or the generation of the different DERs deployed at the household level. This lack of visibility is really a challenge for fully utilizing the capabilities provided by those DERs. If we can have behind-the-meter DER visibility, it can help the utilities and the grid operators to better quantify the impact on the net load.
It can also help them analyze the impact of the DERs on the distribution system, not only on the net load itself but also on the power flows and the voltage profiles, and it can help accommodate higher penetrations of DERs integrated into the system. Because of these challenges, with power systems not having enough, or full, observability of the deployed DERs, predictive analytics are particularly crucial for providing situational awareness for power systems. And this situational awareness is not only real-time situational awareness, knowing what's happening right now, but also forecasted or predictive situational awareness, through which utilities and grid operators can know what's actually going to happen in the short-term future, so they can better prepare the grid to accommodate those ever-changing grid conditions. That's why we are looking at how we can develop predictive analytics to improve situational awareness for power systems. In this presentation, I want to highlight two applications of predictive analytics, for both the power system and the behind-the-meter DERs. In the first application, we are working to develop predictive state estimation, which is a step forward from traditional state estimation. We provide both an estimate and a forecast of the power system states, which are the voltage phasors, and by using those forecasted voltage phasors, we can better inform decision making and operation in the distribution system. So that's one application. The second application I'm going to talk about is how we can use data analytics to gain visibility of those behind-the-meter DERs: if we only collect the measurement data at the smart meter, can we know what the PV generation behind the meter on the rooftop is? For both of these applications, we are developing data-driven methods with power system physics incorporated.
By combining the physics with the data-driven methods, we can actually do a better job of providing this predictive situational awareness for power systems. So now let's look at the first application, predictive state estimation. Traditionally, as I mentioned, state estimation has been widely adopted in the current operation of transmission systems, and it is actually the foundation for many applications in transmission system operations. In conventional state estimation, the inputs are the measurements collected in the transmission system, and by using a method called weighted least squares, it estimates the voltage phasors, both the magnitudes and the angles, in the transmission system. But the traditional weighted least squares method actually requires redundant measurements: you need to have redundant measurements to get a good estimate of the system voltages. However, in the distribution system, as I mentioned earlier, there are not that many measurements, and the system may not be fully observable. So the question is, can we still do the state estimation accurately using only a limited number of measurements? The answer is yes. And how do we really do this? We actually borrow a concept from the Netflix recommendation system: Netflix recommends the shows a user would like to watch by mining the watching patterns of users with similar tastes, and uses the correlation in the watching patterns to make that prediction and recommendation. The method behind that recommendation system is a matrix completion method. In our state estimation algorithm for the distribution system, we use that same concept but apply it in power systems by augmenting the matrix completion problem with the power system constraints. So that's where the physics information comes into play with the data-driven method. Now, a little bit deeper into how we really do this.
We formulate a data matrix that contains the unknown state variables, which are the real and imaginary parts of the voltage phasors in the system. Those are the unknown variables we want to estimate. We also put in the measurements that are collected in the system, including the active and reactive power measurements at individual nodes and the voltage magnitude measurements at individual nodes. Some of those measurements are collected in the system, so they are partially known, but a lot of them are still unknown variables in this data matrix. So the problem we are facing is that we have this data matrix, and the columns actually have correlations between them, and we want to estimate the unknown entries in this data matrix using those correlations. The way we do it is we formulate an optimization problem: the objective function minimizes the rank of the data matrix, subject to the constraints that the known elements in this data matrix equal the measurements. The constraints also include the power flow equations. One point I want to mention here is that since the full-blown AC power flow equations are nonlinear, including them would make our problem very difficult to solve. So instead, we use a linearized version of the power flow equations, which models the power injections and voltage magnitudes as linear functions of the voltage phasors. By doing that, our optimization problem is much easier to solve. Now, you will probably wonder, since the system is not fully observable, can you still solve this state estimation problem, and why does this matrix completion method work? First, we want to show that the low-rank assumption actually holds, because our goal here is to minimize the rank of the data matrix we formulated. Here I'm plotting the singular values of the data matrix we formulated for a real utility feeder using real data.
We see that the three largest singular values account for around 95% of the sum of the singular values, and the last two singular values are rather small. This means the formulated data matrix is close to having rank three, and that actually verifies that the formulated data matrix indeed has low rank. On the theoretical side, if you look at the traditional matrix completion problem, where you just want to minimize the rank of the data matrix subject to the known elements in that data matrix equaling the measurements, there actually exists a minimum number of entries required to uniquely recover the unknown low-rank matrix X. So there is a theoretical guarantee that we can recover the unknown elements. And by incorporating the power flow constraints, the physics-based information can actually reduce the number of measurements needed to recover this low-rank matrix. Another important aspect we have looked at for this matrix completion-based state estimation algorithm is how we can make sure the algorithm scales to real power systems, such as systems with thousands or tens of thousands of nodes. The challenge here is that since we formulate the state estimation as an optimization problem, by minimizing the rank and doing some approximations, the problem we're solving is actually a semidefinite program, and semidefinite programs are generally computationally intensive. So the method by itself may not scale very well to larger power systems. A potential solution to improve the scalability of this method is to implement a distributed algorithm, where the idea is to partition the whole system into subsystems, as we show in the figure on the left-hand side. We actually partitioned the IEEE 123-node standard test system into five different regions.
For each region, we formulate the state estimation problem and solve it using that region's own information, while exchanging some boundary conditions between the areas. Through this communication, we can actually guarantee that the distributed algorithm will converge to the same solution as the centralized optimization problem. What I'm showing on the right-hand side is a plot of the objective function values for all five areas we partitioned, and we see that by exchanging information among the neighboring areas, we are able to bring the objective function values down to convergence after a couple of iterations. So by using the distributed algorithm, we are able to really improve the scalability of the matrix completion-based state estimation algorithm, and we actually tested it using real system data and real system models. So that's how we do the estimation part, which gives us estimates of what's happening right now in the power system. But for predictive state estimation, we want to go one step further by forecasting what's going to happen in the power system in terms of the voltages. In order to do this state forecasting, we need to learn the spatio-temporal correlation between the measurements and the system states. This correlation will help us forecast the system states in the short-term future using the historical measurement data. You have probably already guessed that this correlation could be very complex, because we need to map the historical grid measurements into the future system states, and if we want to learn this relationship directly, it could be a very difficult task. To solve this challenge, we leverage the idea of kernel learning, which uses kernel functions to map the input space to a higher-dimensional feature space, where we can learn a much simpler relationship between the feature space and the output space.
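To make the kernel idea concrete, here is a minimal kernel ridge regression sketch in the same spirit: a Gaussian kernel implicitly maps windows of past measurements into a feature space where the relationship to the future state is linear. The data, kernel choice, and dimensions are all invented for illustration; the actual approach combines several physics-informed kernels through an optimization.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    """Gaussian kernel: implicitly maps inputs to a high-dimensional feature space."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
# Toy data: a window of 3 past power measurements -> voltage a few minutes ahead.
X = rng.uniform(-1.0, 1.0, (300, 3))
y = 1.0 + 0.05 * np.sin(2.0 * X.sum(axis=1))          # pretend "future voltage" (p.u.)

lam = 1e-6                                             # ridge regularization
K = rbf_kernel(X, X)
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)   # linear fit in the feature space

X_new = rng.uniform(-1.0, 1.0, (50, 3))                # new measurement windows
y_hat = rbf_kernel(X_new, X) @ alpha                   # forecasted voltages
err = np.abs(y_hat - (1.0 + 0.05 * np.sin(2.0 * X_new.sum(axis=1)))).max()
print(err)                                             # the nonlinear map is learned well
```

The point of the example is only the structure: a nonlinear input-to-state map becomes a simple linear solve once the kernel does the feature-space lifting.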
That can significantly reduce the complexity of learning the relationship between the historical measurements and the future system states. To further expand on this, we incorporate power system knowledge into this kernel learning-based forecasting approach: we develop different types of kernels for the different types of measurements, such as the power injections, power flows, voltage magnitudes, voltage angles, and current measurements we have in the system, and combine all those kernels together through an optimization to provide the forecasted voltages in the short-term future. So that's what we did to develop the predictive state estimation. Now I have some highlighted results I want to share with you on the performance of these predictive state estimation algorithms. First, let's look at the voltage estimation results. We tested our estimation algorithm using a real utility feeder from our partner, and the feeder has around 2,500 nodes. We created scenarios with 100% PV penetration, which means the peak PV generation equals the peak load, so this is a very high penetration for a utility feeder. We consider three types of real sensors that are currently deployed in this utility feeder. We have the SCADA measurements at the substation, which measure the active and reactive power and the voltage magnitude at the substation, and we also assume that we know the angle reference. The second type of sensors in the system is called Grid20/20; those sensors are deployed at the service transformers in the distribution system and provide active and reactive power measurements plus voltage magnitude measurements. The third type of sensors are the smart meters in the advanced metering infrastructure (AMI), which provide active power measurements and voltage magnitude measurements.
We consider seven realistic scenarios. In the first scenario, we assume we know only the minimum information in the system: we have the substation measurements, and we know where the zero power injections are, which are basically the nodes without any load and without any PV systems. In scenarios two through seven, we add 1% of the Grid20/20 data as an input, and starting from scenario three, we increase the percentage of AMI data from 1% to 5% in the system. The two figures on the right-hand side show the mean absolute percentage error for both the voltage magnitude estimates and the voltage angle estimates. If we just use the bare minimum information, the substation measurements and the zero power injection information, the voltage magnitude estimation error is a little bit higher than 2.5%. But if we include 1% of the Grid20/20 data and 1% of the AMI data, we are able to bring that error down to 0.5%. We see a similar trend for the angle estimation: by using the substation measurements plus 1% Grid20/20 data and 1% AMI data, we are able to bring the average angle estimation error down to 0.5 degrees. This actually shows we can achieve accurate estimation results with only the substation measurements plus 1% Grid20/20 and 1% AMI data. Next, we show the results for the voltage forecasting. Here we are looking at voltage forecasting 5 to 15 minutes ahead of time at one-minute resolution. The input data we take are the active and reactive power measurements at the load nodes for the past hour. We use 80% of the data for training, and after we train the model, we use the remaining 20% of the data for testing and evaluation. The two figures we show here are the error distributions on both the training and the testing sets.
For both the training and the testing sets, the error distribution follows a normal distribution pretty well, and most of the errors are concentrated around 0. For the training data set, 95% of the errors are smaller than 0.9%. In the testing set, the errors are slightly larger: 95% of the errors are within around 1.1%. This demonstrates that we can achieve very accurate voltage forecasting results in this 100% PV penetration case, even though with 100% PV penetration the voltages in the system fluctuate a lot from time to time. Before I conclude this part, I want to mention why it is really important to have predictive state estimation versus traditional state estimation. With predictive state estimation, utilities and grid operators can proactively dispatch the controllable resources, better coordinate control efforts, and prioritize the control needs. We have already integrated this predictive state estimation into an optimal power flow problem to determine the best, or optimal, set points for the distributed PV systems in the feeder, to minimize the system losses and at the same time optimize the voltage profiles. The figure on the left-hand side shows that by using the information from the predictive state estimation to do the controls, we are able to bring the voltages of the system closer to the desired value, which is one per unit, compared to the case where there is no control in the feeder. The figure on the right-hand side shows a comparison between doing the controls with and without the information from the predictive state estimation. With the information from the predictive state estimation, the utilities will know which part of the system will have more voltage violations and can prioritize the control needs for that area.
That's the reason why we say that by using the predictive state estimation, we can further reduce the voltage violations by 30% in a 100% PV penetration scenario. So that's the first application, developing predictive state estimation for power system situational awareness. Just checking, are there any questions I should answer right now, or should we just wait until I finish the presentation? There is a question here. Are you able to hear? Yes, I am. So on the predictive state estimation, it was around slide 12, I actually have two questions. The first one: it looked like for the power flow equation, you made the assumption that you linearized it because it was too computationally intensive. Have you looked at parallelizing it on the GPU? Yeah, that's a very good question. For the power flow equations, yes, we did linearize them. And we haven't looked at parallel implementations using the GPU yet. We did do a parallel implementation for the distributed algorithm we developed, but that's on CPU, and we find the distributed algorithm is quite efficient. For example, we tested the distributed algorithm on a feeder with 2,500 nodes. If we solve the centralized problem, it will take a minute, actually a little bit longer than a minute, to solve that optimization. But if we use the distributed algorithm with a parallel implementation, we only need a couple of seconds to solve the optimization problem. But we haven't tried implementing the algorithm on GPU yet. Thank you. Any other questions I should answer now, or should we just... Okay, sure. Yeah, so that's the first application, on the predictive state estimation. The second application I want to talk about is behind-the-meter data analytics. As I mentioned briefly in the introduction, utilities only monitor the smart meter data, and they don't really know what's happening behind the meter with all those DERs.
We have some initial work looking at whether we can estimate the behind-the-meter PV generation from the smart meter data. Basically, we just look at the smart meter data and ask, can we know how much behind-the-meter PV generation there is? The first work I want to show does the estimation of the behind-the-meter PV by combining physical and statistical models. The idea here is that we model the solar generation using a physical model: we not only want to estimate the solar generation behind the meter, we also want to estimate the solar PV parameters, and we leverage a physical PV system performance model to map those PV parameters to the PV generation. For the load part, we use a statistical model, a hidden Markov model regression. We model the load as a function of variables including the hour of the day, the temperature, and whether the day is a weekday or a weekend, and we developed an iterative method to do the disaggregation. The iterative method works like this: we take the net load time series data and first assume an initial set of PV parameters. We use that set of PV parameters to estimate the PV generation, add that estimated PV generation back to the net load to get the load, and use the load to estimate the parameters of the hidden Markov model. Then we check how much deviation we have on the net load, adjust the parameters of the solar estimation, and do another iteration of estimating the parameters of the solar system, and we basically do this iteratively until it converges. We also do a post-disaggregation adjustment just to make sure that the sum of the solar generation and the load estimates equals the net load data exactly. For this particular method, we tested using 28 days of data from the year 2015.
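The iterative loop just described can be sketched on synthetic data. In this toy version (all signals and numbers are invented), the "physical" PV model collapses to a single capacity parameter times a known irradiance proxy, and the hidden Markov model regression is replaced by a simple hour-of-day regression, but the alternating structure is the same:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 96 * 14                                     # two weeks of 15-minute net-load data
hour = (np.arange(T) // 4) % 24
clearsky = np.clip(np.sin((hour - 6) / 12.0 * np.pi), 0.0, None)
cloud = 0.2 + 0.8 * rng.random(T)               # varying cloudiness
irr = clearsky * cloud                          # irradiance proxy (assumed known)

load_true = 2.0 + 1.5 * np.sin(hour / 24.0 * 2.0 * np.pi) ** 2
a_true = 4.0                                    # true behind-the-meter PV capacity (kW)
net = load_true - a_true * irr + 0.05 * rng.standard_normal(T)  # metered net load

F = np.eye(24)[hour]                            # hour-of-day features for the load model
a = 0.0                                         # initial PV parameter guess
for _ in range(200):
    load_est = net + a * irr                    # add estimated PV back onto the net load
    beta, *_ = np.linalg.lstsq(F, load_est, rcond=None)   # fit the statistical load model
    a = irr @ (F @ beta - net) / (irr @ irr)    # re-fit the PV capacity from the deviation
print(a)                                        # converges toward a_true
```

Each pass is exactly the talk's loop: disaggregate the load given the current PV estimate, refit the load model, then refit the PV parameters against the remaining net-load deviation, until the two models agree.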
We tested the algorithm using 197 customers, all with PV installations. Here I'm just showing representative results for five days. We show the actual measurements for the load consumption and the PV generation; the green curve represents the estimates from our proposed method, and the red curve represents the estimates from a state-of-the-art method, which uses a mixture model to estimate the load consumption. What we see here is that if we look at the load consumption data, the state-of-the-art method actually overestimates the load consumption from time to time, especially during the times when the actual consumption is very low, close to zero. But our method does a very good job of estimating the near-zero load consumption in the data. For the solar estimation, we see that for the first four days, when the conditions are pretty sunny and the solar generation is pretty smooth, our method and the state-of-the-art method can both do a very good job of estimating the PV generation. And on the last day, when the solar generation is very intermittent, our method can do a very good job of estimating that intermittent PV output as well. We did a quantification of how much improvement we are seeing: averaged over all 28 days, we see a 44% reduction in the mean squared error compared to the state-of-the-art method. So by using our proposed method, we can accurately estimate the behind-the-meter PV generation with only the smart meter data at the households. We actually go one step further: not only do we estimate the generation of those behind-the-meter PV systems, we also want to quantify the uncertainties associated with that PV generation. So we developed another method to provide probabilistic estimation for those behind-the-meter PV systems, and in this method, we leveraged the Bayesian structural time series model.
This is a purely data-driven method, so we haven't really used any physical information in it. We have a very similar idea as before: we model the solar generation as a function of the solar irradiance, and we model the load consumption as a function of variables including the temperature, the time of the day, the day of the week, and whether it's a weekday or weekend, that type of information. Using this purely data-driven method, we can develop a synthetic state-space model which represents the disaggregated solar generation plus the load consumption. Our goal is to estimate the parameters inside this state-space model, and the fitting is performed by combining Kalman filtering and the Markov chain Monte Carlo method. For the validation, we are using, I think, three months of data, and here I'm showing the results for one week, I think in January. We show the behind-the-meter PV estimation at the substation level, so this is aggregated PV generation. We see here that we not only can provide an accurate estimate of the PV generation at the substation, we can also quantify the uncertainties, both for the sunny conditions and the intermittent conditions. For the estimation accuracy, we also compared our method with a baseline method from the literature. We tested both the wintertime, January, and the summertime, August, and our method can significantly improve the estimation accuracy, that is, reduce the estimation error, for the behind-the-meter PV generation at the substation. So by using this probabilistic method, we can not only provide accurate behind-the-meter PV estimation at the substation but also quantify the uncertainties around that PV generation. With that, this actually concludes my presentation today.
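As a footnote on the state-space fitting mentioned above: the Kalman filtering half of that procedure can be illustrated with a toy model. Here the state is a slowly drifting load level plus an unknown PV capacity, observed only through the net load and a known irradiance signal; the MCMC step for the remaining hyperparameters is omitted, and all numbers are invented:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 500
irr = np.clip(np.sin(np.linspace(0.0, 10.0 * np.pi, T)), 0.0, None)  # toy irradiance
load = 5.0 + np.cumsum(0.02 * rng.standard_normal(T))                # drifting load level
a_true = 3.0                                                          # behind-the-meter PV capacity
net = load - a_true * irr + 0.05 * rng.standard_normal(T)             # metered net load

# State x = [load_level, pv_capacity]; random walk on load, constant PV capacity.
x = np.array([net[0], 0.0])
P = np.diag([1.0, 10.0])        # initial state uncertainty
Q = np.diag([0.02 ** 2, 0.0])   # process noise (load drifts, capacity does not)
R = 0.05 ** 2                   # measurement noise
for t in range(T):
    P = P + Q                               # predict step (transition is identity)
    H = np.array([1.0, -irr[t]])            # observation: net = load - capacity * irr
    S = H @ P @ H + R
    K = P @ H / S                           # Kalman gain
    x = x + K * (net[t] - H @ x)            # update with the metered net load
    P = P - np.outer(K, H @ P)
print(x[1])                                 # filtered PV capacity estimate
```

The filter's posterior covariance `P` is also what supplies the uncertainty bands: the same machinery that estimates the PV component quantifies how confident that estimate is.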
So the key takeaways are: predictive analytics are really crucial for accommodating high penetrations of distributed energy resources, and they have many applications beyond the two I just showed, both in providing situational awareness for the power system and in helping to improve controllability in the power system. Another key takeaway is that we can actually get better results if we incorporate physics into the data-driven methods; by embedding power system domain knowledge into machine learning or data-driven methods, those methods can do a better job. There are still quite a few challenges associated with using or developing predictive analytics for power systems, especially with high penetrations of DERs. The first challenge is the heterogeneous data we have in the power system. The data may come from different types of sensors, may have different coverage in the system, may be collected at different time resolutions with different synchronization, and different entities may own the data separately. How we can best use that data to provide situational awareness and leverage it for power system operations and planning remains a challenge we are looking at. Another challenge we are trying to solve is the scalability of the methods. We have done a lot of development of novel methods that work very well in small systems, in toy examples. But how can we bring those methods into the real world, and are they scalable to larger systems? That's a critical challenge we need to resolve. And the last challenge I want to highlight is really the adoption of machine learning and data-driven methods in the real world. In power systems, machine learning algorithms have been widely used for forecasting, for example.
But beyond that, a lot of the newer technologies involving data-driven and machine learning methods still remain on paper, within the research community. How we can bring those newer technologies to be really implemented and adopted in the real world, for real systems, is a challenge we are trying to solve. The very last slide I have is: if you want to learn more, there is a link to our overall grid-related research, and I also list some other relevant publications of mine, which look at how you can use data to do solar estimation, and how you can use data on the control side, for example to design locational marginal prices in the distribution systems. So that's just for your interest. Okay, I think that's the end of my presentation, so I'm happy to take any questions. Thank you for a wonderful presentation. There are a couple of questions in the Q&A, but before those questions I actually have a quick question. Can we go back to slide 16? Yes, this one. You used the AMI data; does it matter where the AMI data came from on the network? I don't think I got your question. You used 1% AMI in your table, right? Does it matter where that AMI data came from on the network, on the power grid? So in this test, for this realistic feeder, we got the AMI data from the utility, so it's actually from the customers. But in some other testing, we actually did some synthetic tests of where you can place those AMI meters and whether that gives you better results. We didn't do much on optimal sensor allocation, but we did perturb the placement of the sensors, and we did see a variation in the performance. So if you have the choice of deploying the sensors at the optimal locations, you may get better results.
How much variation did you see? So the figures I'm showing on the right-hand side actually show this variation, but this is just from how we chose among the AMI locations. If we increase the percentage of AMI data, the variation becomes smaller, because you have more AMI data in the system. But even with a really small AMI percentage, the variation is not very large; I think it's around 0.1%. And is the estimation independent of the time of day? That's a very good question. We did the test at different times of the day, and we found that the estimation works consistently well: around 0.5% estimation error for the voltage magnitude, or better, at any time of the day. We do see that at some times, actually not for the estimation but for the forecasting, if the PV generation is fluctuating very much, you may have slightly larger errors in your forecast, because it's very challenging to forecast those large swings in the voltages caused by the fluctuation of the PV. But for the estimation, we do see that the results are pretty consistent. Okay, thank you. Yeah, the first question is: how do these models inform decisions between alternative methods of storage, such as batteries versus hydrogen tanks or pumped hydro? In our work, we did consider some of the storage, especially battery storage, because by using the information provided by the predictive estimation, we can coordinate the controls of the distributed PV and the storage.
And we can actually design the optimization using different objective functions for the different devices, like PV or storage, but we haven't looked much into hydrogen or pumped hydro in our study. We are trying to expand our collaboration with people both inside and outside NREL to look at other solutions involving hydrogen or pumped hydro. For the second question: does the performance of the estimation and forecasting results depend on the magnitude of the voltage violations in the network? Yeah, that's a very good question; I touched on this a little earlier. For the estimation results, the results are pretty consistent throughout the day, no matter whether the voltage is over 1.05 per unit when the PV generation is very high, or under 0.95 per unit when the peak load is happening. For the forecasting results, the absolute value of the voltage magnitude may not have that much impact, but because we are forecasting the voltages over the next 15 minutes, if there are large voltage swings during that forecasting horizon, it is more challenging to forecast very accurately, especially at the extreme points, the upper and lower extremes. We do see slightly larger errors at those times, but we are trying to improve the forecasting accuracy around those extreme points. The follow-up question on that is whether the power flow linearization is more accurate closer to one per unit. That's an excellent question. The power flow linearization we are doing is not just a linearization around a single operating point; it's actually a linearization using two points, a method called the fixed-point method. And we did a test of how accurate this linearization of the power flow model is.
We found that when the voltages range from about 0.9 per unit to 1.1 per unit under different loading conditions, the linearization is actually very accurate. But it is true that the linearization is more accurate when the voltages are closer to one per unit. So I think the next question is on the challenges for adoption in the real world and how to overcome them. Yeah, that's actually one of the biggest challenges we are facing for people working on data-driven and machine learning methods for power systems. In power systems, a lot of machine learning algorithms have been adopted for forecasting, especially load forecasting, and also resource forecasting like solar and wind forecasting. But beyond that, most machine learning algorithms have not really been adopted by power system operators or utilities yet. It's challenging to say how we can really encourage the industry to adopt those new technologies. On our side, we have been working very closely with industry partners like utilities, vendors, and some AI companies, and we are trying to push the adoption: to fill the gap between the technologies and the real-world implementations or applications, to convince the industry partners that there are benefits to using those machine learning algorithms, and to build their trust in using them. So hopefully we will see some pilots or some initial adoptions of some of the newer technologies in the real world. But I will say this is a challenge that the whole research and industry communities need to work together to address. Yeah, so the next question is: how do we linearize the power flow equations? I have a link and a reference in my presentation slides, so you can take a look at the paper on the linearized power flow equations.
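For intuition on what such a linearization looks like, here is a generic LinDistFlow-style sketch for a single radial feeder. This is a textbook approximation that neglects losses, not the fixed-point two-point linearization from the talk; the segment impedances and injections below are made-up illustration values:

```python
import numpy as np

def lindistflow_voltages(r, x, p, q, v0=1.0):
    """
    LinDistFlow-style voltage sketch for a single radial line (lossless
    textbook approximation). r, x: per-segment resistance/reactance in pu;
    p, q: net real/reactive consumption at each downstream bus in pu.
    Squared-voltage drops accumulate over the shared path impedance:
        v_i^2 ≈ v0^2 - 2 * sum_k (R_ik * p_k + X_ik * q_k)
    where R_ik, X_ik are impedances of the path common to buses i and k.
    """
    n = len(r)
    v_sq = np.empty(n)
    for i in range(n):
        drop = 0.0
        for k in range(n):
            m = min(i, k) + 1  # segments shared by bus i and bus k
            drop += 2.0 * (np.sum(r[:m]) * p[k] + np.sum(x[:m]) * q[k])
        v_sq[i] = v0 ** 2 - drop
    return np.sqrt(v_sq)
```

Because the model is linear in the injections, it slots directly into estimation and optimization formulations, which is exactly why linearized power flow is attractive for the applications discussed here.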
But we do consider the losses, and also the reactive power, in our linearization model. That's the reason why the linearized power flow model is accurate and actually fits our purpose of doing this estimation well. Yeah, the next question is: are there any open-source data sets for this sort of study, and if so, could you share some links? Yes, we do have some open-source data sets for the studies we performed, and I can share some links; in my slides you can just click on the links to look at those open-source data sets. Another heads-up I want to share with you: we are currently working on a new project, which just started, to build an open library that will host advanced DER management algorithms, like estimation and OPF algorithms, and related data sets, under a DOE-funded project. That library, with the data analytics methods and the data sets, is going to be completely open to the public. So stay tuned on that; you will probably see an email from me saying you can go to this website to check out our data sets. Any other questions from the students? Looking at the performance of the models, it's a little hard for me to wrap my mind around what a good level of performance is, because it differs between domains. You know, we're talking about 5% MAPE; do we care that we have that much error or not? Could you go back to, I think, slide 16, which had some displays of performance? I was wondering what the industry standard or the state of the art is; at one level you can improve on it, but what are some guideposts or milestones of what useful performance is for the applications you're talking about? It could be very hard to define.
Yeah, that's an excellent point. When we did this study, we kept asking ourselves how good is really good, especially in terms of voltage estimation, because the voltage only has a very small range to vary. The voltage doesn't vary from zero to two per unit; it only varies about five to ten percent around one per unit. So one indication is that if you end up with 3% error and your voltage is one per unit, you're actually estimating your voltage to be 1.03 per unit, and that's going to cause problems in the voltage estimation. We did some study, and I also got feedback from our industry partners, suggesting we should keep the voltage estimation error below 1% under very high PV penetration scenarios, because under those scenarios you may have overvoltage violations more frequently. So our target was to achieve 1% estimation error, and we actually achieved 0.5%. That means if your voltage is one per unit, your estimated voltage could be just 1.005 per unit, which is fairly accurate for the estimation. Another point you brought up is how the estimation error will impact the control. That's something we actually looked at in our study. We fed the estimated voltages and the forecasted voltages into our optimization to determine the optimal set points for the DERs, and we found that, within the errors we have right now, our control performance is very good; we don't see the estimation error propagate into larger errors on the control side.
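The per-unit arithmetic in that answer can be spelled out with a one-line helper (illustrative only):

```python
def estimated_voltage(v_true_pu, pct_error):
    """Voltage estimate implied by a given percent magnitude error."""
    return v_true_pu * (1.0 + pct_error / 100.0)

print(estimated_voltage(1.0, 3.0))   # 3% error at 1 pu  -> 1.03 pu
print(estimated_voltage(1.0, 0.5))   # 0.5% error at 1 pu -> 1.005 pu
```

Since feeder voltages are typically held within roughly ±5% of 1 pu, a 3% estimation error consumes most of that band, while 0.5% leaves ample margin; that is why the sub-1% target matters.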
So by knowing what's going to happen to the voltages over the next 15 minutes, we are able to proactively dispatch and coordinate the resources, such as voltage regulators and the DERs, to optimize the voltage profile very well. We compared that with the case of perfect knowledge, and we don't see much deviation there. But yeah, your point is spot on; those are the important questions: you should always ask how good is really good, and how the errors will propagate through. Okay, thank you. Okay, thank you. There is actually one more question in the Q&A, but maybe I will just email this question to you, and you can reply to this particular person. Okay, thank you very much. Thank you.