So, on with the student and postdoc poster presentations. I think you all know that all of the work is done by the students and postdocs. They're the creative force, they're the labor, they're the source of innovation, and they are by far our best product. And so it is a great pleasure to have this opportunity for you to see 10 or so of our grad students and postdocs. They've all been encouraged not to try to present their entire PhD thesis in 90 seconds. We'll see how they do at that; most of them are really good at it.

Our first speaker is Adrian Albert. He's in the Electrical Engineering department. His advisors are Professors Jim Sweeney and Ram Rajagopal. The title of his work is Data-Driven Energy Demand Management for the Smart Grid. Adrian.

Okay, I'm going to talk a little bit about some of my PhD work on data-driven demand-side management for the energy grid. One thing I want to focus on here, and it's going to be in my poster, is the electric side, the electricity part of the energy spectrum. Utility companies gather lots and lots of data about how people consume electricity in the residential sector and other sectors. This data essentially hides decisions made by the consumers themselves. What the utility has access to is not the decisions people make, or information about those decisions, but rather the consequences of those decisions, which is the data it observes. So this work is trying to infer the decisions, the hidden decisions, back from the data that is observable. In this particular case it is the decision of whether or not to use HVAC. HVAC is a big component of the residential electricity budget in the United States. So essentially we're modeling the decision process of whether to use HVAC or not, using some modeling tools. In this setting, the users take signals from the outside, from the weather and temperature, and they consume according to a given model. What we want to do here is really to start plotting, or understanding, what the data tells us. In this plot I'm showing consumption versus temperature for two different users. Each dot is a smart meter data point, an actual recording of consumption, and this is what the model sees. I'm trying to decompose consumption into the decisions to use HVAC or not. So for example, Bob here on the top has a lot of consumption that's responsive to temperature in the form of AC, while Alice is using both heating and AC. This we can actually show in the data; please stop by my poster to hear more about it. Thank you. Thank you, Adrian.
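As a toy illustration of the kind of decomposition Adrian described, the sketch below fits a base load plus a temperature-responsive cooling term with a changepoint, then flags readings where the cooling term dominates. The synthetic data, the single-changepoint model, and the grid-search fit are all assumptions for illustration, not his actual method.

```python
import numpy as np

# Synthetic smart meter readings: base load plus AC use above a setpoint, plus noise.
rng = np.random.default_rng(0)
temperature = rng.uniform(5, 40, 2_000)                       # outdoor temperature, deg C
true_setpoint, true_slope, base = 22.0, 0.15, 0.8
consumption = base + true_slope * np.maximum(temperature - true_setpoint, 0)
consumption += rng.normal(0, 0.1, temperature.size)           # hourly kWh

def fit_cooling_model(temp, load):
    """Grid-search a changepoint model: load = base + slope * max(temp - setpoint, 0)."""
    best = None
    for setpoint in np.arange(10, 35, 0.5):
        x = np.maximum(temp - setpoint, 0)
        slope, intercept = np.polyfit(x, load, 1)
        sse = np.sum((load - (intercept + slope * x)) ** 2)
        if best is None or sse < best[0]:
            best = (sse, setpoint, slope, intercept)
    return best[1:]                                            # setpoint, slope, base load

setpoint, slope, base_load = fit_cooling_model(temperature, consumption)
hvac_on = slope * np.maximum(temperature - setpoint, 0) > 0.05   # crude "HVAC in use" label
print(f"estimated setpoint {setpoint:.1f} C, cooling slope {slope:.2f} kWh/deg, "
      f"HVAC inferred on for {hvac_on.mean():.0%} of readings")
```

In real data a symmetric heating term below a second changepoint would be fitted the same way.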
Our next presentation is by three people: Jason Chang, Taylor Dahlke, and Nori Nakata. They're all in geophysics. Their advisor is Professor Biondo Biondi, who we heard this morning. The work is titled Resolving Subsurface Structure from Cross-Correlations of Continuously Recorded Ambient Noise at Long Beach, California. The presentation is by Jason and Taylor. Welcome, guys.

Thank you, everybody. Hopefully I'll keep this pretty short; I'm pretty hungry, too. So this is, as you can see, the title of our presentation. Really, what we're doing, as Biondo has told you, is taking ambient noise and trying to resolve some sort of subsurface structure. These are just a few of the applications. It's important for earthquake hazard analysis: understanding the velocity of the near surface can really tell you how much things are going to shake when an earthquake occurs. It also matters for hydrogeology, so you can do groundwater monitoring, and for exploration seismology, which is pretty much what our group does, not only for the production of oil and finding reservoirs, but also for monitoring the overburden, because we really want to prevent the sort of gas leaks that we've seen.

This is our data set, and it's actually pretty sweet. It's from Long Beach, California, and you can actually see, I'm going to step aside here, all right, I'll just point, you can see Interstate 405 up north there. It's a really urban environment, quite different from the ocean-bottom cable environment that you've already seen. You can imagine running this movie for 35 days straight, for instance, and using that data to eventually resolve some sort of subsurface structure to address those questions from the beginning of my presentation. Up top you can see tomography results. Probably the most exciting thing, and I hope you drop by because I want to explain more about this, is that we can find body waves, which are quite difficult to extract from ambient seismic noise. It's actually pretty awesome stuff. So I'll end it there. Please drop by; we'd love to talk to all of you during lunch. Thanks a lot. Thank you.

The next speaker is Jungsuk Kwac. He's in electrical engineering. His advisor is Professor Ram Rajagopal, who, by the way, is speaking later this afternoon. Jungsuk's paper is titled Segmenting Customers from Smart Meter Data.

My name is Jungsuk Kwac, as he introduced. I'm a PhD candidate in the EE department, and today I will talk about part of my PhD thesis topic, which is data-driven demand-side management. The reason I put this slide first is that it best explains what we are doing in our lab. The smart meters are already installed; that's why that part is shown in gray. What we are doing is the learning and targeting. Basically, the purpose of all the data analytics on smart meter data is that we want to improve demand-side management using the smart meter data. Based on the time-series energy consumption data, we learn the customers' consumption patterns, and we want to predict a customer's consumption and load shapes. Based on the various features, we want to target customers for certain energy programs, and then communicate with the selected customers. If we had the feedback data, the research would form a closed cycle, but from academia we don't have any way to interact with the customers directly, so that is the to-do part shown in green.

This is how we segment the customers. The main idea of the whole flow is how to generate the pre-processed load shape dictionary. The purpose of the pre-processed load shape dictionary is that we cannot access the raw smart meter data every time we want to extract features or segment customers, so we need a scalable methodology for customer segmentation on big data sets. If you come to my poster in the basement, I will explain how to generate the pre-processed load shape dictionary. The 16 plots are examples of the representative load shapes across the entire set of about 40 million daily profiles.
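To make the dictionary idea concrete, here is a minimal sketch of one common way to build a load-shape dictionary: normalize each daily profile to unit total consumption so only the shape remains, then cluster with k-means. The synthetic profiles and the choice of k-means are illustrative assumptions, not necessarily the method behind the actual pre-processed dictionary.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy stand-in for daily smart meter profiles: rows are customer-days,
# columns are 24 hourly consumption readings (kWh).
rng = np.random.default_rng(1)
profiles = rng.gamma(shape=2.0, scale=0.5, size=(5_000, 24))

# Normalize each day to unit total so clustering sees the shape, not the magnitude.
totals = profiles.sum(axis=1, keepdims=True)
shapes = profiles / totals

# The "dictionary": a fixed set of representative load shapes (cluster centers).
kmeans = KMeans(n_clusters=16, n_init=10, random_state=0).fit(shapes)
dictionary = kmeans.cluster_centers_          # 16 representative shapes

# New days can then be encoded by their nearest dictionary entry plus a scale factor,
# so later segmentation never has to touch the raw meter data again.
labels = kmeans.predict(shapes)
coverage = np.bincount(labels) / len(labels)
print("share of days mapped to each representative shape:", np.round(coverage, 3))
```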
And then the last slide shows some interesting facts that I found with this customer segmentation methodology. We found that when we use 1,000 representative load shapes, 90% of the entire consumption pattern is covered by only 272 of those load shapes. The other interesting fact we found is that traditionally, utility companies thought residential customers always follow the double-peak load shape, but in our analysis the double-peak load shape accounted for less than 25%, and that is one of the interesting facts. That's it. Thank you. Thank you, Jungsuk.

Our next speaker is Louis Lee in the Energy Resources Engineering department. His advisor is Jef Caers, who spoke this morning. His paper is titled Using Analogous Data for Subsurface Characterization. Louis, welcome.

Thank you. The project I'm going to be talking about is one that Jef, my advisor, and I worked on last summer. It uses analogous data, which is the training images he described earlier in his talk, to characterize the subsurface. The main problem we have in reservoir engineering is usually just a lack of data, and this is also the case in climate modeling, for example, as Jef showed. So we need to model the spatial dependence over the field, and geostatistics was developed for this purpose. Geostatistics was developed in the 60s using random function theory. It requires modeling the covariances that Jef described, and this is usually done using a variogram. The problem with a variogram is that it is somewhat difficult to model, and it also requires a mathematical decomposition of the data. This decomposition is difficult because it essentially boils down to having two unknowns and one equation. In this particular example, we took 100 data points and used these samples to generate this estimate of a region. The method we've been investigating uses analogous data, the training image Jef described earlier. In our case this could be a neighboring region, perhaps, that's already undergoing production, so we know more about it. Our approach computes the statistics directly on this training image, so we no longer need to do the decomposition, and we also don't need to do the variogram modeling. If this is the actual truth here on the left, our estimate is shown on the far right there, and if we compare that to the estimate from the variogram-based approach, we can see that ours is smoother and better able to describe the large-scale features in this image. If you want to know more, please come by my poster at lunch. Thank you.
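As a small, self-contained illustration of the contrast Louis drew, the sketch below generates a toy correlated field to stand in for an exhaustively known analog and reads an experimental variogram directly off it, the kind of statistic that otherwise has to be modeled from sparse samples. The field, the smoothing construction, and the simple x-direction variogram are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(4)

def toy_field(n=100, sigma=6.0):
    """Correlated 2D field made by smoothing white noise: a stand-in for geology."""
    kernel = np.exp(-0.5 * (np.arange(-15, 16) / sigma) ** 2)
    kernel /= kernel.sum()
    f = rng.standard_normal((n, n))
    f = np.apply_along_axis(lambda row: np.convolve(row, kernel, mode="same"), 1, f)
    f = np.apply_along_axis(lambda col: np.convolve(col, kernel, mode="same"), 0, f)
    return f

def variogram_x(field, lags):
    """Experimental variogram along x, computed exhaustively over the whole grid."""
    return np.array([0.5 * np.mean((field[:, h:] - field[:, :-h]) ** 2) for h in lags])

analog = toy_field()                  # the exhaustively known analog (training image)
lags = np.array([1, 2, 5, 10, 20])
print("spatial statistics read directly off the analog:")
for h, g in zip(lags, variogram_x(analog, lags)):
    print(f"  lag {h:2d}: gamma = {g:.3f}")
```

With only a hundred scattered samples of the target field, the same statistic would be noisy and hard to model, which is the gap the analog is meant to fill.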
The next paper is by Chinmoy Mandayam in electrical engineering and Jenguan Zhu in computer science. Their advisor is Balaji Prabhakar, and their paper is Congestion and Parking Relief Incentives. Welcome, guys.

Good morning. Our project is CAPRI, Congestion and Parking Relief Incentives. As we know, traffic congestion has always been a big problem, and it costs us not only time but also money. According to a national report, the wasted time and fuel due to congestion in the U.S. is estimated at $121 billion in 2011, a number projected to jump to $199 billion in 2020. To alleviate the increasing traffic pressure on the Stanford campus, we proposed the CAPRI project. In this project, we have installed 20 RFID scanners across the Stanford campus to monitor the traffic coming into campus, and we pay random, chunky rewards to commuters to get them to change their commute times. Basically, we pay users rewards if they shift their commute time from peak hours to off-peak hours, or if they shift their commute mode from automobiles to walking and biking. Up to now, our program has run for two years. We have 4,500 users generating more than half a million automobile commutes and more than 54,000 walking and biking commutes. Based on this data, we have carried out analysis and built a generative model to explain how and why users shift their commute times under the influence of our incentive system. If you are interested in our solution to the traffic congestion problem, please come to our poster session. Thank you.

A subject that is close to all of our hearts. The next presentation is by Sid Patel in the Civil and Environmental Engineering department. His advisor is Ram Rajagopal, and the paper is Aggregation for Load Servicing.

This research focuses on the scenario of electric utilities serving residential customers. The conventional practice is for the utility to take all of the customers in a geographical area, average together all of their consumption, and offer the same rate plan to all of them, even though some users are more expensive to serve than others. But now we have smart meter data, which gives us high-resolution information on individual consumption. So the question we're asking is: how can we use this data to design sensible and customized rate plans? Our model is a utility that purchases electricity in a two-stage wholesale market, the day-ahead market and the real-time market. At the day-ahead stage, the utility has to forecast its users' consumption, and it purchases electricity at the day-ahead price vector, which is the blue curve there. At real time, if the actual consumption of its users deviates from its forecast, it has to purchase the difference at the real-time price, the red curve, which can be quite spiky. We know that it's impractical to forecast the consumption of just one user, so the utility has to aggregate customers together to effectively forecast and purchase electricity for them. What we studied is the effect of this aggregation on the per-unit cost of electricity that a certain group of customers can get, as well as on the accuracy of forecasting their consumption. We can make an observation here: users who consume off-peak should be able to get a cheaper rate, because their electricity can be purchased at a lower per-unit price on the day-ahead market, and users whose consumption is easier to forecast should also be able to get a cheaper rate, because they contribute less to real-time deviations. So we studied the trade-off between these two features as the utility attempts to build aggregates of its customer base on a basis that is not just geographical, but that uses the data derived from smart meters. We studied this using a PG&E smart meter data set. These are obviously cartoons, but the real data and analyses are on the poster.
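A toy numeric sketch of the two-stage settlement Sid described: the forecast is bought at day-ahead prices and any deviation is settled at the spikier real-time prices. All prices and loads below are invented, and this is only the accounting identity, not his aggregation analysis.

```python
import numpy as np

# Hypothetical hourly prices ($/kWh) and loads (kWh) for one day and one aggregate.
hours = np.arange(24)
da_price = 0.05 + 0.02 * np.sin((hours - 6) / 24 * 2 * np.pi) ** 2   # smooth day-ahead curve
rt_price = da_price.copy()
rt_price[18] = 0.60                                                   # one real-time price spike

forecast = 100 + 40 * np.exp(-((hours - 19) ** 2) / 8.0)              # day-ahead forecast
actual = forecast + np.random.default_rng(2).normal(0, 8, 24)         # what was really consumed

# Two-stage settlement: forecast bought day-ahead, deviations settled in real time.
da_cost = np.sum(da_price * forecast)
rt_cost = np.sum(rt_price * (actual - forecast))
total = da_cost + rt_cost

print(f"day-ahead cost  ${da_cost:8.2f}")
print(f"real-time cost  ${rt_cost:8.2f}")
print(f"per-unit cost   ${total / actual.sum():.4f} per kWh")
```

Off-peak consumption keeps the day-ahead term cheap; predictable consumption keeps the real-time term small, which is exactly the trade-off the aggregation study explores.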
Our next speaker is Raffi Sevlian in electrical engineering. His advisor is also Professor Ram Rajagopal. His paper is titled A Model for the Effect of Aggregation on Short-Term Load Forecasting.

Hi, I'm going to talk about my work; I guess you already gave the title. Load forecasting is used all throughout power systems and other infrastructure systems. Now that we're thinking about the smart grid, what are its components? Local generation, local storage, an influx of information, and so on, and maybe we could tailor applications to individual people. That's an idea, and you might say it sounds like a good idea. However, it turns out it's very difficult to forecast individuals but very easy to forecast a very large aggregate. Forecasting an individual gives you about 50% error; forecasting the entire grid gives you about 2% error. You might ask why that is the case. Well, the colloquial answer is the law of large numbers, the fact that if I take a bunch of people and average them, I'm going to swamp out the noise. It turns out that's only partially correct, because if it were only the law of large numbers, you should see a 1-over-square-root-of-n drop-off in load forecast error all the way down. However, empirically in my research, trying tons of different forecasters on all the PG&E data, you realize there is an irreducible error. So my research is developing a proper mathematical model that fits the data correctly, at the individual level and at the aggregate level, to explain this aggregation error curve, and finding applications in the smart grid. One of them is what Sid just presented, and there are many others. One could be to use this as a proxy for distribution system operation: how can we integrate forecasts? You should have a good model for how forecast error scales with the size of the aggregate. Another is improving forecasting and general system design, because if you think about it, we've built this system at such a large scale because we expect the per-unit cost of uncertainty to go down. But as you see, there's a limit to how far it goes down. So we could actually think about how to resize the grid so that the per-unit cost stays the same as before, while maybe making it more efficient. I just think this level of aggregation is a very good system design parameter. So come by if you have any questions. Okay, thank you.
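A minimal sketch of the scaling Raffi described: if each customer's demand has an idiosyncratic error that averages out and a small common error that does not, the relative forecast error falls roughly like 1 over the square root of n and then flattens at an irreducible floor. The numbers are synthetic, chosen only to show the shape of the curve, not the PG&E results.

```python
import numpy as np

rng = np.random.default_rng(5)
n_customers, n_hours = 2_000, 500

# Each customer's load: a predictable base, an idiosyncratic error, and a small
# common error (think weather forecast error) shared by everyone.
base = rng.uniform(0.8, 1.2, (n_customers, 1))
idiosyncratic = rng.normal(0, 0.5, (n_customers, n_hours))
common = rng.normal(0, 0.02, (1, n_hours))
load = base + idiosyncratic + common          # actual consumption; the forecast is just `base`

for n in (1, 10, 100, 1_000, 2_000):
    agg_actual = load[:n].sum(axis=0)
    agg_forecast = np.repeat(base[:n].sum(), n_hours)
    nmae = np.mean(np.abs(agg_actual - agg_forecast)) / np.mean(agg_actual)
    print(f"n = {n:5d}  relative forecast error ~ {nmae:5.1%}")

# The error drops roughly like 1/sqrt(n) at first, then flattens at the floor set by
# the common component: the irreducible part of the aggregation error curve.
```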
Our next speaker is Sahil Shenoy in the Physics department. His advisor is Professor Dmitry Gorinevsky. His work is Risk Management and Forecasting for Electricity Markets. Hi, Sahil.

Hi, everybody. The motivation behind this work comes from the fact that utilities need to forecast, 24 hours in advance, how much power they need to buy from electricity markets. There are two types of risk here: either you buy too much power and not all of it is used, so the excess goes to waste, or you don't buy enough power and you get hit by the spot prices of the electricity market. To tackle this problem, we're essentially trying to develop a risk-adjusted forecast. First we build a forecasting model, and then we want to estimate the margin that we need to buy, in addition to or less than what we forecast, in order to reduce the risk of these extreme events happening. In our robust regression model, which means that we throw away outliers and fit the bulk of the data, we use such things as temperature, time of day, month, and so on, as well as the load values at previous times. The novelty of this project is that we're combining two different branches of statistics. One is robust regression, which fits only the bulk of the data: as you can see in the histogram on the top, that captures just the body of the distribution, but it doesn't account for the outliers, which I've shown below, and which we also fit. So we're essentially combining two complementary branches of statistics that had not really been combined in this way before. Just to give an example of some of the results we got, we looked at two different models. One is the basic normal model that everyone uses, and the other is a long-tail model that accounts for these extreme events happening. The results show that you actually save about $47,000 per day if you listen to your long-tail model, which amounts to $17.1 million a year. If you're interested in knowing more about this, please stop by the poster and I'll give you more details. Thank you.

Our next speaker is Pejman Tahmasebi in Energy Resources Engineering. Pejman's advisor is Jef Caers. His paper is High-Performance Computational Methods for Reservoir Modeling. Hi, Pejman.

Hi, everyone. Actually, I'm a little lucky, because Jef talked about training images and geostatistics before. So if I want to review what the training image actually is, I should say that we have a lot of data for the reservoir model, like the geological model, the remote sensing data, and the well data, and we also have access to seismic data. If we want to build this training image for use in reservoir modeling, we use all of this information to construct it. But how should we use this training image in reservoir modeling? Jef talked about how we look at this problem as puzzle solving. Suppose you have this training image and you have a bunch of hard well data, or information about the reservoir that you want to simulate. Like solving a puzzle, we take different pieces of this image and put them together based on their similarity and their consistency with the data that we have. But where is the problem? Let me say where the big data is: the big data that we ultimately need to deal with is this training image in the reservoir model. If you look at the size of this image, it's actually much larger than the 2D image we have here. Based on this problem, we grouped our solutions into two classes: algorithmic acceleration and hardware. In algorithmic acceleration, we believe it's not really necessary to deal with this very large training image, because we can reduce the training image and the data to a smaller version. Instead of working with this large data set, we can work with the smaller training image and keep the computational time under control. The other part is to use the hardware that's available. For example, we can use cluster computing and make the algorithm parallel, and we can also use the GPU to get further acceleration in reservoir modeling by doing some of the computations, like convolution and FFT, on the GPU. Thank you.
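One small sketch of the first of Pejman's two ideas, shrinking the training image so the pattern search runs on a coarser grid. Block averaging is just one plausible coarsening; his actual multi-scale algorithm may work differently.

```python
import numpy as np

def coarsen(training_image: np.ndarray, block: int) -> np.ndarray:
    """Reduce a 2D training image by averaging non-overlapping block x block tiles."""
    rows, cols = training_image.shape
    rows -= rows % block               # trim so the grid divides evenly
    cols -= cols % block
    ti = training_image[:rows, :cols]
    return ti.reshape(rows // block, block, cols // block, block).mean(axis=(1, 3))

# Toy binary "channel" training image: a few horizontal sand bodies in shale.
ti = np.zeros((400, 400))
ti[50:70, :] = 1.0
ti[180:210, 40:360] = 1.0
ti[300:315, :] = 1.0

small = coarsen(ti, block=4)
print(ti.shape, "->", small.shape)     # (400, 400) -> (100, 100)
```

The puzzle-style pattern matching can then be done on the small image first, with the fine image consulted only where detail is needed.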
Our next speaker is Chuan Tian in the Energy Resources Engineering department. His advisor is Professor Roland Horne. His paper is Machine Learning for Well-Test Interpretation.

Hi, I'm Chuan Tian. I'm going to talk about machine learning for well-test interpretation. In a conventional well test, we have a well-controlled flow rate, like the one shown here: a constant rate. We record the pressure data and calculate the derivative plot of the pressure, and we use this derivative plot to estimate reservoir parameters like wellbore storage, permeability, and reservoir boundaries. A permanent downhole gauge is a device that can record both pressure and flow rate data, and what we would like to do is interpret the data from the permanent downhole gauge to estimate reservoir parameters. As shown here, this is the flow rate recorded by the PDG, and this is the pressure recorded by the PDG. However, there are several challenges. First of all, the data is really noisy, and because we have a variable flow rate, the pressure signal is harder to deconvolve. We also need breakpoints to separate each transient in the pressure signal. And because the PDG operates throughout reservoir operations, the record can last for several years and contain millions of data points, so the volume of the data itself can be a challenging problem. What we did is formulate this as a supervised learning problem. Here is a case showing our result on real PDG data. This is the flow rate and pressure from the PDG, and as you can see, our method is able to remove the noise from the data and also to deconvolve the pressure signal without explicit breakpoint detection. And because our algorithm runs really fast, we can apply this method to the huge volume of data from the PDG. If you want to know more about how we achieved these results, just come by my poster and I'll give you more details. Thank you very much.

Please join me in thanking all of the students for their presentations.
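Finally, a rough illustration of the deconvolution problem Chuan described, posed here as a regularized least-squares fit: given the flow-rate history and a noisy pressure record, recover the constant-rate response. All signals are synthetic, and this linear-algebra sketch is not his supervised-learning method.

```python
import numpy as np

rng = np.random.default_rng(7)
n, m = 600, 60                           # time samples, length of the response kernel

# True constant-rate pressure response g (per unit rate): steep early, flattening late.
t = np.arange(1, m + 1)
g_true = 0.5 * np.log1p(t / 3.0)

# Variable flow-rate history with step changes, like a PDG record.
rate = np.zeros(n)
for start, level in [(0, 100), (150, 60), (300, 140), (450, 0)]:
    rate[start:] = level

# Measured pressure drop = convolution of the rate history with g, plus noise.
A = np.zeros((n, m))
for k in range(m):
    A[k:, k] = rate[: n - k]             # A[t, k] = rate at time t - k
dp_measured = A @ g_true + rng.normal(0, 5.0, n)

# Deconvolution as least squares with a small ridge term for numerical stability:
# recover the constant-rate response g from the rate and pressure records.
lam = 1.0
g_hat = np.linalg.solve(A.T @ A + lam * np.eye(m), A.T @ dp_measured)

print("true g at lags 1, 10, 50:     ", np.round(g_true[[0, 9, 49]], 3))
print("recovered g at lags 1, 10, 50:", np.round(g_hat[[0, 9, 49]], 3))
```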