Dr. Rahul Ramachandran is deputy editor for Earth Science Informatics and a principal research scientist at the Information Technology and Systems Center at the University of Alabama in Huntsville.
So this is the second project that we're working on right now. It's the Automated Event Service. The goal of this project is to look at different big data technologies for identifying events or phenomena in Earth science data. The three technologies that we are looking at are Hadoop, SciDB, and Polaris. We are focusing more on SciDB right now.
So the first thing we did was try to scope the problem down. If you look at the research papers in atmospheric science, there are basically five common research approaches. You do event analysis, where you're looking in detail at what happened in the event. There are event climatology papers, where people analyze the data to find the representativeness of phenomena in the data. What is the spatial and temporal distribution of the events? What are the cycles and durations? Things like that. The third common paper is the synoptic climatology, where you're trying to look at the evolution of an event. What happened before it, at the time of the event, and after the event? And then the fourth is forecast methods, where the research focuses on coming up with new methods for predicting a particular event.
So the goal is, if you build a new analytics tool for big data, you should be able to support four of the five common research approaches that are used at least in atmospheric science, which happens, by the way, to be a big data domain, right? We are looking at satellite imagery, model simulations, and so on. By focusing on event analytics, the tool provides these value propositions to an end user. For event analysis, the tool lets people find interesting events in the data.
Event climatology is a no-brainer, right? If you have large volumes of data, you should be able to do event climatology very easily with the data sets. You can do synoptic climatology: once you have figured out the events in your data, you can do causality analysis with other data sets. And finally, for the forecast methods, if you can build this tool to be more interactive, you can allow researchers to develop heuristic tools to refine their methods for finding new events in data sets.
And again, this is still evolving in terms of how we envision the event analysis workflow. We think there are two stages to this. The first stage is working with the big data on an HPC, where you're focusing on the analytics part, which is more of an interactive exploration. Then you go to the second stage, where you've done your discovery of the data, your discovery of the event, and the segmentation and characterization of the event. You've reached a descriptive data set, a much smaller piece that you can bring down to do your detailed science analysis. But it's not as clear-cut as that. There are obviously overlaps between the two, and the struggle is finding out what you can do at the different stages and what would be most useful.
So this is the design, a high-level system architecture. On the HPC side we're looking at SciDB, which chunks the data across the HPC. We're building the event service on top of that setup. The goal is to develop an analytics package that can then be run within an IDE or a GUI through a browser. And we're using a CMS to handle roles, authentication and permissions, and also the collaboration part. The idea is that you allow people to share what they're discovering.
So the initial focus is to build simple things: simple visualizations for event detection, like bar charts, contour plots, and maps, and queries to do detection, segmentation, characterization, and correlation statistics. Tracking is a trickier problem to implement in this kind of architecture. It's on our list, but it may be further down the road in terms of when it's actually developed.
So, the notion of an analytics package. The goal is to build a package that can work with Python and R, so that users who are familiar with Python and R as part of their analysis can use it. That improves the whole notion of adoption. This is a simple example, based on the data mining work that we've done earlier. You import a package, and then you can run these functions. The functions actually run on the HPC, while the user does the analysis from the desktop.
We have done some initial prototypes with this. This one is using Polaris, which is a home-grown system. We tested it with SSM/I data, which is microwave data. We took different parameters: rain rate, wind speed, columnar water vapor, liquid water. This is a fairly small dataset, about one terabyte. We striped it across our whole cluster, and we have an engine that can query this dataset pretty fast.
So this is a really simple query UI where you can select which datasets you want. We have data from 1987 to 2012. We support very basic queries: simple thresholds on the data, and then statistical analysis on the data. And you can select your data channels and your time range of interest. So this is really a simple example, but the notion here is that it's interactive. You're actually interactively playing with 20 years of satellite data.
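The kind of threshold query described here (pick a channel, a spatial region, and a time range, then summarize the hits by year) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the Polaris or SciDB interface: the `Obs` and `Box` types, the function names, the units, the rough Gulf of Mexico bounds, and the sample values are all hypothetical and synthetic.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Obs:
    """One synthetic observation (hypothetical record layout)."""
    day: date
    lat: float
    lon: float
    rain_rate: float   # assumed units: mm/hr
    wind_speed: float  # assumed units: m/s


@dataclass(frozen=True)
class Box:
    """Latitude/longitude bounding box for the region of interest."""
    south: float
    north: float
    west: float
    east: float

    def contains(self, lat: float, lon: float) -> bool:
        return self.south <= lat <= self.north and self.west <= lon <= self.east


def threshold_query(observations, channel, threshold, box, start, end):
    """Return observations in the box and time range whose channel >= threshold."""
    return [
        o for o in observations
        if start <= o.day <= end
        and box.contains(o.lat, o.lon)
        and getattr(o, channel) >= threshold
    ]


def count_by_year(hits):
    """Summarize hits as a year -> count mapping (the data behind a bar chart)."""
    return Counter(o.day.year for o in hits)


# Synthetic sample data, invented purely for illustration.
SAMPLE = [
    Obs(date(2004, 9, 15), 27.0, -88.0, 22.0, 30.0),
    Obs(date(2005, 8, 28), 26.5, -89.0, 30.0, 45.0),
    Obs(date(2005, 8, 29), 29.0, -89.5, 28.0, 40.0),
    Obs(date(2005, 9, 23), 28.0, -93.0, 25.0, 38.0),
    Obs(date(2005, 8, 28), 40.0, -60.0, 35.0, 20.0),  # outside the box
    Obs(date(2005, 1, 10), 25.0, -90.0, 2.0, 5.0),    # below the threshold
]

# Rough, illustrative bounds for the Gulf of Mexico.
GULF = Box(south=18.0, north=31.0, west=-98.0, east=-80.0)

hits = threshold_query(SAMPLE, "rain_rate", 20.0, GULF,
                       date(1987, 1, 1), date(2012, 12, 31))
yearly = count_by_year(hits)
print(sorted(yearly.items()))
```

In a real deployment the filtering would run next to the data on the cluster and only the small per-year summary would come back to the desktop, which is what makes the interaction fast.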
It's just a simple threshold. So, a simple question: I want to look at hurricanes in the Gulf of Mexico, right? If you take rain rate and you want to find extreme rain rate events in this spatial region, you can already get results by year, and you can drill down to a particular year. Then you can see the seasonal distribution of these events. Then you see that one month stands out. This is where the whole notion of the data pointing you in a direction comes in. What's going on? You can drill down to that particular month, and you can actually pick out the particular hurricane. So this is Katrina, and you have a heat map, which is basically a spatial frequency. It's showing the location of the rain for that month. And then you can link it to the actual data, so you can go and see. This is the actual satellite data. You can browse the actual image for that particular date.
So this is a really simple example. But I presented this at a conference, and after my talk a scientist who works with this data came up and said, "You know what, I want to play with your tool." So she and I sat down in between sessions. This is the work that she's doing right now. She's looking at gap winds in Central America. I will not pronounce the name. It's basically a phenomenon that occurs when the weather conditions are right: the topography causes a regional jet to form. This regional jet then causes ocean upwelling and enrichment, which is very important for the local industry. I guess that's when they do the fishing and things like that. So they are working on algorithms to detect these events. And she said, "Okay, I want to play with your tool to see whether I can do this by just running a simple query." So we ran a simple threshold query on the wind speed for that particular region. And then we selected a particular year.
And then she could start seeing the seasonal distribution of those events. If you select a particular month, the heat map shows exactly where it is. You can actually see the three events that are active. So she was super excited. And again, you can verify by linking back to the actual data to make sure.
This is not the only case that we did. The other case she looked at is the Somali jet, which is a really important precursor to the Indian monsoon. It is again a low-level jet, one that occurs off the horn of Somalia. So again, simple stuff: a threshold query, looking at a particular region. You can see that the heat map picks up exactly where it's happening. You can see the seasonal distribution of these events. One of the important questions is when the Somali jet starts getting active. So you can select the particular month and see where the onset of this jet starts in the data.
So it's really nice if you have a tool where you have this large data and you can actually play with it and explore it interactively. These examples demonstrate the kinds of questions you can ask. People can do neat things with this. Anyway, that's our presentation. If you've got any questions, I'll be more than happy to answer.
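The drill-down used in both case studies (seasonal distribution for a chosen year, a spatial-frequency heat map for a chosen month, and a first look at onset) can be sketched as follows. This is a minimal illustration, not the tool's actual implementation: the `Hit` record, the function names, the 1-degree grid cells, the "first month with at least N events" onset rule, and the sample values are all assumptions made for the example, and the data is synthetic.

```python
import math
from collections import Counter
from datetime import date
from typing import NamedTuple, Optional


class Hit(NamedTuple):
    """One threshold exceedance (hypothetical record layout)."""
    day: date
    lat: float
    lon: float


def monthly_counts(hits, year):
    """Seasonal distribution: month -> number of hits in the given year."""
    return Counter(h.day.month for h in hits if h.day.year == year)


def spatial_frequency(hits, year, month, cell_deg=1.0):
    """Heat-map data: (lat_cell, lon_cell) -> number of hits in that grid cell."""
    grid = Counter()
    for h in hits:
        if h.day.year == year and h.day.month == month:
            cell = (math.floor(h.lat / cell_deg), math.floor(h.lon / cell_deg))
            grid[cell] += 1
    return grid


def onset_month(counts, min_events) -> Optional[int]:
    """First calendar month with at least min_events hits (an assumed, simple rule)."""
    for month in range(1, 13):
        if counts.get(month, 0) >= min_events:
            return month
    return None


# Synthetic wind-speed exceedances, invented purely for illustration.
HITS = [
    Hit(date(2010, 5, 20), 9.5, 51.2),
    Hit(date(2010, 6, 3), 10.2, 52.7),
    Hit(date(2010, 6, 10), 10.8, 52.1),
    Hit(date(2010, 6, 21), 10.4, 52.9),
    Hit(date(2010, 7, 2), 11.1, 53.3),
    Hit(date(2010, 7, 15), 10.6, 52.4),
    Hit(date(2011, 6, 5), 10.0, 52.0),
]

monthly = monthly_counts(HITS, 2010)
heat = spatial_frequency(HITS, 2010, 6)
onset = onset_month(monthly, min_events=2)

print(sorted(monthly.items()))
print(heat.most_common(1))
print(onset)
```

Each step reduces the data to a small summary (twelve monthly counts, a sparse grid of cell counts), so only those summaries need to reach the browser for plotting.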