We're excited to talk to you about big data because we're undergoing a revolution in the way we can use data to imagine the world that we live in and to have conversations about the changes that we see coming in the world. And I want to go back to 2009, when Barack Obama was inaugurated, because this fairly famous picture of the inauguration is an excellent example of very early big data. That single image wasn't just an image. It was an interactive medium, something with more than a billion pixels that could be explored. And in zooming in to President Obama, you're able to see the places you care about in greater detail and with greater attention. And by exploring it, the person driving the interaction was able to become more in touch with the image, more intimate with the data. Here we see Michelle Obama with President Abraham Lincoln's Bible, the original Bible of Lincoln, which is quite exciting for us to be able to find in the picture. Our relationship to data is changing because the interactivity we have now is due to new advances in storing information, in using graphics processors on computers to make this possible, and in using internet bandwidth cleverly. The ability to do this to a static image has recently been supplanted by the ability to do this to any moving image. So for instance, now I can take the question of the biology of plants and present plant behavior to a biology student in a way that was not possible even a few years ago. For instance, I can take a billion-pixel picture of Brassica rapa every 15 minutes for a month. When I do this, the student can zoom into the plant at 15-minute intervals and see the plant growing. It's a time lapse, but it's a time lapse with an unbounded amount of resolution and detail behind it, so that the learner can watch the plant falling over and trying to get up.
And they can see gravity at work and circumnutation at work as gravity and light fight an epic battle for the plant's future. Fundamentally, the fact that we can now explore any data through space and time reveals to us the possibility of thinking about data very differently than we did 10 years ago, when we had to put it in a table and look for trends over time. So we can do this in the very large case as well. One very nice example is our own sun. We think of it as a yellow ball in the sky, but the Solar Dynamics Observatory that is observing the sun allows us to look at the sun, again with nearly a billion pixels per frame of resolution, and see it rotating. But we can do more than see it rotating. We can interactively zoom into the sun and watch the actual processes on the surface of the sun. So for example, we can look at small coronal ejections on the surface of the sun. And again, by doing this, we take the child who's learning astronomy and we give them a new appreciation and understanding of the sun itself. One of my favorite things to do with this image is to show the origin of a solar flare. If you go to the right spot in this particular day of sun data, you can see the beginning of the coronal ejection inside this deep rift in the sun. So take a careful look here and you'll see the plasma starting to eject. And as it ejects there, you'll see it follow down the rifts and eventually end up as a solar flare. And as your children would enjoy knowing, that solar flare is much, much larger than the planet Earth. So it's a good thing we're so far away from that particular body. Now the idea that we can show data and interact with it through space and time applies to any quantitative data source, even ones that were not made with a camera, like this, but were made in simulation. One of the interesting questions that cosmologists have always faced is: what do the gravitational lines of the universe look like?
And how did we go from a homogeneous Big Bang to a very heterogeneous set of gravitational lines? And for the first time, in the last few years, we're able to take billion-pixel supercomputing simulations of gravity and show the formation of galaxies, star clusters, and superclusters. The reason this is interesting is that the same cosmologists who could only look at this quantitative data analytically, using tools like MATLAB and Excel, can now visually understand the data. This is important when you're talking about trillions of pixels of data. (By the way, the green circles are black holes forming; the white dots are solar systems and galaxies.) When you think about the best, most efficient technique we have for taking billions of pixels of data and providing them to the human body, the most efficient technique we have is our eye. It's the best possible input. So the fact that we can take any data set and create from it quantitative information that can be visually intimate to us is a new revolution in our understanding of the Earth itself. And when I take all the data sets that we have on Earth and consider the ways in which we can apply the same techniques to understanding Earth processes, well, that's where our conversation with you begins today. This is a beautiful picture of lights on the Earth at night, but it's a static picture. And what we do now doesn't need to be static any longer. So this satellite is showing you all fires across the planet Earth over time. So you can zoom in interactively, for instance, on Saudi Arabia, and this is freely available on the internet now. And you can see oil extraction flames. You can see Bakken and Marcellus crude oil extraction flames in America, where drilling and mining operations are causing flaring. You can take any amount of Landsat information over the last 30 years and composite it to create interactive demonstrations of mining, of valley mining in Australia.
These images over 30 years allow us to see human processes at an Earth scale and allow anybody to see that interactively: mountaintop removal in West Virginia. Here you see farmland becoming a massive shale gas field. You can see urban development. Here you see Lake Urmia in Iran disappearing over the course of 30 years because of damming to create agricultural land during a drought. These are the kinds of changes that big data can visually reveal for us. And to show you one very important part of this, which is deforestation, I'm very happy to present Matt Hansen.
Thank you, Illah. So my theme is one application of using Earth observation, or satellite imagery, to track a particular dynamic, and that is forest cover change. And this is a similar sequence of images to the one Illah just showed from the Landsat sensor. Different versions of it have been up in orbit since the early 1970s. And here we are in Brazil. In 30 years of record, you see a fine-grained deforestation pattern as rainforest is appropriated and converted to pasture land and row crops. Initially, a lot of these clearings are very fine-scale: colonizers from the south of the country coming up to establish a very modest kind of subsistence lifestyle. And when we look at this pattern in Rondônia, this is the famous fishbone pattern of forest cover change by individual landholders. Later on, we start to see big clearings in Mato Grosso, which are related to agro-industry: big soybean fields, big industrial cattle ranches. And as we zoom out to the continental scale, we see what we call the arc of deforestation along the front of the Amazon rainforest, going from the coast up in Pará all the way around Mato Grosso to Rondônia and Acre state. And this is just an incredible record of human change on top of a landscape. And the biggest thing that I want to convey is how we move from this type of data to a thematic output. And so I'll scoot over to this picture.
When we show the time-lapse sequences, those are raw pictures. They're images. And we need to turn those into quantitative estimates. For example, when you look at the changes across Acre state in Brazil, how much forest was lost? How much forest grew back? And we have to have very clean inputs of imagery to turn that into a biophysical estimate of forest cover. And this is an example of a cloud-free global image made using big data processes, where we start with a million images, filter through all of the images, throw away the clouds, throw away the smoke, and try to examine only the land surface. By tracking the really good pixels of the land surface, we can turn this into a measurement of forest extent and change. I want to say, very importantly, that big data and its use for societal good is based on really progressive data policies. The Landsat sensor has 40 years of data in the archive, and it's available to anyone on the planet. So I can make my maps, and the European Space Agency can make their maps, with Landsat data. It's very important that providers have this type of mentality, where they're tasking these instruments, storing the data, and then letting the data free. If we do that, we can engage everyone from civil society to private industry to government to come and look at the data and come up with a consensus understanding of what's happening to the planet. I can't stress that enough, because we move basically from research, playing around with these data and demonstrating different capabilities, to operational records. And I would like you to picture having 50- or 100-year records of every patch of ground on the earth. How often was it planted as soybean? How productive was it? How long has it been a city? When was that turned into an impervious surface? How much does it flood? And we have this capability right now, but it does depend on technology, and it does depend on progressive visions of data. Anyway, so we start with a million images.
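The filtering step described above (start from many images, throw away cloudy and smoky pixels, keep the good ones) can be sketched in a few lines of Python. This is only an illustrative sketch, not the method actually used to build the global composite: the function name `cloud_free_composite`, the precomputed `is_clear` masks, and the choice of a per-pixel median are all assumptions made for the example.

```python
from statistics import median

def cloud_free_composite(stack, is_clear):
    """Combine repeated images of one scene into a single cloud-free image.

    stack    -- list of images, each a list of rows of pixel values
    is_clear -- same shape as stack; True where a pixel is free of cloud/smoke
                (assumed to come from some earlier quality-screening step)

    Returns one image whose pixels are the median of the clear observations,
    or None where no clear observation exists.
    """
    rows, cols = len(stack[0]), len(stack[0][0])
    composite = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            # Keep only the "really good pixels" at this location.
            good = [img[r][c] for img, mask in zip(stack, is_clear) if mask[r][c]]
            out_row.append(median(good) if good else None)
        composite.append(out_row)
    return composite

# Three tiny 1x2 "images"; high values stand in for bright clouds.
stack = [[[0.2, 0.9]], [[0.3, 0.4]], [[0.95, 0.5]]]
is_clear = [[[True, False]], [[True, True]], [[False, True]]]
result = cloud_free_composite(stack, is_clear)
print(result)
```

The median is used here only because it is a simple way to summarize the surviving observations; a real pipeline would work per spectral band and at a vastly larger scale.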
We can take this clean image, and a time series of it, and turn it into a biophysical product. So here we have, in green, tree cover that didn't change over a 14-year period. In red, tree cover that was lost. If it's red, it's largely deforestation. Blue is gain, which means trees were planted or they naturally regrew. And if you see pink, it's both. Pink is both loss and gain. So these are typically forestry land uses, where trees are treated as a crop. So over time, we see the trees coming and going. We might see the trees disappear and never come back. Furthermore, in the time domain, we can disaggregate this into a trend. So we look at the colors now. We're looking at only forest loss in this color bar of yellow to red, with blue highlighted as this past year. And we can see big fires in particular years in the far north, in the boreal forest. We can see different reds and oranges, meaning more recent clearing of the Chaco in Argentina. And one of the big findings of this particular dataset, and it wasn't a finding, it was a confirmation of what we know, was that the big deforestation country, Brazil, through a policy initiative that included civil society and industry and government, actually slowed the rate of deforestation from 2006 to the present. It just went down by 70 to 80%. And in this color bar, you can see the yellow colors dominate. The yellow colors are the first five to six years of the period. And that's what we see in the arc of deforestation. This is really the only successful policy intervention for slowing deforestation that we have to date. They get a lot of credit for this. And in fact, the proof of their policy success is the satellite record. Nobody can refute that. The other side of the coin is that all the other countries in the tropics combined drown out Brazil's signal.
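The green, red, blue, and pink legend described above amounts to a small decision rule per pixel. The sketch below is a hypothetical illustration of that rule only; the function `change_color` and its three boolean inputs are invented for the example and are not taken from the actual forest-change product.

```python
def change_color(had_tree_cover, lost, gained):
    """Map per-pixel change flags to the map legend described in the talk.

    had_tree_cover -- pixel was tree-covered at the start of the period
    lost           -- tree cover was lost at some point in the period
    gained         -- tree cover was gained at some point in the period
    """
    if lost and gained:
        return "pink"    # both loss and gain, e.g. trees managed as a crop
    if lost:
        return "red"     # loss, largely deforestation
    if gained:
        return "blue"    # planted or naturally regrown
    if had_tree_cover:
        return "green"   # tree cover that did not change
    return None          # never tree-covered: not part of the legend

print(change_color(True, False, False))   # green
print(change_color(True, True, True))     # pink
```

Checking the combined case first matters: a plantation pixel that was cleared and replanted should come out pink rather than red or blue.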
So there are increases in Chaco loss in Argentina, Paraguay, and Bolivia; in Tanzania and Angola, in Miombo forest; in insular Southeast Asia and all of the Southeast Asian countries. Deforestation and forest cover loss increase over the same time period to the point that they drown out the Brazil signal. It's a statistically significant increase in forest loss in the tropics. But again, this record lets us track that. And if we're going to make a policy intervention or whatever, we can measure the success or otherwise of that policy intervention. We'll zoom into Indonesia to take it down to another scale. And one of the things that we like about the satellite is that it orbits the Earth. It is calibrated consistently. So we have a globally consistent picture with which we can make apples-to-apples comparisons of what's happening. But we can drill down and look at individual countries, even parks, and say this is what's happening at a local scale. That's another really powerful part of this big data story. If we look at Indonesia, the beautiful cloud-free composite, this is raw imagery. We turn the raw imagery into the tree cover extent, loss, and gain picture. And you can see the forest land use transition in Indonesia going from west to east. You start in Sumatra, then Kalimantan, the Indonesian portion of Borneo, then Sulawesi, all the way over to Papua. Sumatra is at the most mature stage of forest appropriation and conversion to higher-order land uses, largely palm estates, but also forestry acacia plantations. They have five-year cycles on some of these acacia plantations. So you look at Sumatra. It's almost done. There are a few protected areas left. Borneo is in the next stage. By the time you get out to Papua, it's only logging roads. Logging roads lace the landscape. And agro-industry is just starting. So this is a nice little demonstration of humans taking a natural environment and converting it to a higher-order economic purpose. This is the annual forest loss.
And when you, again, track this annually from 2000 to 2012: after the fall of Suharto there was a decline in forest clearing, and then clearing increased all the way through 2012, to the point that Indonesia cleared more primary forest than Brazil in that year. And they have a quarter of the forest of Brazil. So they're on opposite trends. Brazil's going down. Indonesia's going up. Now we're going to look at the two countries on the island of Borneo, Malaysia and Indonesia. Here's the image. Here is the green tree cover, red loss, blue gain, pink both. And as we look right here, we see a very clear line. This is a transnational boundary effect. So you see economics and policy and governance very clearly in the satellite image across administrative boundaries, whether they're parks within a country or borders between two countries. This is the border between Malaysia and Indonesia. On the Malaysian side, there is intensive conversion of lowland forest to palm estates and the like. And then as you go up into the higher topography of interior Borneo, there is intensive logging. Then you cross into the territory of Indonesia and it's mostly protected areas. And this is very useful information for understanding the land use. And then here's the annual change. And so basically the point is that if we have progressive policies, if we have good observational data sets, and I like to describe these as public goods, we get this kind of value. You know the GPS that we all use: what if you had to pay every time you used one of those signals? There would be very low participation, right? It's just there. And GPS spun out this huge suite of industries. And earth observation, like weather satellites, should be in the same domain, so that we have regular, publicly available time series. And the value added comes in their use, in the characterization of land dynamics, and in the downstream applications of understanding what it means for carbon emissions, what it means for development and human health.
And with that I'll just pass it back to Illah to show an example of urbanization.
Thank you. These same tools, as you can see from the way that Professor Hansen speaks, are powerful when the visualization is coupled with narrative, with storytelling by somebody who is a content expert. That's the critical bit here. How do we create big data products that are highly interactive but mated to strong content knowledge, so that everybody can make sense of the information and become an active participant in civic discourse? Now, the satellite imagery that lets us see deforestation lets us see many different effects. And here's just a prelude to some of the different kinds of effects that you can see when you do earth time lapse. One example here, Shanghai over 30 years, lets you see land use and the changes in land use from farmland to urban areas. And again, this allows you to understand both scale and categorization over time. Lights at night let you see places that have developed massively greater electricity infrastructure in the last 20 years, shown in red. And you can see that the whole area we're in has seen a very significant increase in electricity usage and urban infrastructure. And now I want to present to you two other very different techniques for big data visualization. When we create time lapses from an earth view, you're able to appreciate changes on the earth over time. But we can also use the same graphics tools, the same computer vision tools, to take many, many dimensions of data on human behavior and visualize them over time. One interesting example is human commute patterns, the way we move from home to work. And the example that I present to you here is just for the state of Pennsylvania. This is every commute of every car in the state of Pennsylvania. But what the visualization allows you to do is to see everywhere people work and everywhere people live in the state.
But then by animating between those two points and giving you the ability to zoom in and out of the image, you can start to see patterns in agglomeration, patterns in the behavior of how people live and work. And by color, I see income level. Red here represents low income jobs, green represents high income jobs. So you can start to see the way people commute in terms of the suburban lifestyle that they have, the amount of carbon that they release in going to work, and the number of locations they work at. And the low income workers in fact work in a highly dispersed set of places (those are the malls), whereas the high income workers work in the financial district in the city center. So these are animation tools that can literally take trillions of pixels of data and push them to your screen at the same time, so that you can start to see patterns and trends that were very difficult to understand non-visually before. Demographic data writ large, demographic data about all of us, is a very big challenge. There are 50 dimensions in the US census, everything from gender to how much money we each make to the education level we have and what type of work we have. But now we can take the exact same tool that lets us see Landsat images of the earth and visualize arbitrary demographic data. This shows all census data for the United States. Red represents high income jobs in housing blocks. Green represents low income jobs. So right away you can see disparities in wealth across the United States. But if you zoom in and play with time, because remember we can go back and forth in time, for the city of Seattle, look at where it is green and where it is red. And look, as we slide time, at how the green is overcome by red. That's gentrification. That's why people working in low and middle income jobs in Seattle have to drive for an hour to get to work: they can't afford to live near their workplace any longer.
Now if I do the same thing for Detroit, Michigan: that area has had a 10-year recession. And when you play the same transitional video over time, there is no gentrification at the block level, because Detroit has had a recession for 10 years, so there has been no conversion of wealth. These are the stories that you can create demographically, but the real power comes when you add more dimensions. We can take the wealth picture (green for low income jobs, red for high income jobs) and add race in the third dimension. So let's add Hispanic origin, the proportion of people who are of Hispanic origin. Now when I take the map and manipulate it in three dimensions, the spikes I see, the peaks, are where Hispanics live in the United States. And what color are they? They're green. That's bad news. There's a very strong correlation between density of Hispanics in this example and lack of wealth. I can do the same thing for Asians by going to all Asian demographics in the US census data. And what you see is completely different. The spikes that you see developing are in San Francisco, Seattle, and along the eastern coast. And they're red, which shows you a high correlation with wealth. And for a depressing picture, choose African-American. And then what you see is massive densities of African-Americans dispersed throughout the east and the southeast, but with no wealth, except in Washington DC. What big data manipulation buys us is the ability to change the relationship we have to the data, because we can ask questions we could not have asked before and visualize them. I'm going to take all the data that was on the map and simply put it on X and Y axes instead. The same tool. I'm simply not using the map anymore. Now I'm showing you male-female ratio along this axis and number of jobs along this axis. And I've scattered all possible jobs onto that plot for every housing block in the US.
And if I zoom into that, what you'll see right away in green is that it's centered at 50-50. Those are the places that have the most jobs. Half of the people doing the jobs are men, half are women. But this red is still African-American. I didn't change the color code. So what this shows you is that it's well left of the middle. Why? Why are there tens of millions more blocks with women working than men in America among African-Americans? Because of the incarceration rate. We have so many African-American men who are in jail, or out of jail and having difficulty getting a job in the US, that we have a massive bias toward female employment in the African-American community but not in the non-African-American community. Those are the trends that we can see. Now, as scientists, as social scientists and as demographers, we find these trends powerful. But the real power of big data, and this is the last part of my presentation, comes when you add narrative and social sharing. Here we have a community in Pittsburgh that has a coke plant. This is a coke oven that turns coal into coke for steel mill operations. So they have one in their backyard and they have major health problems with asthma and lung disease. So they have their own gigapixel panorama taking pictures every 10 seconds, all the time, 24/7, on their own webpage. Mated to that, they have real-time wind speed. They have real-time reports from the federal government of air quality in their neighborhood, as well as from three of their own houses. So you can see real-time PM2.5. And they report their own smells and asthma attacks. So they've taken big data and created an interactive site that allows governments, municipalities, and the public all to take the same data, as Matt was saying, and use it as ground truth to understand the health consequences of local industrial action.
This empowers the local citizenry because now they have the ability to be at the same table with municipal leaders and with industrial leaders together trying to solve a problem by starting with the common ground of the same language. And that language is interactive big data.