Excellent. Thank you so much. It's a pleasure to be here, to be a guest of this university and of Paradigma, and to be in Madrid for the first time in 15 years, which has been beautiful, even through the eyes of someone whose personal clock is somewhere over the North Atlantic right now. And, of course, I always love to come and discuss big data wherever anyone is talking about it. The conversations often become very technical, and through the day you'll probably have a lot of informative technical discussions with your colleagues and the other speakers. But I prefer to step back a little bit and look at questions other than the ones you'll encounter today, questions like: is this really big data? Mine is only one terabyte and yours is four terabytes; I use Hadoop and you use Cassandra. I want to look instead at big data as an idea and as a mindset. Two of my colleagues put it well, I think, when they wrote that big data is more about attitude than tools. It's about recognizing that you have valuable, rich data and about doing something with it to get some sort of special insight. So I'm going to talk a little bit about what this big data mindset means, and then give you a couple of examples from outside the usual places that nonetheless illustrate the mindset as it's applied widely.

The first part of the mindset is that you measure everything. More data generally means better conclusions and a fuller understanding of what you're doing. You don't have to rely on guesses if you've actually measured something. For example, suppose you're selling car insurance and you're trying to price it based on how dangerous a driver your client is. The traditional way of doing this is actually a big data technique: you predict how likely a person is to have a crash based on his age, his gender, perhaps what kind of car he drives, and whether he has had a crash in the past. But recently, sensors have become widespread enough, and data has become easy enough to process in large quantities, that a handful of American insurance companies now offer services like this one, which is from Progressive and is called Snapshot. It's a little device that you plug into the diagnostic port on your car, and it measures how you actually drive, so the insurance company doesn't have to guess. It measures how hard you apply the brakes, whether you drive a lot at night, which is particularly dangerous, and how many kilometers you drive every year, which is, of course, the single best predictor of your likelihood of a crash. It captures a handful of other things as well and transmits the data back to Progressive, which monitors your driving in real time. And the value of this data capture to Progressive and to its customers is up to 30%: a car insurance plan where you voluntarily accept this monitoring costs as much as 30% less than a comparable plan where you don't. It's that valuable to the insurance company to know how you actually drive rather than guessing at it.

And when techniques like this aren't available, you compute the things you can't measure, and you try to do it in a sophisticated way. I wrote recently about the milling heads, the machining heads in industrial milling tools. These are giant machines that cost hundreds of thousands of euros and are the automated cutting devices that make things like the aluminum body for a MacBook like this one.
And it's very difficult to know when the cutting device is wearing out, even though it's imperative to know, because wear lowers the quality of the work and the device can fail catastrophically. But to know whether the device is wearing out, you have to remove it from the machine and put it under a very powerful microscope, which takes time and a lot of money. So some researchers have found that you can instead make much easier measurements. You can measure the sound coming from the machine, the vibrations coming from the machine, and the rate of acceleration of the drill bit over time, and feed this into a machine learning algorithm that then predicts the wear in the drill bit. And it's extremely accurate. This is being used in applications as complex as jet engines. GE's latest jet engine, which is called the GEnx and powers the Boeing 747-8 and the 787 Dreamliner, captures 10 times as much data from sensors like this in the engine. They don't measure actual wear; they measure proxies for wear. As soon as the plane lands, it uploads as much as one terabyte of data per engine at the end of a long flight, which is then used in sophisticated machine learning processes to figure out what in the engine is wearing down and what needs to be replaced immediately.

When you have data, encompass as much of it as possible in a single model. Suppose you're an airline and you're trying to figure out which of your customers are least likely to actually make it onto a flight after they've bought a ticket. Some high percentage of people with a ticket never actually come and check in and sit down in their seat. You want to figure out who they are so that you can resell their tickets to other people; it's an important part of yield management. If you were doing this manually, in the era before big data, you might take some guesses at what correlates with not making it onto the airplane. You might say, well, we'll look at the type of fare the customer has bought, or the weather elsewhere in the system, maybe there are delays. Or, if you take a big data mindset, you toss all of your data in and leave it to the machine to do some discovery. And you find, as one airline did and The Economist reported, that the best indicator that someone will make it into a seat, having bought a ticket, is that he ordered the vegetarian meal on the airplane. This isn't necessarily something you would think to measure or set up a model to look at. But when you have lots and lots of data and you put it all together, you discover interesting things like this. You don't limit yourself to a specific part of your data set; you toss everything in.

And finally, big data is about constant, recurring improvement. Every time you make a change, you take lots of sensitive, high-quality measurements and you understand the complete impact of the change you've made on the system. Then you either make more changes or fewer changes, move it forward or move it back. In this way, you constantly optimize whatever it is you're working on.

Now, these are obvious, right? They come down to, basically: understand your system. It's not a novel idea that you should take lots of measurements and do your best to understand quantitatively how your system works. And indeed, the predecessors of big data have been around for a very, very long time. This is Matthew Fontaine Maury, who was a commander in the United States Navy.
He was the first superintendent of the United States Naval Observatory, which today houses the atomic clock. But in the 1840s, he pioneered a terrific early example of big data. He stumbled across a big pile of shipping logs, the logs from American naval vessels that had been all over the world, and these logs aren't very interesting on their own. They list, at the beginning of every watch on the ship, the location of the ship and the time. It's not really very interesting to know that a single ship was in this place at this time and then in that place at that time. But when you take thousands of these logs and apply some computation, you can come up with a very accurate picture of currents and winds and other things that are relevant to shipping. So in the 1840s, he published a very important chart called the Wind and Current Chart of the North Atlantic by compiling these thousands of ships' logs into something that was useful. This is a great example of big data, because it's finding value in aggregated data that is not there in the smaller individual bits. It was only valuable because he took thousands of these records and combined them together to get what you see here, which is a remarkably accurate chart of the currents around what is now Florida.

It wasn't much later than that that machine computation came into the picture. The 1890 United States census saw the first use of punch cards of the sort that dominated computing through the middle of the 20th century. The company that provided the punch card machines to the 1890 census later became IBM and continued to pioneer these systems. This was an enormous success. The Census Bureau had been overwhelmed by data: after the 1880 census, it took years for the Bureau to tabulate the results and get summary statistics. In 1890, they used punch cards, automated the process, and finished on time, actually ahead of time, and well under budget. And this is remarkably similar to work any of us would do today. You're taking individual census return records and tabulating them into summary statistics, just as you would do in R now if you took a big data set and asked for a summary of some variable.

Still more than 50 years ago, the modern credit score became available. In the United States, the Fair Isaac Corporation, FICO, invented the credit score that now dictates whether or not we may have a loan or a credit card or a house or a car, or indeed how much our car insurance will cost if we don't have one of those devices that plugs into the car. This is another classic big data example. FICO buys consumer credit information from all the banks, whether you've paid your bills on time, how much credit you have outstanding, and so forth, and applies algorithms that predict your likelihood of defaulting on debt in the future. Then it sells that prediction back to the banks from which it bought the data, at a markup. So it's taking enormous amounts of data and again combining them to be more useful than they were alone, because if one bank has given you a credit card, all they really know about you is that you have a certain amount of credit on that card. They don't know if you have a mortgage or something somewhere else. So when FICO takes their data and combines it together, it becomes more valuable than it was in any single instance.
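To make that combination concrete, here's a minimal sketch in Python of the general idea: merge each lender's partial view of a customer into one table and score default risk on the combined features. The column names, figures, and the logistic regression model are illustrative assumptions; FICO's actual inputs and algorithms are proprietary.

```python
# A minimal, illustrative sketch of combining credit records from several
# lenders and scoring default risk. The columns, data, and model are
# hypothetical; this is not FICO's actual method.
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Each lender only sees its own slice of a customer's behavior.
bank_a = pd.DataFrame({"person": [1, 2, 3], "card_balance": [500, 4200, 150],
                       "late_payments": [0, 3, 0]})
bank_b = pd.DataFrame({"person": [1, 2, 3], "mortgage_balance": [90000, 0, 120000],
                       "late_payments": [1, 2, 0]})

# The aggregator's advantage: merge everything into one view per person.
combined = bank_a.merge(bank_b, on="person", suffixes=("_card", "_mortgage"))
features = combined[["card_balance", "late_payments_card",
                     "mortgage_balance", "late_payments_mortgage"]]

# Historical outcomes (1 = defaulted) used to train the scoring model.
defaulted = [0, 1, 0]
model = LogisticRegression().fit(features, defaulted)

# The "score" sold back to the banks is the predicted default probability.
print(model.predict_proba(features)[:, 1])
```

The point isn't this particular model; it's that the merged table knows more about each customer than any single lender's slice does.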
And FICO has been using more and more sophisticated methods since then, but it's basically the same system now. Speaking of the value of big data to finance, Bloomberg LP, which makes the Bloomberg terminal you see here and which is at the foundation of the financial industry, was founded a remarkably long time ago, in 1981, well before the era of what we now call big data. Michael Bloomberg, the mayor of New York, who founded this company, is now worth about $25 billion, and even 10 years ago he was worth $5 billion, well before anyone ever mentioned the term big data. Bloomberg carries 11,200 data sources in its terminal, which again it buys from a lot of different sources, aggregates from here and there, and processes on its own into special indices and so forth.

Also more than 10 years ago, the ExtraCare card was introduced in the United States. CVS is a national chain of pharmacies, and the ExtraCare card is a small card that you keep on your key chain. It has a barcode on one side. When you make a purchase, you scan it at the register so that CVS knows who you are and can target you more accurately. It's remarkable that people have gone in for this, because you're doing nothing but handing your personal data and purchasing records to the corporation, but they give you a slight discount on things when you use it, and now 66 million people have CVS ExtraCare accounts. CVS has a very sophisticated way of tracking purchases and targeting future purchases with them. And this system has been around for more than 10 years. The cost of setting something like this up in 2001 was staggering, given the cost of computation and storage then and the amount of data entailed; today this kind of system is available to pretty much any mid-size or large retailer, and plenty of smaller ones as well.

So if data has been around so long, why is it that we're talking about big data, in the form we see it today, right now? One reason is that it's become much cheaper to deal with. You've probably all seen some form of this graph; this one is from Wikipedia, so you can find it yourselves. In 1965, Gordon Moore, who later co-founded Intel, predicted that the number of transistors on a single processor would double about every two years. It's called Moore's Law today, and indeed it has held true. Note that there's a logarithmic scale on this graph. Generally speaking, processors have gotten twice as powerful, twice as many transistors on the processor, twice as fast, every two years since the 1960s, and it continues today. A corollary is that the price of computing roughly halves every two years. So the computers of today are vastly more powerful than the computers of two years ago and unbelievably more powerful than the computers of 10 years ago. And they're unbelievably cheaper: a computer with the power of a desktop from 10 years ago would today be the cheapest mobile phone, costing just 100 euros or so. On top of this we have virtualization, which has added another level of commodity pressure onto the cost of computing. In 2009, an hour of time on a small Amazon EC2 Linux machine cost 10 cents; this year it costs 6.5 cents. So the price of computing, even on this one service, has decreased 35% in the last three years, which is a remarkable change.
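The arithmetic behind those figures is easy to check. Here's a quick sketch using only the numbers just quoted and the rough two-year halving corollary, which is an approximation rather than a law.

```python
# Check the EC2 price drop quoted above: 10 cents/hour in 2009 vs 6.5 cents/hour now.
old_price, new_price = 0.10, 0.065
drop = (old_price - new_price) / old_price
print(f"Price decrease: {drop:.0%}")     # -> 35%

# Rough Moore's Law corollary: cost of a fixed amount of computing halves every ~2 years.
cost = 1.0
for year in range(0, 11, 2):
    print(year, round(cost, 3))          # after 10 years: roughly 1/32 of the original cost
    cost /= 2
```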
It was remarkably cheap to begin with in 2009 to pay 10 cents for an hour of computing on a reasonably powerful machine, and now it's vastly cheaper. So if you have big data, why not process it? You can't afford not to; it's so cheap to deal with big data now. This was all further made possible by MapReduce, described in a paper published by a couple of researchers at Google, which takes big problems and distributes them across lots of computers. You'll hear lots of discussion today about the best way to handle these sorts of problems, and some people like MapReduce and some people don't, but it had an undoubted effect on the price of all of this and on the adoption of big data, along with the subsequent adoption of MapReduce into Hadoop, an open source version of it based on some work at Yahoo.

Big data has also been enabled by the web, which gives us unbelievable quantities of real-time data on human behavior. Any good, well-managed website, like Zappos, which you see here and which sells shoes, knows to measure absolutely everything that comes in from the website. They know whether a button is more effective if it's light blue or dark blue; they know whether the button is more effective if it's four pixels over to the right. They relentlessly test absolutely everything, and very recently this data has become real-time as well. This is a screenshot of a service called Chartbeat that gives you real-time patterns of who is on your website right now. This is a sample from a website that sells e-cards, and you can see that there are 2,774 visitors on the website at this moment; it shows you where they're scrolling on the pages, what they're clicking on, and where they're coming from. It's become very popular with newsrooms and retailers because it lets you see when something is becoming hot and move it up very quickly to try to capture some of that energy. I should disclose that O'Reilly, my company, is an investor in Chartbeat, but I've been using it since before I came to O'Reilly, so I feel okay telling you about it.

And on top of the web, which gives us all of this data, we have the social web as well, an entirely different level of data. All of us who have a Facebook account are voluntarily handing over enormous amounts of personal data about ourselves: who our friends are, whom we communicate with, what sorts of things we like, where we live, where we used to live, where we went to school, all things that are extremely valuable to businesses. And businesses have responded by trying to get a lot of that data out of Facebook. So you see Pepsi has a Facebook page with over nine million people who like Pepsi, and every one of those nine million people is handing over to Pepsi a little bit of data about themselves that Pepsi then tries to use to get some sort of insight into its customers. A lot of companies go even further and have applications that scrape much more of your Facebook profile after you opt in. This level of communication, where as a company you can listen in real time to hear what your customers are saying about you, is absolutely unprecedented.

So with this definition of the big data mindset in mind, I'll give you a couple of examples of places that have applied the mindset for some time and done it very effectively, with quantifiable results. The first is Los Angeles, which you see here in a beautiful rendering.
Those of us on the east coast of the United States don't think Los Angeles is quite this beautiful; it's only in the rendering that it's beautiful. New York is much nicer. Los Angeles is famous for its freeways, of course, which look something like this. They're vast rivers of traffic, built over the last 50 years, and they're often clogged with terrible traffic jams. But really the backbone of Los Angeles's transportation is the surface streets that intersect at traffic lights. The city of Los Angeles has 10,000 kilometers of surface streets, and those 10,000 kilometers intersect at 4,300 traffic signals throughout the city, many of them enormous, like this one at Sepulveda Boulevard and Venice Boulevard on the west side of LA. This intersection gets more cars passing through it than a typical expressway interchange: about 73,000 cars a day go through this traffic light. The scale of traffic on these surface streets generally is enormous.

Every one of these 4,300 traffic lights is networked into a gigantic city-wide signal control system that can control the traffic lights remotely. And at every one of these intersections is a handful of sensors. You can see it here; this is how the system sees the same intersection, Sepulveda and Venice, with all the lanes that go in the different directions, left turn and right turn lanes. The green arrows mean that at this moment, when I had them take the screenshot, the north and south traffic had green lights and the east and west traffic had red lights. The circles that you see around the outside represent sensors embedded in the roadways. These are wire loops that capture counts of cars going over them through inductance. They register vehicle speed, so they know if there's a traffic jam; they register the number of cars passing over them; and they know how long people have been waiting at the light, whether people are getting stuck in the middle of the intersection, and so forth. There are 18,000 of these magnetic sensors altogether in Los Angeles, so there are a lot at every intersection, as you see here.

Every second, every one of the 18,000 sensors sends its data in real time to a computer that sits in an emergency bunker under the city hall of Los Angeles. It gives you some indication of how important this is to LA that it's in the emergency bunker underneath everything; even in a nuclear war, the traffic lights will be safe. And every second, after the system receives readings from the 18,000 sensors in the city, it sends back out adjustments in real time to all 4,300 traffic signals. So every second, readings come in from every intersection and adjustments go back out to every traffic light. It can adjust a traffic light in increments as small as: traffic in this direction is very heavy, so lengthen this green light by 30 seconds and then go back to the regular cycle. It can also make permanent adjustments. It's very sophisticated. The result is that when the system goes into a neighborhood, when they add it to a square kilometer of blocks, it decreases travel times by 15%, and motorists make 20% to 30% fewer stops at traffic lights. It's vastly more likely that you'll be able to go through a green light once the smart system has been installed.
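As a toy illustration of the feedback loop being described, and emphatically not of LA's actual control software, here's a sketch in Python: each cycle, loop-detector counts come in and the controller shifts green time toward the heavier direction, within fixed bounds.

```python
# A toy sketch of adaptive signal timing driven by loop-detector counts.
# Illustrative only; the bounds and step size are invented for this example.
MIN_GREEN, MAX_GREEN, CYCLE = 20, 90, 110  # seconds; hypothetical limits

def adjust_green(green_ns, counts_ns, counts_ew, step=5):
    """Shift green time toward the direction carrying more cars."""
    if counts_ns > counts_ew:
        green_ns += step
    elif counts_ew > counts_ns:
        green_ns -= step
    # Keep the split within the allowed bounds.
    green_ns = max(MIN_GREEN, min(MAX_GREEN, green_ns))
    return green_ns, CYCLE - green_ns

green_ns = 55
# One pair of (north-south, east-west) vehicle counts per signal cycle.
for counts_ns, counts_ew in [(120, 80), (150, 60), (90, 140)]:
    green_ns, green_ew = adjust_green(green_ns, counts_ns, counts_ew)
    print(f"NS green {green_ns}s, EW green {green_ew}s")
```

The real system does this across thousands of intersections simultaneously, weighing how an extra few seconds of green here affects queues kilometers away.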
And the improvement costs about $150,000 per intersection, which is a remarkably low amount of money compared to the cost of reconfiguring an intersection, adding an extra lane, or replacing a left turn lane with a different kind of lane. In a city like Los Angeles, which is very dense, there is no appetite at all for adding capacity to streets. No one wants to widen streets; Angelenos think there are too many streets already and want no more. So the only way to add capacity is to make the streets smarter. And they do it by encompassing all of the data in the city. It's only by capturing the whole system of traffic lights that they're able to get a result like this. So it's what I said earlier about encompassing: you can add a smart system to a single traffic signal and it'll do okay, but what this system does is understand the impact of lengthening a traffic light four kilometers away on the traffic over here, and it can minimize travel time across the entirety of the Los Angeles street network. LA started putting in this system for the 1984 Olympics, so again, this notion of big data and optimization has been around for a very long time, but it has all the hallmarks of a modern big data approach. It's automated and adaptive. Every time it adjusts a traffic light, it measures the impact of that adjustment everywhere else in the city, and once it understands that impact, it can use the same technique again to reduce traffic in similar situations. That makes optimization possible on the scale of more than 4,000 traffic lights, in increments as fine as lengthening a green light by a second or two. The impact of this kind of system-wide intelligence is huge. It's very much a sort of big data. So LA has been using this to deal with traffic jams for 30 years.

The next example has been an issue for 70 years now. This is the Grand Coulee Dam, in Washington State. It's the biggest dam in North America, and the statistics are sort of staggering. It opened in 1942. It's nearly two kilometers long and 170 meters high, which is higher than the Great Pyramid at Giza. It impounds a reservoir behind it of 324 square kilometers. And the power plant inside it is the biggest power plant in North America too; it's a staggering scale. It has 33 turbines, 33 generators, which running at peak capacity would generate more than 7 gigawatts of electricity, which is enormous. That's as much as New York City consumes on a typical afternoon. It's also enough to power 23 Googles, which perhaps says more about the scale of Google than about the scale of the Grand Coulee Dam.

And operating the Grand Coulee Dam is phenomenally complex. It sits in the middle of the Columbia River, which is not a particularly long river, but it's a very voluminous one; it has the biggest volume of any river in the American West. It comes down from the Rockies in Canada, in British Columbia and Alberta, and then through Washington State and Oregon. The Columbia drains an area the size of France and passes 250 cubic kilometers of water to the Pacific every year, which is a difficult measure to think about, but it's the only way to characterize an amount of water quite so vast. Every year, the river's volume fluctuates by a factor of 5 between the spring runoff season and the dry season: every spring it increases to five times the volume of the previous winter.
And the dam has to be operated to keep the reservoir low enough to absorb flooding, but also high enough that if there's a drought in the summertime, the farmers who are irrigated by the water won't go dry. So it's a classic optimization problem: you want to take a statistical prediction of how much rain, how much water, you'll get coming into the dam, and then figure out the operating protocol with the least risk. But the dam also has a lot more constraints. It has to conform to environmental regulations. Salmon are enormously important to the economy of the northwestern United States, and the salmon survive passage through the dam better under certain circumstances. They survive better when they go through the dam through a turbine, through a generator, than they do when they go over the top of the dam through the spillway you see in the middle, which is remarkable: the generator turns out not to shred them, and something like 98% of the fish survive a trip through it. The spillway, meanwhile, adds dissolved gases to the water and other things that are bad for the fish. So generally the operator of the dam prefers to send water through the turbines, but that causes a problem when they have to drain water through the dam and there isn't enough demand in the electrical grid to accept the electricity the turbines would produce. So they need to optimize for that as well.

If that sounds like a big job, and it is at this dam, consider that the Grand Coulee Dam is just one of 27 dams on the Columbia River and its biggest tributary, the Snake River. Altogether there are about 400 dams, owned by a lot of different entities, in the region that the Columbia drains. Below the Grand Coulee Dam on the Columbia, a lot of the dams are what are called run-of-the-river dams. They have no reservoir behind them; they're designed merely to pass through water as they receive it. Like this one, the Bonneville Dam, near Portland, Oregon, the last dam on the Columbia before it reaches the Pacific. You can see behind it that it has no reservoir. When water comes down, they have to pass it through immediately. If they were to close the gates on the Bonneville Dam during spring runoff season, it could take as little as six hours for water to rise over the top of the dam, which would be a catastrophe. So you have to run the entire river correctly in order to anticipate when the water you're releasing from the Grand Coulee Dam is going to make it down to the Bonneville Dam, and when the Bonneville Dam needs that water in order to generate electricity for its customers.

So here is a video prepared by the Bonneville Power Administration, which operates some of the power plants. Oops, excuse me. What you see here is a schematic of the river. The river is flowing from right to left, and every one of the marks is a dam on the Columbia or on the Snake River below it. You'll see that the dam on the farthest right, which is the Grand Coulee Dam, starts to release water in the morning as electricity is needed in the western United States. Every morning the Columbia River fluctuates by a factor of about four: at four o'clock in the morning, very little water is passing through, and over the next couple of hours the volume of water in the Columbia below the Grand Coulee Dam will quadruple. And you'll see this bulge of water passed from dam to dam.
So at five o'clock in the morning, the gates start to open on the Grand Coulee Dam and on the Chief Joseph Dam just below it. And now the bulge of water, this huge cascade coming from the first dam, makes it to the second dam, which has to put even more water out. Then the bulge from the first dam makes it down to the third dam, which has to pass it through. So you have to time these bulges of water very precisely so that the dams never have to spill water over the spillway that they don't want to spill, so that you don't overload the electrical grid, and so that you provide enough water to generate the electricity that's needed; you don't want to underload the electrical grid either.

So how do they do it? They treat it as a big data problem, and it starts with a lot of data collection from a lot of sensors. It starts with a weather forecast. The guy who runs the weather forecasts for the dams told me, "I can never have too much data," which is the mantra of big data: give me more data, please. And what he's done with the data is make very accurate predictions. He takes data from 2,200 precipitation gauges across the Northwest and in Canada, as well as 1,800 river gauges that tell you how much water is running through a river right now, and is able to create what they call checkbook accounting. They know when water comes down as rainfall, and they know when the same water goes out through the rivers below the mountains. If it hasn't gone out through the rivers yet, they know it's still there as snowpack or has gone into some other use, and they can forecast what is left as snow in the mountains and what will come down in the springtime. Every day, hydrologists take this data and create forecasts for river flows at 212 different points on the rivers and streams in the basin, in increments as small as six hours. They then pass an operating plan on to the Army Corps of Engineers, which knows how to operate the dams and coordinates their operation. They use software called HEC-ResSim, which is free; you can download it now and start modeling any of your favorite river basins. And they run thousands of historical patterns through the model of the river. So to come up with a plan for operating the dams, they take the spring rainfall of 1966 and the dry summer of 1972 and the winter of 1986, put them all together, and figure out what the probability of flooding or low water or anything like that is by running tens of thousands of these combinations. Then they come up with a probabilistic model for operating the dams.

On top of that, they model power consumption, which you see here. The dams are producing electricity, given in the blue line at the top. Demand is given by the red line; they're exporting a lot of electricity to California, which is why production is higher than demand. At the bottom you see wind power in the same area, the green line, which is highly variable. So the operators of the dams have sensors in all of the wind turbines in the area that measure output from the turbines every second, along with wind direction and so forth, and send it back to the central command center, which then models the output from the turbines and predicts how much electricity will be needed from the dams.
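Here's a minimal sketch of the balancing calculation being described: scheduled hydro output has to fill the gap between forecast demand and forecast wind, within the plant's operating range. Every number and limit in it is invented for illustration; the real planning also folds in the probabilistic inflow forecasts, ramp limits, and fish constraints discussed above.

```python
# Illustrative sketch of hydro dispatch against forecast demand and wind.
# All figures are made up; they are not the Columbia system's real numbers.
HYDRO_MIN_MW, HYDRO_MAX_MW = 500, 6800   # hypothetical operating range

demand_forecast_mw = [3200, 3100, 3300, 4200, 5600, 6400]  # one value per hour
wind_forecast_mw   = [1500, 1700,  900,  400,  600, 1200]

schedule = []
for demand, wind in zip(demand_forecast_mw, wind_forecast_mw):
    needed = demand - wind                      # what wind can't cover
    hydro = min(HYDRO_MAX_MW, max(HYDRO_MIN_MW, needed))
    surplus = needed < HYDRO_MIN_MW             # water must move with no load to absorb it
    schedule.append((hydro, surplus))

for hour, (hydro, surplus) in enumerate(schedule):
    note = "  (surplus water, possible spill)" if surplus else ""
    print(f"hour {hour}: hydro {hydro} MW{note}")
```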
You can see, for instance, that in the last day on this graph, wind production spiked and then fell off, and so they closed the dams and then opened them again very quickly as wind production was falling.

So, the quantified part: in 1964, an agreement came into place between all the owners of all of these dams in the Northwest that said, in effect, we will operate all of these dams together as one single system, with one set of commands for output and so forth, and coordinate our operations. And the result of this agreement is that the system as a whole, operated together, produces about as much additional electricity as the Bonneville Dam itself generates: about a gigawatt more than was produced before the agreement, simply because of the coordination of the system.

So what's coming in the future with big data, now that you have an idea of the mindset and how it's applied? We think there will be better integration between machines. For instance, we'll all have these Google cars that drive themselves, and not only that, but when you want sushi, rather than opening your phone, asking where there's a good sushi place, finding directions to it, and then putting them into your car's navigation system and driving there, you'll just tell your car, I want good sushi, and it will drive you there. So better integration between machines, fewer interfaces with humans. You see the same thing happening in the F-35, which is a forthcoming fighter jet. It measures its performance the entire time it's flying, and before it even lands, it radios back to the Navy and automatically orders replacement parts for anything that wore out during the flight. All of this is enabled by the predictive power of big data and by being able to understand humans and other machines.

We also see better integration with humans coming through things like natural language processing, where Google has such an enormous store of written language that when you misspell "big data," it understands that you meant it spelled the other way, and through the much more sophisticated problem of machine translation, for instance from English into Arabic like this, enabled by having the entire internet to use as a corpus for natural language processing. And it goes the other way around too: we can use big data to write human-readable text. This is from a company called Narrative Science, which publishes a blog on Forbes that is written entirely by computers, earnings announcements and so forth. They teach the computers what a Forbes article looks like by reading in lots of Forbes articles, and then the computers produce new articles from streams of financial data. It's not a template; it's a computer that's been taught what the pattern of a Forbes article is and that then adapts it every time it gets new data.

Sensors everywhere will help enable this. Think of what your phone knows about you and how it interacts with you. It knows when you go to bed at night, because you use it 10 minutes before you go to bed. It knows when you wake up in the morning, because you use it first thing when you wake up. It knows if you're waking up in the middle of the night, because you probably use your phone. It knows where you are, how fast you've gotten there, and whom you're communicating with. So it knows if you've walked somewhere or taken the metro.
It sounds sinister, but think of what it means for something like healthcare, where your phone says: it looks like you haven't been sleeping well, and you're walking slower than usual, and you haven't been answering the phone lately, so maybe you should see a doctor; here's a good one. Or maybe it'll tell your Google car to take you to Starbucks and have a coffee, because you seem tired.

So I will leave you with this final graph, which is a little opaque at first. Netflix, the American online video rental service, has a sophisticated recommendation engine: you rate movies you've seen, and it suggests new movies you would like given your ratings on the others. In 2006 they held a contest. They said, we'll give a million dollars to whoever can most improve on our algorithm for recommending new movies to people. The contest ran for three years, and over the course of it people submitted their work as they went, so Netflix maintained a leaderboard and knew who was ahead at any moment. This is a graph of the progress of the leader in the Netflix Prize over the course of the contest. The winner improved on Netflix's algorithm by about 10%. But as you can see, the first person who submitted a response at all got 40% of the way there. And this is what you'll see in a lot of big data applications: the first time you apply it to something that's really new, you'll get a lot out of it. It looks dreary at the right side of the graph; it looks like, why would I even try at that point? But the exciting part is the left side. And I think that's an invitation to bring big data to all sorts of places where it has never been before. Thanks very much.