All right, so welcome everyone to the first distinguished lecture of the year. I'm very happy to introduce Ed Lazowska from the University of Washington, which is a great place for people to go to school. He's a former chair... I didn't go to school there. Yeah, well, then Brown's a good place. He's a former chair of the department and has produced his fair share of PhD students, I guess more than 20 at this point. His traditional academic specialty is networking and operating systems, but more recently he's moved into eScience and is the founding director of something called the eScience Institute at the University of Washington, which does a lot of data-intensive science work and so on, including work on this very slick project instrumenting the seafloor with all these cables and so on. I hope you mention this, because it's a good one. He's also the chair of the Computing Community Consortium, is that right? Which gives him kind of an unusual, very high-level view of the field, which I hope he'll share with us today. Also, he has the virtue of enabling a reception on the third floor after this talk. So with that, thanks very much. Great. Thanks, Mike. Well, thanks. It's great to be back here with so many friends. This is going to be an extremely high-level talk. So those of you who came here for technical detail, the reception starts in an hour. All right, so it's going to sound a bit like I'm channeling Farnam, and I hope you'll forgive me for that. What I'm going to do is talk to you a little bit about what the field has done, and then I'm going to talk a bit about this terribly named outfit called the Computing Community Consortium that Mike mentioned and what we're up to. And then I'll talk a little bit about my view of the direction we're headed, which hopefully will be interesting. So about 40 years ago, in 1969, now 43 years ago, I claim there were four great things that happened.
What can you think of exciting that happened in 1969? I realize most of you weren't born then, but you read Wikipedia. So what happened in '69? Huh? Moonshot. Right, okay, the moonshot is one for sure, a phenomenal engineering accomplishment. What else? Woodstock. Woodstock, yeah. We got the moonshot here, but you got the flowers before. Yeah, well, it's 1969, right? And we got Woodstock. What would we do without the web? Isn't this the greatest, okay? What else happened in '69? The Mets won the World Series. It might have happened once since then. And the fourth big thing that happened in '69? The first packet traveled over the ARPANET. All right, so the ARPANET was a four-node network, right? And this packet was sent by Charley Kline. You can see the CSK on the right-hand side of that log. He was Len Kleinrock's programmer at UCLA, and he sent a packet to SRI, which was then called the Stanford Research Institute. And remember that back then computer networks were for remote login, for the sharing of expensive resources. They weren't for, you know, shopping and pornography and stuff like that. And so Charley's packet was attempting to log into a PDP-10, I believe it was, at SRI. And so the first packet that went over the ARPANET contained the character L and the character O, and then it crashed, all right? So some things never change. And now the question is, with 40 years of hindsight, which of these things had the biggest impact? And my claim is, unless you're into Tang and Velcro, the big contributions of the space program, it's really clear that the internet was the big one. Nobody remembers anything that happened at Woodstock, and the Mets are of no consequence, at least this season, right? And the reason is we're hooked to these exponentials, and I don't have to tell you this story. You know, we spent 100 years with mechanical calculators, and then we spent 10 years with the vacuum tube machines, and then we've had a pretty good 50-year run with transistorized machines.
The transistor was invented in 1947. It was pretty big in its original incarnation. Integrated circuits came 10 years after that, unbelievably clumsy. Then there's the Moore's Law that you're really familiar with. Here's the cool impact of Moore's Law: the computational power of a 1950s room-sized mainframe is literally embodied in an electronic greeting card these days, all right? It has the same number of millions of instructions per second of capability. And we talked about the moonshot. The amount of computational power that got Apollo 11 to the surface of the moon is now in a Furby, right? So this is not the highest and best use of this capability, but nonetheless it's pretty remarkable that this stupid little thing has as much computing as got us to the moon. And, you know, the internet is on the same exponentials. In the past 20 years or so, the number of transistors per unit area has gone up by a factor of 2,000, and the number of internet hosts is up by a factor of 2,000 as well. Those are big numbers that really change the world. And the way I think about this internet thing is, Seattle used to be a connected region because we had an airport and a port, you know, and train lines, and now it's because we're hooked to the Pacific Rim by these internet cables, and that's how commerce and everything else moves in and out of the region. Finally, it's important to realize when you talk about Moore's Law that software has in every way made as much progress as hardware, and some of it's pretty visible. You know, I remember Deep Blue in 1997, again, a huge accomplishment, but a couple of years later, for 20 bucks, you could buy a piece of software that had a higher chess federation rating than Deep Blue and ran on your silly little PC.
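As a sanity check on those exponentials: a factor of 2,000 over roughly 20 years is just what Moore's-Law-style doubling predicts. A quick sketch, using the talk's round numbers (the 20-year span and the factor of 2,000):

```python
import math

def doubling_period_months(growth_factor: float, years: float) -> float:
    """Months per doubling implied by a total growth factor over a time span."""
    doublings = math.log2(growth_factor)   # 2,000x is about 11 doublings (2**11 = 2048)
    return years * 12 / doublings

# A factor of 2,000 in 20 years works out to one doubling roughly every
# 22 months, right in line with the classic Moore's Law cadence.
period = doubling_period_months(2000, 20)
assert 21.8 < period < 22.0
```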
Similarly, think about Watson just last year. The little-known story about Watson is that shortly after they did this one on TV, they brought it to Capitol Hill, and Rush Holt, who's one of the last physicists in Congress, actually beat Watson, all right? It was not a full game. By the way, the Republican on the left lost miserably, okay? So Watson killed him, but Rush actually triumphed. And, you know, there are more detailed stories about software. I've seen some wonderful charts, for example, that talk about a set of numerical problems, either numerical optimization or numerical linear algebra, where if you graph the progress due to hardware gains and the progress due to algorithmic improvements over a 50-year period, the progress due to algorithmic improvements dominates the progress due to architecture, all right? So it's not like the architecture doesn't matter, it matters hugely, but it's the combination of those things that gives us the power in things like optimization that we have today, okay? So that's going back a long way. There was this interesting story a year or two ago in the New York Times in which a set of guys at the Wharton School were asked to identify the innovations that were the most world-shattering in the past 30 years. And why pick the past 30 years? If you go back further than that, you're competing with the wheel and fire and things like that that are kind of hard to beat, okay? So this is life changers, the top innovations of the last 30 years, according to people from the Wharton School. What do they know? Well, at least they're not computer scientists, which is important, because what they said was that of the top 20, half of them are just hardcore computer science, and most of the other half are half or one-quarter ours, all right? So, you know, you wouldn't have picked these. I mean, media file compression, social networking. I mean, maybe yes, maybe no.
But you ask a bunch of business school profs what has changed the world in the past 30 years, and the list they come up with is all stuff that our field has done, which is pretty great. So a couple of years ago, three years ago now, like lots of other people, I was asked to look back at the past 10 years and say what had really... what have you done for me lately, all right? So that's the question. And this is the list I gave them, unmodified except for type font, from a couple of years ago. It was search, scalability, digital media, mobility, e-commerce, the cloud, and crowdsourcing, as things that I thought were totally different in 2010 from what they'd been 10 years before. And just to talk about a couple of these: the way in which we build scalable systems is totally different now. A dozen years ago, Jeff Bezos was in all of the ads for the DEC, then Compaq, then HP Alpha servers. Of course, most of those companies are dead or dying now, all right? But the important point is, first of all, that Jeff had hair back then, and second, that Amazon's scalability was limited by the largest reliable symmetric multiprocessor that could be built, all right? And that's how they accommodated their user growth and their catalog growth. What they could do if they ran out of capacity was move the entire catalog to a clone and use some sort of random routing to send half the users to this system and half the users to that system. But that's a completely laughable way today to think about building a web-scale system. And it's important that the progress in making reliable systems out of unreliable, dirt-cheap pieces of hardware is entirely due to fundamental work in computer systems and algorithms that goes back 25 years, right? So now what we're doing, if you're Amazon or Microsoft or Google or any of these people, is taking dirt-cheap components, sticking them in racks, hundreds of thousands of disk drives per data center, counting on failure and counting on algorithms to wallpaper over the failure.
And that's how we compute at this scale. But just a little more than a decade ago, we had a totally antithetical approach to building these scalable systems. Digital media: it's obvious. Again, most of you, the younger ones, have grown up with it. But the way in which we do text and audio and images and video, whether we're creating or editing or consuming, is totally different than it was not so long ago. I mean, I'm behind the times. I finally got rid of our CD collection a while ago. And it's so much more convenient sitting there on a Macintosh. But the idea of film, I mean, God help us, right? It's just something nobody even thinks about anymore. And again, that's a total transformation in the past 10 years. And finally, mobility. This is a great graphic from Intel. These things are on the web. I'll give you the URL later. But it's what happens in an internet minute. This showed up a couple of months ago in an ad of theirs. And it's astonishing. 1,300 new mobile users in a minute. 47,000 app downloads. 204 million emails sent. You know, it is utterly astonishing what people are doing these days with the stuff we create. And now, how did this happen? There are three studies over the past 15 years that have talked about this, all from the National Academies. There was one in 1995, which I actually was privileged enough to work on, that came up with this thing we call the tire tracks diagram, because it looks like somebody drove across the page. And here's what this shows, okay? These are years down here. Think of these as billion-dollar market segments of IT over here. And what the lines do is show academic research, industry R&D, product introduction, and billion-dollar industry emergence, and people moving back and forth. This study was redone in 2003, which added a bunch more segments. It's hard to predict what these are going to be, but it's easy to look at them retrospectively. And what I want to bring to your attention is that the report was just redone again and released a month ago.
Peter Lee from Microsoft Research chaired the committee that did this. And it's a slightly different way of looking at it, but it's really worth taking a look at. It shows how fundamental innovations in our field lead to these things that change people's lives. And the lessons from these reports are all the same. First of all, all of these market segments show clear evidence of federally supported, university-based research. It's really easy for people to look at the enormous R&D that industry puts into our field and ask what role the federal government could possibly play, and that speaks to this point. Even Microsoft, my neighbor, which is one of the few companies that makes significant investments looking more than one product cycle out, puts only 5% of its R&D budget there. So the other 95% is people doing the next version of Office. And that's important work, but it's not putting ideas in the larder for the future. And if you look at what Cisco or Salesforce or any of these other folks are investing looking more than one product cycle out, it's dead nothing. Secondly, there's nothing linear about this. People move back and forth. Ideas move back and forth. Third, the things you didn't think of are often as important or more important than the things you did think of. And fourth, having lots of different research themes being pursued simultaneously allows them to later be combined in ways that you didn't anticipate. So this track record is really clear, and it's important for you to understand it and to communicate it to others, because this is not something that can be managed particularly well. So that brings me to this Computing Community Consortium. Let me just quickly tell you what we do. We're part of the Computing Research Association. We're funded by the National Science Foundation. And the goal is to help our community envision even higher-impact research that touches more lives. Susan Graham and I co-chair it.
There's a council, which I won't spend time on. It rotates pretty regularly. It's a great group of people. And we have a whole bunch of activities. Instead of belaboring them, let me give you a couple of examples. So for example, recently there was a National Robotics Initiative rolled out. And the genesis for that was an exercise that we oversaw, run largely by Henrik Christensen at Georgia Tech with help from folks at Carnegie Mellon and Penn and a number of other people from the robotics research community. And what they said was: despite the apparent successes of robotics, the autonomous vehicles and things like that, there really is not a roadmap for the robot science that needs to be done in order to enable the next generation of stuff, and we'd like to work on that roadmap. So they had a set of meetings, which the Computing Community Consortium funded and sort of oversaw, and there was a lot of discussion with agency heads. The Office of Science and Technology Policy in the White House, as a result of this, and liking the roadmap a lot, directed a whole bunch of federal agencies to have a robotics story in their FY12 budgets. And this National Robotics Initiative was rolled out in the summer of 2011, and it had real money for new robotics research. So beginning from this vision exercise that the robotics community proposed to us (Henrik, Sebastian Thrun, and a set of others came to us), we managed to drive this forward with, gosh knows, an enormous amount of help from federal agencies and the White House, and get real research funding for robotics. Similarly, in big data, and Farnam's been instrumental in this, Randy Bryant and Randy Katz and I did some position papers for the Obama transition team in 2008 on the importance of large-scale data analysis. We ran some workshops that coalesced a community around the academic use of Hadoop.
Tom Kalil from OSTP came to us in 2010 and said, you know, these other agencies are starting to listen; people in the Department of Health and Human Services and people in the Department of Transportation are beginning to realize this is important. So produce a set of white papers for me that speak to the heads of these individual agencies and tell them why this is important to achieving their missions; it's not just a research curiosity. And lo and behold, we now have a Big Data Initiative which is going to put a lot of money into our field, fundamentally, to drive this forward. We had a study in computer architecture recently. Tom was involved in that from here, and again Farnam. There were a set of workshops and eventually a really good paper on 21st-century computer architecture, which I think will inform programs pretty significantly. Mark Hill led that exercise, with Josep Torrellas and Mark Oskin in the middle of it. Susan Graham and I and a set of other people from the CCC Council did this government report looking at the overall $4 billion federal R&D program. We had an exercise, I'm not sure if anyone from Michigan participated in this, a year ago that brought about 35 mid-career faculty members to Washington, D.C. to understand how the sausage is made, how science policy is formulated. The goal is that these people will be better at interacting with policy makers in the future. Finally, we ran this symposium a half year ago in Washington, D.C. looking at the 20-year history of the federal multi-agency coordinated R&D program. The funniest part of this was Vint Cerf and Al Gore both talking about how the internet was created. They turned out to be pretty good friends, and it was great having them. One of the interesting things Gore said is that we predicted in the early days of the internet that it would change political discourse, because we would each find our own personal echo chamber on the internet, right?
And what Gore pointed out is that this has happened, but it's happened through the bandwidth that we provided the cable industry, all right? 85% of the campaign dollar is still spent on 30-second television ads. That's how America gets its information, even though it's not how you and I get our information, all right? And so the truth is, whether you like cooking or dogs or left-wingers or right-wingers, you can find your personal echo chamber, but it really hasn't been the internet that's created it, okay? And we need your participation, and we welcome it. There are lots of ways you can do this, okay? So let me now look a little bit forward. As part of that 2010 article, I had to tell a story as well. Here's Butler Lampson's version of the story, and that is that we're transitioning from simulation to communication to embodiment. So Butler's view is that these are the 40-year transitions of the computing field. In the beginning, we were computing, all right? And then we were communicating in various ways, and now we're making a transition to what he calls embodiment, all right? That is, computers interacting with the physical world. My version of it is pretty similar. It's that over this decade and beyond, we're going to be the people who put the smarts into everything, all right? And I really think most of this smarts is embodiment. So let me just describe to you what I mean by a number of these, and almost none of them are what I personally work on. This is me standing here wishing I was in some other research area, all right? But hopefully it will give you some ideas. So let me talk about smart homes for a sec. This is a fantastic guy at the University of Washington, Shwetak Patel, who is instrumenting homes to monitor their energy consumption. And he does this, you know, using machine learning, the secret sauce for all of computer science. And here's what he's done. He's got this cool little single-point-of-attachment device.
You plug it in one place in your house, any wall outlet, all right? And it uses the noise on the power line. When a switch flips, it puts a spike on the line. When a compact fluorescent light is running, it modulates the line. When a Panasonic 50-inch flat panel is running, it puts a signal on the line. And it differentiates those signals, all right? And so it can tell exactly which devices are running at any point in time, beginning from a web database of those signatures and then using some amount of supervised and unsupervised local machine learning to clean it up, which then goes back into the database. If you have two Panasonic 50-inch TVs, because you're a rich person, they're going to be at different wire-run lengths from where you chose to plug this device in, and that's going to change the signal, so you can tell them apart. There's obviously an ammeter that's part of this as well, all right? There are other examples of this, like the Nest thermostat and things like that. But the cool thing here is that this intelligent home that actually helps you in interesting ways is not as far off as we think. We've been yapping about this for years, but these sorts of devices are remarkable. Shwetak does the same thing for water and for gas. For water, here's what it is. It's a little diaphragm with a little generator behind it, and you screw it onto a hose bib that you're not using, like an outside hose bib, and you turn the faucet on. So now the whole water system of your house is pressing against this, right? And when you flush the toilet, it goes wubba, wubba, wubba, wubba, right? Because there's this hammer that goes through the water system in your house, and the hot water and cold water systems are connected through the water heater, so it's one big system, right? So again, you can differentiate the signals from the different water-consuming devices in your home. And that plus a flow meter tells you exactly which device is consuming how much water.
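To make the signature-matching idea concrete, here's a toy sketch: match an observed line signature against a database of known device signatures by nearest neighbor. The (frequency, amplitude) feature pairs and the simple 1-NN rule are my illustration only, not Shwetak Patel's actual pipeline, which uses much richer features plus the supervised and unsupervised learning described above.

```python
# Toy appliance disaggregation: classify an observed signature by finding
# the closest known device signature (Euclidean distance, 1-nearest-neighbor).

def classify(observed, signature_db):
    """Return the device whose stored signature is closest to the observation."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return min(signature_db, key=lambda dev: dist(signature_db[dev], observed))

# Hypothetical (frequency_kHz, amplitude) signatures, invented for illustration:
db = {
    "CFL lamp":   (55.0, 0.8),
    "plasma TV":  (70.0, 2.5),
    "dishwasher": (30.0, 1.2),
}

assert classify((68.0, 2.4), db) == "plasma TV"
```

A real deployment would refine these signatures per home (the wire-run-length effect mentioned above) before classifying.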
For gas, it's an acoustic meter that sits strapped to the regulator on your gas meter. If you walk by a restaurant at night, you'll hear the gas meter singing, and your home gas meter does the same thing. Shwetak had just moved into a new home when he started doing this work, and you have to get ground-truth data, so the grad students were allowed to completely instrument his home with this stuff. And having bought a new home, he and his wife had gutted the kitchen and moved in one of those expensive Wolf gas stoves, which can boil water in 20 seconds or something like that. And it turned out that the gas stove consumes way more gas than his furnace, okay? By the way, electrically, guess what the single largest consumer of electric power in his house is? Wallboard? Sorry? Wallboard, you know, drywall, big blocks? The toaster? Ready for this? It's his set-top box, all right? His cable box. And the reason is that these are badly designed computers that are on 24 hours a day, and they are not subject to Energy Star, okay? So 16% of the power in his home is consumed by his set-top box. Not astonishing, but who'd have guessed, right? Similarly, you buy a new refrigerator, and it's a modern Energy Star refrigerator, and you move the old one down to the basement where you keep the extra beer, all right? And it's sitting there sucking up power, and this tells you where it's all going. You know the smart car story. Sebastian Thrun has wonderful talks about this, and what we're seeing now is that a lot of this is available in more or less regular vehicles, right? You've got adaptive cruise control. You've got automatic stay-in-lane systems. You've got self-parking, if you're nervy enough to try it. There are Google autonomous cars driving down the road. This is a reason not to drive in California or Nevada, I think. Here's what we're competing with, okay?
We're competing, and Sebastian makes this very clear, with human beings who were terrible drivers even before they started talking on cell phones and texting, all right? So the bar for being a decent driver is actually not that high, all right? The first few accidents these autonomous cars have are going to get lots of PR, but think about all the accidents that are taking place every day just here in lovely Ann Arbor. But even beyond that, there are a lot of other important things we can deal with. For example, you know, highway transportation uses more than 20% of all of our energy. Traffic congestion, when you're sitting there burning gas getting zero miles per gallon, is a big deal. The elderly can't get around, and public transit doesn't work. And when you build automobiles, I hate to say this in Michigan, there are enormous environmental and financial costs. Computing has a lot to contribute to all of these things. So for example, sensor information can tell you where transit is and get it to your smartphone. You can imagine multimodal transit option selection. You can imagine putting Zipcar on steroids; that's just a logistics problem, right? But the average personal vehicle in the U.S. is in use only 5% of the time. That's computed by comparing the number of hours in a year with the number of hours a typical car is actually driven, all right? So let's assume that we don't like mass transit, that we're still going to drive single-occupancy vehicles. If we could figure out how to share those cars and drive the average utilization up from just 5% to 10%, which doesn't seem like an insurmountable goal, all right, we would dramatically reduce the environmental and financial costs of building more cars, because we'd need half as many. Similarly, from Google overflight imagery you can determine that the average utilization of roads is only about 5%, even at peak, non-super-congested times, that is, when traffic is faster than stop-and-go, all right?
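That 5% figure is easy to reproduce from round numbers. The 12,000 miles a year and 30 mph average speed below are my assumed figures, chosen to be typical; the talk only describes the method:

```python
# Reproducing the ~5% personal-vehicle utilization figure:
# hours actually driven per year, divided by total hours in a year.
def utilization(miles_per_year=12_000, avg_speed_mph=30):
    hours_driven = miles_per_year / avg_speed_mph   # ~400 hours of driving
    return hours_driven / (365 * 24)                # fraction of 8,760 hours

# Comes out to about 4.6%, i.e. "only 5% of the time."
assert 0.04 < utilization() < 0.05
```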
So you could imagine lanes of the highway devoted to vehicles that had adaptive cruise control and stay-in-lane systems, which would dramatically increase the density in those lanes and reduce the amount of planet we have to pave, all right? So there's a lot of stuff you could do that's short of turning your car over to the Jetsons. There are a huge number of things in health. If you haven't seen them, I really refer you to a bunch of articles Larry Smarr has written in the past couple of years. Larry has become the fully instrumented person, all right? What really happened was, a number of years ago, Larry was running NCSA at the University of Illinois, and he had a beard and was 20 pounds overweight and sort of slumped around. And then he moved to San Diego in 2001. And now he has a wine cellar and an exercise machine and a personal trainer, and he's dropped the 20 pounds and shaved off the beard. And then he decided that he would fully instrument his body, and he actually discovered a significant malady that he had that his docs had been unable to diagnose, right? But the truth is, here's the way I think about this. You bring any modern car in to a mechanic for service these days, and they jack a computer in under the dash and read out your last six months of what's been going on, and they find and fix the problem. And when you go in to see your doctor, she says, where does it hurt, right? And this is crazy, right? Why aren't you as well instrumented as your car? In the coming years, very soon, you will be. Partly it's going to happen through extremists like Larry, but also, to be honest, through the affluent-jock industry, right? So any of you who are runners or bicyclists or anything like that, you know, you do triathlons, you've got all this instrumentation, and that data is being pushed up to the cloud, and that data can be used in really effective ways. So personalized health monitoring is one thing we're going to see tons of.
Obviously, evidence-based medicine as this data becomes available. And what my colleague Lee Hood refers to as P4 medicine: this is genotype-phenotype correlation. We'll get to big data in a minute, but this is the biggest of the big data problems, all right? It's how you correlate people's personal health with their genomic history and with what's going on with other folks. Smart robots. We've spent years with robots in structured environments, bolted to the factory floor like in the auto industry. Now, for better or for worse, we've got these things cruising around our homes. They are not very effective vacuum cleaners, you should believe Consumer Reports, but we're selling millions of them a year. But Rodney Brooks's latest, Rethink Robotics, which is the new name for what had been Heartland Robotics, is really about industrial robots for unstructured environments. That is, robots that interact with changing environments around them, as opposed to being bolted to the factory floor where you can't get close to them without making sure the power has been disconnected. So again, there's going to be incredible penetration this decade. Then there's the combination of health and robotics. We have an engineering research center at UW, which Tom Daniel now runs, on what's called sensorimotor neural engineering. Think of this as prostheses directly coupled to the nervous system. When Yoky Matsuoka was at CMU, she had this poor monkey named Quincy. Quincy is in the center right over there. Quincy had a prosthetic arm and an eight-by-eight bed of pins embedded in his skull, and he could reach out and grab oranges and bananas and feed himself by thinking, right? So this is a ways off in humans, but it's pretty remarkable, and it's going to happen progressively. That's what this center is about. So here's what I mean by smart science, and I'll spend a bit more time on this because this is how I've been spending my time in the past few years.
This is sort of data-driven discovery in all fields, not just science. I take the inspiration for this from Jim Gray, and I say transforming science again because of the sequence we've been through. Jim referred to this as the fourth paradigm; I think of it as maybe the fifth. Science traditionally was theory, and it was experiment, and it was observation, and they obviously played together. They reinforced one another. Observation suggested experiments to assess theories. I'll come back to this in a minute, but I've been working on an oceanographic project the past few years, and this is still today how oceanography is basically done, right? The ships are bigger, okay? But roughly, you go out to someplace and you drop instruments in the water and you measure what's going on where you happen to be, right? And for 30 years now, and of course Dan Atkins here and others were mainstays of this, computational science has really added another leg to the stool of how we do discovery. Simulation brings us places we can't go or don't want to go: you know, the first 50 milliseconds after the Big Bang, nuclear stockpile stewardship, stuff like that. And what's happened more recently is the thing we call eScience, that is, data-driven discovery. Jim worked with Alex Szalay from Johns Hopkins, and I know Alex was here for Dan Atkins's event a couple of weeks ago, on the data system for the Sloan Digital Sky Survey. The Apache Point telescope that generated the data for the Sloan Digital Sky Survey generated 80 terabytes of raw image data over seven years, all right? So this was the granddaddy of data-intensive science projects. It created the field of survey astronomy. It changed the politics, the sociology, of astronomy, in which scientists actually began to share their data with one another. The way astronomy used to be done is, you're a poor astronomy grad student. You spend three years designing a new instrument. You bid for telescope time.
You spend a few weeks on the top of a mountain freezing your tail off, and you're going to be darned if you're going to give that data to somebody else, because at this point that's four years and two miserable weeks of your life, all right? You're the person who's going to use that data. The folks doing Sloan realized they were going to have more data than they could utilize, so they made it public. And through Jim Gray's work it also became available to schoolkids and schoolteachers and parents and amateur astronomers simultaneously, and that's been incredibly powerful. So here's the next generation of these projects. Remember: 80 terabytes over a seven-year period, okay? The new astronomy project, LSST, generates about a Sloan's worth of data every two days, all right? And this is not just one more zero on a simulation. From simulations of the images, you can really see the difference this is going to make to the astronomy. The Large Hadron Collider, up and running, again generates a Sloan every couple of days. These little gene sequencers generate a terabyte a day, and labs have 25 to 100 of these things, all right? They're all over the bio community. I'm working on this thing called the Ocean Observatories Initiative, which is trying to do for oceanography what survey astronomy did for astronomy, that is, contribute an observational sort of component to what had been an expeditionary science. I'll talk more about that in a minute. The web is a huge source of data. Point-of-sale terminals are huge sources of data. And the goal here is the automated extraction of knowledge from all of this, because there's just too darn much to look at. What this does is drag all of computer science into all other fields of discovery in ways that traditional computational science honestly didn't. From our field's point of view it was a niche, and from the point of view of most major universities it was a niche. I don't mean that in a disparaging way.
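For scale, the quoted numbers pin down just how much faster the new surveys produce data than the original Sloan did: 80 terabytes over seven years versus a Sloan's worth every two days.

```python
# Data-rate comparison from the numbers quoted in the talk.
SLOAN_TB = 80
SLOAN_DAYS = 7 * 365          # seven years of observation

sloan_rate = SLOAN_TB / SLOAN_DAYS   # ~0.03 TB/day over seven years
lsst_rate = SLOAN_TB / 2             # "a Sloan every two days" -> 40 TB/day

speedup = lsst_rate / sloan_rate     # roughly 1,280x the original data rate
assert 1250 < speedup < 1300
```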
I just mean that you didn't have to have a high performance computing center to be a great research university. Michigan didn't really have one, for example, right? You didn't have to have scientists, researchers in every department across campus doing high performance computing in order to be a leading research university. But the message here is that if, across your campus, people are not completely conversant with data-driven discovery, you're gonna fall off the leading edge. This is tightly married to the cloud. Mike's laughing because this is an undergrad of ours from 2003 who four years later got himself on the cover of BusinessWeek for coming back to Seattle and working with a couple of us to teach the first sort of Google-style computing course. There's a long story about this guy I won't tell, but obviously Google and Microsoft and Amazon all have cloud stories here, and that gives you the scalability you need. This is now a four-year-old graph from 2008. It's from Werner Vogels, the CTO of Amazon.com, and what it shows is the Amazon machine image usage from a little company called Animoto. Does anybody here use Animoto? Right, so this is so dumb that even my kids don't do it, okay? So what you do is you give Animoto a bunch of JPEG images and an MP3 file, and it produces a syncopated slide show, okay? And it takes like six minutes of PC computing to produce a one-minute syncopated slide show, because it's doing a pretty sophisticated acoustical and image analysis, right? And so now, like most computing companies these days, they don't have any computers, right? They use AWS, and what this shows from April 2008 is how many Amazon machine images they were using, and this is in the 30 to 40 range, okay? And what happened on April 16th is they rolled out a Facebook version of their app, and they went from 35 to 3,500 Amazon machine images, right? Now in truth, I don't believe they fell off a cliff. They still exist.
What happened was Amazon stopped collecting data on them, but if they had fallen off a cliff, then I assert that this is exactly how science looks, right? Which is, you are piddling along on modest amounts of data exploring new algorithms, and then suddenly a conference deadline comes up, right? And, bang, there's a period of a few weeks of long nights, right? And then you submit the paper and it's back to working on the next new algorithm with modest data sets for a long period of time, right? And no company can work that way either. What VC is going to buy you a 3,500-computer data center on the off chance that you're successful, right? And one of the greatest things about AWS is not just that it can grow rapidly but that it can shrink rapidly as well, and that is as important to scientific discovery as it is to companies like Animoto. This is another slide of Werner's. It's the tasting room in a Belgian beer brewery, right? And the tasting room is the room in which they used to generate their own electricity. And that big brass thing in the middle is part of the original generator. You now lean against it while you're drinking beer, right? And Werner's argument is that 10 years from now, if you're running your own large scale university data center or corporate data center, it's going to make about as much sense as generating your own electricity. Which doesn't mean you should never do it, but it means that for lots of it you should rely on a utility that can do it more cost effectively than you and amortize it across larger numbers of people, right? So this doesn't mean you shouldn't have solar cells on your roof, you know? But you're probably not going to have your own nuclear plant, right? And so similarly for me, one of the sort of crying problems in both the National Science Foundation and universities is there are all these hidden subsidies that make this internet-based, cloud-based computing appear more expensive than it is.
Like at a university, Santa Claus pays for the power and Santa Claus pays for the cooling and the elves do the backup. And that creates this illusion that local computing is very inexpensive and that Amazon is expensive. When in fact, if you fully cost these things, the cloud is a pretty competitive way to compute, right? So as I said, this is going to be pervasive across campuses. If you're not at the forefront, you're not going to be competitive. The goal of the eScience Institute that I run at UW is to spread this technology around the campus. We did a study of top investigators across the campus to find out what their computing needs were, and we were very careful in identifying these people. We picked the best junior faculty and the best senior faculty across a large range of departments in a pretty methodical way, and to a person they said, we're drowning in data. Most of our data is sitting in Excel, all right? People are managing their data five or 10 years behind the times, which means a couple decimal orders of magnitude behind the times, and this is at all levels of the science pyramid. In fact, in the eScience Institute we don't focus on the top 5% because they can take care of themselves, and we don't focus on the bottom 50% because we don't want to deal with them, all right? So we deal with people between the 51st and the 95th percentile, all right? Well-funded, highly successful researchers who are way behind the times in how they're managing their data. And again, because of the exponential growth in data, we found person after person after person who was spending enormous amounts of time on sort of data management rather than data exploration and science. The important point I want to make is that our provost didn't get this at all. Fortunately, she's now at the University of Illinois, all right? Good for us, okay? And why is that, okay?
When a provost wants to know what kind of computing scientists need in the future, she goes to people who label themselves computational scientists, right? The name fits, right? And these are terrific scientists. They're doing QCD and all kinds of other great stuff, but what they want is more subsidy for whatever they're currently doing, right? And it's not this large-scale data discovery, all right? So when you ask a broad range of top researchers from a broad range of fields, independent of how they label themselves, what they say is, we're drowning in data and we need help, all right? And that help can be as simple as getting off a spreadsheet and into a database system, or getting off a local cluster and into the cloud. It can be as sophisticated as using machine learning for discovery, right? And there's plenty of great computer science to be done in this. So let me talk about the oceanography work really quickly. This really is how it's done. It's an expeditionary science. And the goal of the Ocean Observatories Initiative is to deploy about 2,000 kilometers of fiber optic cable on the Juan de Fuca plate off the Pacific Coast and hang it with thousands of chemical and physical and biological sensors that are constantly streaming data back to the ocean research community as well as to school kids and parents. So that's the idea. It's to do for oceanography what has happened for astronomy through survey astronomy. It doesn't replace ships. It allows you to do a different type of work. The pioneer of this effort was a wonderful guy named John Delaney at the University of Washington, who's an oceanographic geologist. And he hypothesized 20 years ago that anaerobic bacteria were expelled from vents in the seafloor when eruptions occurred. And he was laughed at for 10 years, until it happened that a robotic vehicle was in the vicinity when an eruption took place. And lo and behold, it turned out he was right.
That kind of work, work that combines biology, seismology, geology, is something you can't do from a ship, because when the eruption occurs the ship is booked 18 months out, and furthermore Alvin is booked 18 months out and it's on the East Coast. So even if you could get Alvin off the Woods Hole vessel and fly it to Seattle and get it on the Seattle vessel, because you've managed to trade time with somebody, all of the action is half-lifed away before you get there. So that's the sort of work that this kind of system makes possible. And again, big data is about a whole lot more than science. It's about discovery in all sorts of fields, and it's about lots of other stuff. And I talked to Mike Cafarella about some really interesting projects recently. This is a startup done by one of our faculty and a bunch of students, and it tells you when to buy consumer electronics, okay? And it does it by mining scads of historical price data to see what the impact of new product introductions is on prices. Okay, almost done. One of the last things I identified a couple of years ago was smart crowds and human-computer interaction. Does everybody know Luis von Ahn? Wonderful young guy at Carnegie Mellon. Luis invented CAPTCHAs and reCAPTCHAs, those annoying things that you have to type in. His original observation, which is astonishing, is that worldwide, nine billion hours per year are spent playing computer solitaire, right? The amount of time it took to build the Panama Canal? 20 million person-hours, okay? So in less than one day of computer solitaire, you've got enough person-hours to build another Panama Canal, right? So the question is, could you put just some fraction of that effort to useful work? Because people are willing to do all kinds of stuff in return for points and social stroking, right? So I hope you know the story of reCAPTCHA, okay? One of those two words is computer generated, and the other, for the first year, was from the New York Times digital archive, okay?
So the New York Times archive back to 1980 was available in digital form, so it could be searched. From 1980 back to 1850, it was only in photographic form. So in order to search it, you have to convert it to ASCII, basically, which means that you have to OCR it. And the OCR was only about 80% accurate, because these are 100-year-old newspapers that have been sitting in drawers in libraries and weren't very well prepared in the first place, right? So for the first year, one of the two words the computer knew the answer to, because it had generated it; the other one was a slightly distorted piece of text that had failed OCR from the New York Times archive. And in just one year, they completely digitized 130 years of the New York Times. So here's the University of Washington version of this. David Baker is a phenomenal biochemist whose program Rosetta is the world champion at protein folding and protein structure calculation. And David used to consume about a third of the University of Washington's rack space grinding away on this. And one day he woke up and realized that he really had an embarrassingly parallel problem. Embarrassingly parallel because what you're trying to do is to find a minimum energy fold of the protein, and the way you avoid getting stuck in a local minimum is you start from hundreds of thousands or millions of starting points, and then you fold from those starting points and you hope one of them finds the true minimum, not just the local minimum for where it started. So David turned this thing into a screensaver, and he had hundreds of thousands of people running the Rosetta screensaver, and it was animated, and he started getting mail from people saying, you may have the best program in the world, but it's dumb as dirt, right? Because it's doing this when it should be doing that. So he came to Zoran Popovic in our department and said, could I turn this into a video game so that people could participate? And that's this game called Foldit.
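The embarrassingly parallel search Baker describes, many independent minimizations from random starting points with only the best result kept, can be sketched in a few lines of Python. To be clear, the energy function here is a toy one-dimensional landscape with several local minima, not Rosetta's actual physical energy over 3-D structures, and all of the function names are made up for illustration.

```python
import math
import random

def energy(x):
    # Toy 1-D "energy landscape": a wiggle (many local minima) on top of
    # a bowl, standing in for a real protein energy function.
    return math.sin(5 * x) + 0.1 * (x - 2) ** 2

def local_fold(x, step=0.01, iters=5000):
    # Greedy descent: accept a small random move only if it lowers the
    # energy, so each run slides into the nearest local minimum.
    for _ in range(iters):
        candidate = x + random.uniform(-step, step)
        if energy(candidate) < energy(x):
            x = candidate
    return x

def multi_start_fold(n_starts=200, lo=-5.0, hi=5.0):
    # Embarrassingly parallel: every start is independent, so the runs
    # could be farmed out to separate machines (or screensavers).
    # Keeping the lowest-energy result is the only coordination needed.
    results = (local_fold(random.uniform(lo, hi)) for _ in range(n_starts))
    return min(results, key=energy)
```

A single `local_fold` run routinely gets trapped in whichever basin it starts in; only the fan-out over many starting points makes finding the true minimum likely, which is why the screensaver, and later the gamers, helped.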
So Foldit now has hundreds of thousands of people, believe it or not, doing protein folding for points. And glory, I guess. In the beginning, 13-year-old gamers were killing PhD biochemists at this, which shows either that there's such a thing as too much education, or that kids these days have some sort of 3D spatial awareness that at least I lack, because I didn't grow up playing video games like you guys. There's an older guy who more recently was the champ. They have their own little Facebook-type pages in Foldit. This guy's name is Boots McGraw. He calls himself a redneck from outside Dallas. Boots was the first player who actually had a protein synthesized in a lab. That is, they built what he had designed, and they sent him this model, and on Boots' page, you can't read it here, he says: I'm really glad to have this model. I'll keep it in my office, so when my coworkers once again ask why I'm not playing Farmville with them, I can show them the model, right? But here's the cool thing, okay? Last year, 50,000 gamers solved an AIDS-related protein structure problem, the solution to which had eluded the scientific community for a full decade. So gamers are doing actual biomedically useful science playing this game. It's really incredible. Here's one other example of this. This guy, Tony Tether, was my nemesis at DARPA for a number of years. We got rid of Tony and replaced him with Regina Dugan, who unfortunately just left, but she was terrific, and she hired Peter Lee, who you know from CMU, to direct an office at DARPA, right? And Peter had this problem. He was commuting from Pittsburgh to DARPA for the first few months of the job. And Peter's one vice is this hopped-up car, which he then managed to wreck, so he doesn't have it anymore. But it's a GTO with a Corvette engine and blacked-out windows.
And Peter was within one ticket of getting his driver's license pulled, which was going to put him out of the commuting-between-Pittsburgh-and-Arlington business, when somebody showed him this program, Trapster. Has anybody used Trapster? Come on, fess up, right? Okay, so tell us what Trapster does. Right, it crowdsources where speed traps are. Okay, it uses the GPS to know your location, right? It works. Saved Peter's ass, that's for sure, okay? So what Trapster does is crowdsource the location of speed traps, and then, you know, now you're driving along, and if you come to a place where people have reported speed traps, it goes boop, boop, boop, boop, you know, you slow down, you drive past, and then later on it gives you a sort of thumbs up or thumbs down where you report whether the speed trap is still there or not. So think about it, and I realize the rest of you are responsible drivers unlike this guy. Think about the history of this, you know: they introduce radar, so you get a radar detector; they introduce lasers, so you get a laser detector; they make detectors illegal, so you get an expensive little one that fits under your hood, all right? This, you know, they can't do much about, right? So Peter then had the opportunity, shortly after arriving at DARPA full time, to meet with Bob Gates, the Secretary of Defense, and he had no idea what to talk to Gates about, so he started by telling Gates about Trapster, and it turns out Gates is a speeder too, right? So it went great. But then he showed Gates this thing, which is called something like North Korea Uncovered, and these are points of interest in North Korea that are, effectively, think of them as uploaded from iPhones by regular people, right? So the question is, is this a bunch of intelligence operatives, or is it a bunch of spy satellites, or is it just random people with iPhones sticking stuff in there, okay? And that's what it is, and that's what led to the balloon challenge, all right?
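The alert loop described above, checking the driver's GPS position against crowd-reported trap locations and beeping when one is close, reduces to a proximity query. Here's a minimal sketch in Python; the coordinates, the 500-meter radius, and every name in it are made up for illustration, since Trapster's real data model and protocol aren't described here.

```python
import math

# Hypothetical crowd-reported speed-trap locations as (lat, lon) pairs.
REPORTED_TRAPS = [
    (47.6062, -122.3321),   # a made-up report in downtown Seattle
    (38.8895, -77.0353),    # another near Arlington, VA
]

def haversine_m(lat1, lon1, lat2, lon2):
    # Great-circle distance in meters between two lat/lon points.
    r = 6_371_000.0  # mean Earth radius in meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def traps_nearby(lat, lon, radius_m=500.0):
    # The "boop boop" check: every reported trap within the radius.
    return [t for t in REPORTED_TRAPS
            if haversine_m(lat, lon, t[0], t[1]) <= radius_m]
```

The thumbs-up/thumbs-down step would then feed back into the reported-traps database, confirming or retiring stale reports, which is the part that keeps a crowdsourced dataset honest.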
And the important thing about this, does everybody know about the balloon challenge? Who does not know about the balloon challenge? Okay, a couple people, so let me just do this quickly, we're almost done, I swear, okay? So DARPA, a couple of years ago, one morning deployed 10 weather balloons, red weather balloons on, you know, 50-foot cords, at 10 places around the U.S., ranging from places as easily findable as Union Square in San Francisco to, you know, Katy Park in Katy, Texas, wherever that is. And there were 4,367 teams registered to try and find all 10 of these balloons, right? And the intelligence community had said there was no way this could be done. Operatives can't do it, satellites can't do it, it can't be done. But there were nearly 1,000 submissions, 370 correct submissions. The winner was a bunch of folks from MIT, all right? What they did was set up a reward system for crowdsourcing this. By the way, there was great subterfuge going on. So for example, a bunch of people in Russia hacked the DNS so that people trying to enter their sightings of balloons were in fact sending them to this Russian server, which submitted them as its own and then discarded them, all right? So this was a lot more sophisticated than the obvious Photoshopped fake sightings, okay? So the goal was to get people to use their 10 slots. Okay, it's unbelievable this worked, okay? So as a result of the balloon challenge, it's clear that there's an entirely new way to crowdsource intelligence that does things that the intelligence community doesn't otherwise know how to do, right? Next to last, I think, smart interaction. I think the Kinect is incredibly cool. There are some wonderful talks by Peter Lee that you should hear about the fact that almost every aspect of the Kinect was a challenge, nine major challenges that Alex Kipman, the guy who was in charge of the Kinect, set for Microsoft Research, challenges that were essential for the Kinect to work.
And for those of you who don't have one, it really does work, all right? And this includes, for example, speech recognition with significant ambient noise and without a close-talk microphone, all right? Differentiation, using a crummy little VGA-caliber camera, of siblings wearing similar shirts, all right? Just unbelievable computational tasks, tricks that we had no idea how to do until MSR researchers and product engineers worked together to solve this problem, all right? And, almost done, this whole notion of smarts leads me to a different view of the field. And this is our picture, and this will be a little offensive to some people. That core is what we've all been doing for 50 years, okay? Me too: it's compilers, it's operating systems, it's networks, okay? And it is every bit as important as it ever was. But what I see these days around this is a set of what I'm gonna call emerging fields of computer science, and those fields couple us to a bunch of things that people really care about, right? So as important as we know compilers are, nobody but us really cares about them, all right? But people do care about energy and sustainability and smart health and transportation and elder care, and a set of these things that we're essential to tackling. Right, so the argument in this PCAST report, and I firmly believe this, is that advances in that core, all right, made accessible by those little green blobs, are absolutely essential to tackling these national and global challenges, okay? We're at the middle of all of this stuff, okay? And that's really the view I'd like you to walk out of this with, I really deeply believe it, okay? So what we did at UW this year, for example, was a boatload of hiring in machine learning, because we simply weren't strong enough in what I consider to be the most important of these blobs in terms of coupling what we do, right, at the core of the field, to these things that people care about, all right? So three little pitches and then I'm out of here.
Number one, related to that: embrace these applications as part of what we do. What first convinced me of this was an article by Bob Lucky, now, God, 15 years ago, in IEEE Spectrum, I think it was, and this is the cartoon that accompanied it. The important phrase in the article was "the last electrical engineer." And what he said was, eventually there's going to be only one electrical engineer. He or she is going to be the designer of the microprocessor, all right, and this inverted pyramid is all of the applications electrical engineering enables, all right? And so this one person is going to be very well fed and clothed and compensated, all right, but that's not enough to support an entire field. So the exhortation Lucky was making to electrical engineering at the time, and I took it not as a ding on electrical engineering but as a cautionary tale for us, is: unless you decide that that stuff is you, eventually there's only going to be one of you, all right? So we've got to stop being snotty about what's computer science and what's something else, or honest to God, eventually there's only going to be one of us, all right? The second exhortation is: use both parts of your brain. And I could go on about this for a while, but this is something that Steve Jobs really did well. There was this great thing in The Onion you probably saw shortly after he died, okay? And, you know, I sometimes worry that this is true, but I think, you know, the important thing is that this is somebody who really did have sort of the design sensitivity, coupled with enough engineering wherewithal to figure out how to do it, and I think that's really important. The third thing is myth busting, and we're over time, so I won't spend more time on this, but there are a set of myths about computer science, and the problem with many myths is they have a grain of truth in them, and therefore they're very hard to dispel, right? But the fact is we've done a lot, even recently. It's an exciting field. It's great for women.
There are tons of jobs. You know, this notion that somehow all the jobs are going overseas, or that there aren't a lot of jobs, is just completely delusional. I don't know where it comes from. There are really compelling research visions for the future, so we've got to make sure that kids and parents understand this. That's the story. I'm done. Really great to be here, and I'd be happy to take a couple of questions, and then let's head up for the reception. But thanks for your attention.