Kevin. Thanks, everyone, for having me here. It has been great all day long to see all the different interesting activities and applications of data science around the room. That's something we try to bring into the federal government: using the same kind of tools and innovation that you're developing out here to provide better services for the American people.
So before I talk about that, I want to get into, first, who are you in this room? I think a lot of you would consider yourselves data scientists, whether you're students, professors, you're working at software companies, you're weekend hackers. And I'm curious about what you're currently doing as you use those data science skills and what you're considering doing. So raise your hand if any of the following apply to you: right now, or sometime in the next few years, you're considering working in academia. Established industry, so big companies that are using data science to improve the bottom line. Startups, software startups. Nonprofits. Public service. All right, I like the number of hands right there. And I'm going to be pretty honest, I'm going to try and increase the number of hands right there. Maybe I shouldn't have told you that, but the secret's out.
So who am I? I've kind of lived in a lot of these different fields as well. I was in academia, building models of clean energy technology adoption during my PhD at Oxford. I've done data science within a large organization, trying to improve the effectiveness of our operations at Obama for America 2012. I've built a data science software product, a big data analytics platform, at Red Alley Analytics. And now I'm working in public service at the White House, at the U.S. Digital Service.
So what is this thing that we all call data science? Some people say it's just statistics on a Mac. More helpfully, Drew Conway has a famous Venn diagram of data science as the combination of programming skills, statistical knowledge, and subject matter expertise.
And he kind of highlights the sweet spot of data science in the middle and some of the danger zones around the edges. So I feel like what data science is is actually pretty well established, but there's this other question of what you should do with data science. How should you spend your days? What should you work on?
First off, let's acknowledge you can do anything. This is one of the most interesting, hottest industries that we have in the world right now. According to McKinsey, there is a shortage of people who can make decisions based on analytical findings (so that's a broader definition than just data science) equal to half a percent of the U.S. population. The Harvard Business Review called data science the sexiest job of the 21st century. This article was actually written by DJ Patil, who is the White House chief data scientist and who also used data science to invent the term data scientist. So it's a little self-congratulatory.
So I want to propose another Venn diagram. It's pretty simple, nothing surprising, about what you should do with your time with data science. One is to work on projects where you're gathering new insights from data, insights that you wouldn't have from other analytical techniques. Another is working on projects where you have leverage: given those insights, you have people in positions of influence who can take them and change things with them. And finally, working to improve lives, working on causes you care about, bending the arc of the moral universe towards justice.
And there are different sweet spots in between all these things that I've kind of worked on. In academia, I was generating new insights from data about things I really cared about related to climate change. It was the kind of research that someone else could later use as leverage to make changes in the world. I've worked in the private sector, where we built a big data analytics platform used by major banks across the whole world.
It was certainly a profitable endeavor as well as intellectually interesting. I've also worked on projects where at the end I realized that all the analysis that went into it didn't really tell us anything we didn't already know. And right now I'm happy to say I'm kind of living at the sweet spot in the middle there, where the magic happens with data science, working with the U.S. Digital Service.
So when you start a project, or when you get up every day, you should be asking yourself: What can we learn from the data that we don't know already? Who will use those insights to make change? And will those changes improve people's lives? And on that last question, I want you to be incredibly ambitious. Don't sell yourself short on that last question. President Obama has a quote from John F. Kennedy engraved on the rug in the Oval Office: "No problem of human destiny is beyond human beings." Every problem we have, everything we should do something about, we can do something about.
Okay, so let's talk about data science at the U.S. Digital Service. What is the U.S. Digital Service? A few people have asked me that as I've wandered around the room today with a funny name tag. We were created after the healthcare.gov launch, when that project kind of went off the rails. There was a team of six people involved in the rescue effort. Hundreds of millions of dollars had been spent, and six people came in to work side by side with the agency to get it back on track. We bring top design and engineering talent and practices from the private sector to presidential priorities in government. One of the first classes was personally recruited during a 45-minute sales pitch by the president in the Roosevelt Room of the West Wing. It's a pretty hard sales pitch to beat. And we help government deliver world-class digital services for students, immigrants, children, the elderly, everyone, at more efficient costs.
And the environment we plug into, federal information technology projects, is a tough one. We spend 86 billion dollars on IT projects in the federal government per year. That's more than the entire venture capital industry spends annually on everything. 94% of those projects are over budget or behind schedule, which basically means every one of those projects is over budget and behind schedule. 40% of those projects are completely scrapped. So after hundreds of millions of dollars have been spent, no working product is delivered. This is the environment we work in.
Let me talk about a couple of individual projects. College Scorecard is one. Imagine a young woman, Jeanine. She's an 18-year-old high school senior. She's going to be the first-generation college student in her family. She wants to pay a fair price for a great education. She doesn't want to spend a little bit of money and not get a good educational outcome, and she doesn't want to spend a lot of money and not get a good educational outcome either. So the College Scorecard project was developed by the U.S. Digital Service in collaboration with the Department of Education to clean and release massive data sets on lots of different variables about college education costs and outcomes. This is the people's data, gathered by the Department of Education, that we're returning to the people. Over 7,000 colleges and universities are included, going back 18 years, and we help students and parents identify how to get the biggest bang for their buck with information on average earnings, the true cost of college factoring in scholarships and financial aid, debt rates at graduation, percent loan repayment, things like that. We really emphasized extensive user testing to prioritize the students' and parents' needs, with rapid iteration on paper prototypes. We released it as an open data API and an open source reference implementation.
And in the first weeks of this, 11 organizations were building new tools using the API to help people make one of the most important financial decisions of their entire lives. So this is what it looks like right now as a reference implementation. You can quickly plug in the state that you're looking for and different types of institutions, and pull up results ranking these schools based on the percentage earning above a high school grad, graduation rates, debt repayment, things like that. All of you in this room could do incredible things with this data. It's available to you as a CSV. It's available to you via API. And the source code for the reference implementation we have is entirely open source on GitHub.
Let's talk about another project, immigration. So picture Sarah. She's a permanent resident non-citizen. She's a mother of two. She's extremely busy. Her green card is her proof of that status, and she's legally required to carry it at all times. But her wallet was stolen and she needs to get a replacement. Until recently, this was entirely a paper-based process. It often took six to eight months to process these replacements. Forms are physically shipped thousands of miles between six processing centers around the country. The stack of paperwork you need looks like this. About a decade ago, immigration services decided to convert to an online system to save money and improve efficiency, which is a great idea. But after six years and $1.2 billion, with a B, no working product was delivered. And so the dedicated civil servants inside the agency decided to stand up and call for a change in the status quo. They partnered with six people from the U.S. Digital Service, the same size as the healthcare.gov team, to work side by side with the agency to change our approach to this IT project. Within less than three months, after six years, we pushed our first products to production. You can now file a replacement green card entirely online.
Without anyone touching a piece of paper, it's faster, better, cheaper, and it looks like this. And it welcomes our permanent residents, saying we want to make this process easy for you. We want to welcome you. Thanks.
Now I want to talk about Medicare. This is what I work on. So who here is over 65 or has a family member over 65? They're probably on Medicare. There are 55 million beneficiaries on Medicare. It is the largest single payer for healthcare in the United States. It represents about 3.5% of the American economy. This is a big deal. And the specific incentives we're trying to solve for, to improve the provision of Medicare in this country, I want to describe first with an example.
So think about Abdul. He's 70 years old. He receives Medicare. He's been diagnosed with end-stage renal disease. His kidneys have permanently stopped functioning. He's on dialysis three times a week for four to six hours each session. That can be a really brutal process. And like up to 60% of people on dialysis, he has episodes of depression due to the quality of life and quality of care. His doctor is Dr. Jane, a kidney doctor who leads a team at the dialysis center. She wants to provide quality care. She strives to do that every single day. But she needs the system to help her provide better care, not fight her from providing better care. And like the average nephrologist, she spends about 55 hours per week seeing patients and 14 more hours doing paperwork.
Under the old system of fee-for-service Medicare payments, Medicare paid about $127 per dialysis session, regardless of the duration. So providers had no financial incentive to provide more dialysis. In contrast, we paid per unit of medication. So there was lots of incentive to provide more medication. Erythropoiesis-stimulating agents were the main medication. They are useful drugs, but they have some complications for people with mild anemia, and they can increase the risk of negative cardiovascular events.
So as an example, in 2009, Medicare paid $9.20 per thousand units of these agents. Providers who had costs less than that, which meant that they made a slight profit on every unit sold, used the agents in about 80% of their dialysis encounters. Whereas providers who had costs more than that, which meant they made a slight loss, used them in only 20% of their dialysis encounters. Now I want to note that the direction of causation here is very unclear. People who are prescribing this a lot negotiate better rates. But the point is you can see how these incentives for particular types of care are dramatically shifting the type of care provided. Further, this was leading to overall skyrocketing costs. We spent about $4 billion just for these agents in 2007. That's four times the next most expensive drug in Medicare. And the care was potentially lower quality, with 22% of U.S. dialysis patients dying every year compared with 7% in Japan.
So we've now moved to a pay-for-performance model for dialysis treatment, measuring the outcomes of that care and adjusting compensation accordingly, using hemoglobin levels, urea reduction ratios, things like that, and patient satisfaction surveys. And that allows Dr. Jane to find the right pattern of care for Abdul, the one she has been taught to use, without having to conform to the incentives set up by the fee-for-service model.
Now, the Quality Payment Program is a new program that's trying to take this idea of incentivizing quality care at efficient cost across all Part B payments. I'm currently serving as a chief technology officer of this project, with another person from the U.S. Digital Service. We're rewarding better care, smarter spending, and healthier people. And this system gives doctors more freedom to care for patients the way they were trained, to use their own discretion and respond to the particular needs of the patient.
And we're also trying to reduce the reporting burden from scattershot programs that require compliance. This is a pretty big challenge. We're trying to promote the right incentives and reduce the reporting burden so that doctors don't have to spend 20% of their days filling out paperwork. We're rewriting systems that use outdated technology. For instance, much of the claims processing system is written in millions of lines of COBOL, which is a programming language literally older than Medicare itself. We are implementing complex legislation. The rule has a 700-page preamble to the actual rule. So, you know, this is a pretty challenging endeavor to work on.
There's an input challenge. We're processing billions of claims per year. We are allowing clinicians and groups to submit quality metrics about themselves. There are hundreds of unique quality measure categories and dozens of existing submission systems across Medicare and private payers. We're ingesting data from a variety of electronic health records, and we have to collect accurate information on the doctor's specialty and their payment model, which affects their payment adjustments. And we're trying to align with private sector quality payment programs to use similar measures and similar data sources, so that you don't have to report the same type of information many, many ways. Further, we need to simplify the authentication process. Clinicians and groups right now have at least half a dozen accounts they use to manage Medicare compliance, all with different logins. People can submit data on behalf of other people. We have to manage that process. And in terms of infrastructure, a lot of these annual measures have massive seasonal spikes in traffic. So you have to adjust your infrastructure accordingly.
Once the data is in the system, there's a data analytics challenge. We're calculating benchmarks for these hundreds of measures.
We are calculating individual scores for the 1.2 million clinicians providing Medicare Part B services in this country. We're finding and correcting faulty, erroneous, and fraudulent data. We're grouping distinct claims, so that the seven tests and three doctor's visits you had are recognized as part of a unique episode of care. We're assigning the correct clinician, so it becomes clear that Dr. Jane is ultimately responsible for Abdul's dialysis outcomes, and adjusting outcomes by demographics, region, and condition.
And then once this data is all analyzed, after being input and run through the calculation engines, we have a giant feedback challenge. How do we use the performance score to adjust all applicable claims, adjusting the payments up and down accordingly, allowing informal review when clinicians have found errors in the calculations, and providing feedback as quickly as possible? The legislation actually allows for the actions that a doctor takes on January 1st, 2017 to affect their claims payments on December 31st, 2019. So we're trying to shorten that three-year feedback window to something much shorter, to improve care and give feedback quicker. And we also want to provide accessible, intuitive feedback reports so that someone actually knows, you know, what they can do to improve their quality of care and their scores. This is an example of some of the current feedback score reporting, which makes it a little difficult for doctors to know what they should be doing in response, and why.
So we're obsessed with user research. We spend a lot of time right now on site with doctors and registries, in their clinics, in hospitals. This is something that isn't often done in traditional data science and that I would highly, highly encourage. Understand that subject matter information that Drew Conway talks about in his Venn diagram. We're obsessing over making it easy to submit this data and over having rapid, quality feedback. Those are the outer ends of the pipeline around the analytics engine.
And we're using commercial cloud infrastructure such as Amazon Web Services to reduce costs and increase the scalability of these services, which at times is pretty novel for the government, though not novel out here. We're delivering functional prototypes every few weeks, trying to deliver product over paper, failing really fast. The first time we know we're really working in an agile environment is when we throw out a bunch of code that was developed but doesn't serve its purpose. And we're trying to empower the private sector to use open APIs for the submission systems and the feedback reports, and to collaborate with these highly sophisticated private sector health development teams to improve the quality of the product delivered.
And I'll also say, you know, working on data science at the U.S. Digital Service is also a personal challenge. There's an overwhelming scale here. I'm used to working on projects in academia or the private sector where I had a simple question that had very complex solutions. And here you're kind of reinventing the American healthcare economy. You've got one year to do it. The rules won't be finalized until two months before it goes live. You have to explain this all to hundreds of people on your own team and also to 1.2 million doctors. And, you know, I've slept better. There's also somewhat slower feedback on the wins that you get. In old jobs I was used to, you know, shipping small units of code, having small wins every single day. And here you're working on such a large scale that it can be really hard to see the feedback from your wins. There are occasionally random bureaucratic obstacles. I carry three laptops because they all have access permissions to different systems. When I'm at the White House complex, I can get locked down inside the building when there's a security alert.
When I was at Medicare last week working late, I got locked out of the back gate and had to take an Uber, because it's so big that it was 1.5 miles around to the other side. But, you know, this is one of the few data science projects I'll ever work on that makes me look inward first, that challenges leadership and emotional management as much as intellect. And that's been an incredibly valuable and rewarding experience for personal growth.
So, if this project is successful, we can help remake the healthcare sector with the single largest payer in the country. We can give doctors the freedom to provide quality care at efficient cost to some of our most vulnerable citizens. And we can save you guys, the American taxpayers, billions of dollars and extend the solvency of Medicare itself. Not that it's in any grave threat right now, but that's our goal.
So, don't wish us luck. I want you to join us. And apologies to BIDS if I steal anyone from your institution. We have some of the best technical talent in the world: early engineers from Google, Facebook, Twitter, Amazon, you name it. They come to serve their country, often starting on short tours of duty. People often come to work for two or three months on projects. They start to love the work. They get sucked in. We had one incredible software products lawyer who came for one week, and she accomplished so much in one week that a few months later she quit her job and came back full time because she was so excited about the scale of that. And it really only takes five minutes to apply. You just upload a resume and insert a paragraph. Take the summer off, come hang out with us. I think you'll like it.
And lastly, I want to close with a picture of a doctor from San Francisco General Hospital rooting through a dumpster. Now, that's not something you often see a doctor from San Francisco General Hospital doing.
And I chose to tell this story today because the woman in this picture, I'll call her Dr. Liz right now, is married to a friend here at the Berkeley Institute for Data Science. These stories are one degree from you all the time. So this doctor had a patient, let's call him John, who had come to the hospital with a severe medical condition that needed further care. A homeless man, he was very attached to a stash of cans he had collected, which he hoped to turn in for a cash refund. And he had hidden them in this dumpster, which he knew was emptied only once a week. So, against his doctor's advice for his condition, John desperately wanted to leave the hospital to get these cans. Luckily, Dr. Liz works in a system with a private sector quality payment program that incentivizes reducing hospital readmission rates and improving quality of care. She worked in a system that supported her when she said that, at the end of an 80-hour work week, she wanted to take her lunch break and leave, go find these cans, and check them into the patient's personal belongings. And because of that, John stayed in the hospital and got the care he needed.
That's why I got up today. That's why I'll get up tomorrow. That's why I'll get up every day that I'm working on this project. And I want you to get up for that every day too. Thank you.