Live from Stanford University, it's theCUBE, covering the Women in Data Science Conference 2017.

Hi, welcome back to theCUBE. I'm Lisa Martin, and we are at Stanford University for the second annual Women in Data Science Conference. Fantastic event with leaders from all different industries. Next, we're joined by Finale Doshi-Velez. You are the Associate Professor of Computer Science at Harvard University. Welcome to the program.

Excited to be here.

You're a technical speaker, so give us a little bit of an insight as to what some of the attendees, those that are attending live, and those that are watching the live stream across 75 locations, what are some of the key highlights from your talk that they're going to learn?

So my main area is working on machine learning for healthcare applications, and what I really want people to take away from my talk is all the needs and opportunities there are for data science to benefit patients in very, very tangible ways. There's so much power that you can use with data science these days, and I think we should be applying it to problems that really matter.

Like healthcare.

Absolutely, absolutely.

So talking about healthcare, you kind of see the intersection, that's your big focus, is the intersection of machine learning and healthcare. What does that intersection look like from a real-world applicability perspective? What are some of the big challenges, and can you talk about maybe specific diseases that you're maybe working on to help with?

Sure, absolutely. So I'll tell you about two examples. So one example that we're working on is with Autism Spectrum Disorder, and as the name suggests, it's a really broad spectrum. And so things that might work well for one sort of child might not work for a different sort of child, and we're using big data and machine learning to figure out what are the natural categories here?
And once we can divide this disease into subgroups, we can maybe do better treatment, better prognosis for these children, rather than lumping them into this big bucket that...

And trying to have a look at the same.

Exactly. Right. And another area we're working on is personalizing treatment selection for patients with HIV and with depression. And again, in these cases, there's a lot of heterogeneity in how people respond to the diseases, and with the large data sets that we now have available, we actually have huge opportunities in getting the right treatments to the right people.

That's fantastic, so exciting, and it's really leveraging data as a change agent to really improve the lives of patients. From a human interaction perspective, we hear that machine learning is going to replace jobs. It's really kind of a known fact, but human insight is still quite important. Can you share with us where the machines and the humans come into play to help some of these...

Yes, a big area that we work on is actually in formalizing notions of interpretability, because in the healthcare setting, the data that I use is really, really poor quality. There's lots of it. It's collected in the standard of care every day, but it's biased, it's messy, and you really need the clinician to be able to vet the suggestions that the agent is making, because there might be some bias, some confounder, some reason why the suggestions actually don't make sense at all. And so a big area that we're looking at is how do you make these algorithms interpretable to domain experts, such as clinicians, who are not data experts? And so this is a really important area, and I don't see that clinician being replaced anytime soon in this process, but what we're allowing them to do is look at things that they couldn't look at before. They're not able to look at a patient's entire record.
They certainly can't look at all the patient records for the entire hospital system when making recommendations, but they're still going to be necessary because you also need to talk to the patient and figure out what are their needs, do they care about a drug that might cause weight gain, for example, when treating depression, and all of these sort of things. Those are not factors, again, that the machines are going to be able to take over, but it's really an ecosystem where you need both of these agents to get the best care possible.

Got it, that's interesting. From an experimentation perspective, are you running these different experiments simultaneously? How do you focus your priorities on the autism side, on the depression side?

I see, well, I have a lab, so that helps make things easy. I have some students working on some projects and some students working on other projects, and we really follow the data, so my collaborations are largely chosen based on areas where there is data available and we believe we can make an impact.

Fantastic. Speaking of your students, I'd love to understand a little bit more. You teach computer science to undergrads. We look at how we're really at this inflection point with data science. There's so much that can be done and, to your point, in tangible ways, the differences that we can make. Kids that are undergrads at Harvard these days grew up with technology and the ability to get something like that; we didn't. So what are some of the things that have influenced them to want to become the next generation of computer or data scientists?

I mean, I think most of them just realize that computers and data are essential in whatever field they are.
They don't necessarily come to Harvard thinking that they're going to become data scientists, but in whatever field that they end up in, whether it's economics or government or business, they quickly realize that data is very important and so they end up in my undergraduate machine learning course. For these students, my main focus is just to teach them what the field can do and also what the field can't do and teach them that with great power comes great responsibility. So we're really focused on evaluation and just understanding how to use these methods properly.

So looking at kind of traditional computer and data science skills of data analytics, being able to interpret mathematics, statistics, what are some of the new emerging skills that the future generation of data and computer scientists needs to have, especially related to the social skills, like communication?

So I think that communication is absolutely essential. At Harvard, I think we're fortunate because most of these people are already in a different field and they're also taking data science, so they are already very good at communicating because they're already thinking about some other area that they want to apply it to.

So they're getting really a good breadth.

They're getting a really great breadth, but in general, I think it is on us, the data scientists, to figure out how do we explain the assumptions in our algorithms to people who are not experts, again, in data science, because that could have really huge downstream effects.

Absolutely. I like what you said, that these kids understand that the computers and technology are important, whatever they do. We've got a great cross-section of speakers at this event that are, you know, people that are influencing this in retail, in healthcare, in education, as well as in sports technology, on the venture capital side, and it really shows you that this day and age, everything is technology.
Every company, and we're sitting in Silicon Valley, of course, where a car company is a technology company, but that's a great point that the next generation understands that it's prolific. I can't do anything without understanding this and knowing how to communicate it. So from your background perspective, were you a STEM kid from way back and you really just loved math and science? Is that what shaped your career?

So I grew up in a family where, like 15 generations back, it was accounting, finance, small business, and I was like, I'm never going to do any of this. I am going to do something completely different.

You were determined, right.

And so now I'm a data scientist.

At Harvard, that's pretty good.

We're working on healthcare applications. So I think numbers were definitely very much part of my upbringing from the beginning. But one thing that I think did take a while for me to put together is that I came from a family where my great uncle was part of India's independence movement. My role models were people like Martin Luther King and Mother Teresa, and I liked numbers. And, like, how to put those together? And I think it definitely took me a while to figure out, okay, how do you deliver those warm fuzzies with, like, cold hard facts? And I'm really glad that we're in a place today where the sort of skills that I have can be used to do enormous social good.

What are some of the things that you're most excited about with this particular conference and being involved here?

So I think conferences like these, like Women in Data Science (I'm also involved in the Women in Machine Learning conference), are a tremendous opportunity for people to find mentors and cohorts. So I went to my first Women in Machine Learning conference over 10 years ago. And those are the people I still talk to whenever I need career advice, or I'm trying to figure out what I want to do with my research and what directions, or just general support.
And when you're in a field where maybe you don't see that many women around you, it's great to have this connection so that you can draw on that wherever you end up. And your workplace may or may not have that many women, but you know that they're out there and you can get support.

Now that there's so much data available, a lot of the corporations that use data as a change agent have adopted cultures of "try it; it might fail, but we're going to learn something from this." Do you see that mentality in your students about being free or being confident enough to try experiments and, if they fail, take learnings and move forward as a positive?

I mean, certainly that's what I try to teach my students, my graduate students. I tell them I expect you to make consistent progress. Progress includes failure if you can explain why it failed.

And that's huge. That's how we learn.

That's how we develop new algorithms.

Absolutely. Yeah, and I think that confidence is a key factor. You mentioned that you've been involved in the Women in Machine Learning Conference for 10 years. How have you seen women's perspectives, maybe confidence, evolve and change and grow as a result of this continued networking? Are you seeing people become more confident to be able to try things and experiments?

I mean, certainly as people stay involved in the field, I've noticed that you kind of develop that network, you develop that confidence. And it's amazing. So the first events had less than 100 people. The last event that we had had over 500 people. The number of people at just the Women in Machine Learning event was the same as the number of people at the entire conference, like, 10 years ago, right? And so the field has grown, but the number of women involved that you see through these events, like WiDS and WiML, I think is enormous.

And the great thing that's happening here at WiDS 2017 is it's being live streamed.

Right.
To over 75 locations. So it's accessible to so many people.

Exactly, they're expecting up to 6,000 people on the live stream.

So the reach and the extension is truly global.

Which is fantastic.

It is fantastic. And just the breadth of speakers that are here to influence. You mentioned a couple of your key influencers, Martin Luther King and Mother Teresa. From an education perspective, when you were trying to figure out your love of math and numbers and that, who were some of the people in your early career that were really inspiring and helped you gain that confidence that you would need to do what you're doing?

So I think if I have to pick one person, it was probably a professor at MIT that I interacted with quite a bit in my undergrad and who continued to mentor me, Leslie Kaelbling, who was just absolutely fearless in just telling people to follow their passions. Because we really are super privileged. As was mentioned earlier, if we lose our jobs, we can just get another one.

Right.

And our skills are so in need that we can and we should try to do amazing things that we care about. And I think that that message has really stayed with me.

Absolutely. So you've got research going on in autism. You mentioned depression. What's next for you? What are some of your next interests? Cancer research, other things like that?

So I'm actually really interested in mental health because I think that that's, talk about messy spaces in terms of data. It's very hard to quantify, but it has a huge, huge burden, both on the people who suffer from mental health disorders, which is like close to 15%, 20%, depending on how you count, but it also has a huge burden on everyone else, right? On, like, lost work, on the people around them. And so we're working with depression and autism, as I mentioned, and we're hoping to branch out into other neurodevelopmental disorders as well as adult psychiatric disorders.
And I feel like in this space, it's even harder to find the right treatments. And the treatment takes so long to test, you know, like six to eight weeks. And it can be so hard to keep up the morale, to keep trying out a treatment when your disorder is one that makes it hard to keep up trying whatever you need to try. So that's an area that I'm really focusing on these days.

Well, your passion is clearly there, that intersection of machine learning and healthcare. You're right, you're talking about something that maybe isn't talked about nearly as much as some of the other big diseases, but it's one that is prolific. It affects so many. And it's exciting to know that there are people out there like you who really have a passion for that and are using data as a change agent to help current generations and future ones to come. So Finale, it's such a pleasure to have you on theCUBE. We wish you the best of luck in your technical talk and know that you're going to be mentoring a lot of people from far and wide.

Thank you, my pleasure to be here.

Absolutely, so I'm Lisa Martin. You've been watching theCUBE. We are live at the Women in Data Science Conference at Stanford University. We'll stick around, we'll be right back.