Live from Stanford University, it's theCUBE covering Global Women in Data Science Conference. Brought to you by SiliconANGLE Media. Welcome back to theCUBE's live coverage of the Women in Data Science 4th Annual Global Conference. I'm Lisa Martin here at the Arrillaga Alumni Center at Stanford, joined by a WiDS speaker and Stanford alum, Madeleine Udell. You are now an assistant professor at Cornell University. Madeleine, welcome to theCUBE. Thank you, it's great to be here. So this is your first WiDS. This is my first WiDS. But you were at Stanford a few years ago when the WiDS movement began. So tell us a little bit about what you do at Cornell, the research that you do, the classes that you teach, and the people, men and women, that you work with. Sure, so at Cornell I'm studying optimization and machine learning. I'm really interested in understanding low dimensional structure in large messy data sets so that we can figure out ways of looking at the data sets that make them seem cleaner and smaller and easier to work with. I teach a bunch of classes related to these topics, PhD classes on optimization and on optimization for machine learning. But one that I'm really excited about is an undergrad class I teach called Learning with Big Messy Data that introduces undergraduates to what messy data sets look like, which they often don't see in their undergraduate curriculum, and ways to wrangle them into the kinds of forms that they could use with other tools that they have learned about as undergraduates. You say big messy data with a big smile on your face. So this is something that might be introduced to these students as they enter their PhD program. Define messy data and some applications of it. Oftentimes people only learn about big messy data when they go to industry, and that's actually how I understood what these kinds of data sets looked like.
I took a break from my PhD while my advisor was on sabbatical and I scampered off to the Obama 2012 campaign, and on the campaign they had these horrible data sets. They had hundreds of millions of rows, one for every voter in the United States, and maybe tens of thousands of columns about things that we knew about those voters, and they were weird kinds of things. They were things like gender, which in this data set was Boolean; state, which took one of 50 values; approximate education level; approximate income; whether or not they had voted in each of the last elections. And I looked at this and I was like, I don't know what to do, right? These are not numbers, right? They're Booleans, they're categoricals, they're ordinals, and a bunch of the data was missing. So there were many people for whom we didn't know their level of education, or we didn't know their approximate income, or we didn't know whether or not they had voted in the last elections. So with this kind of horrible data set, how do you do basic things? Like how do you cluster? How do you even visualize this kind of data set? So I came back to my PhD thinking, I wanna figure out how this works. I wanna figure out the right way of approaching this data set, because a lot of people will just sort of hack it and I wanted to understand, what's really going on here? What's the right model to think about this stuff? So that really was quite influential in the rest of your PhD and what you're doing now, because you found this interesting but also tangible in a way, right? Especially working with a political campaign. That's right, that's right. So I'm both interested in the applications and I'm interested in the math. So I like to be able to come back, to Stanford at the time, now to Cornell, and really think about what the mathematical structure is of these data sets. What are good models for what the underlying latent spaces look like?
But then I also try to take it back to people in industry, take it back to political campaigns, but here at WiDS I'm really excited to tell people about the kinds of mathematics that can help you deal with this kind of data set reasonably. Because you have a talk this afternoon called Filling in Missing Data with Low-Rank Models. One of the things, before we get into that, that I'd love to unpack with you is taking the Obama 2012 campaign messy data as an example of something that is interesting. There's a lot of science and mathematics behind it, but there are also other skills I'd love to get your perspective on, and that's creativity, that's empathy. It's being able to clearly understand and communicate to your audience. Where do those other skills factor into what you do as a professor and also the curriculum that you're teaching? Sure, I think they're incredibly important. If you want your technical work to have an impact you need to be able to communicate it to other people. You need to, number one, make sure you're working on the right problems, which means talking with people to figure out what the right problems are. And this is one aspect that I consider really fundamental to my career, going around talking to people in industry about what problems they're facing that they don't know how to solve. Then you go back to your university, you squirrel away and try to figure it out. Oftentimes I can't figure it out on my own, so I need to put together a team. I need to pull in other people from other disciplines who have the skills that I don't have in order to figure out the full solution to the problem. Not just to solve the part of the problem that I know how to solve, but to solve the full problem that I can see. And so that also requires a lot of empathy and communication to make the team actually produce something more than what the individual members could.
Then the third step is to communicate that result back to the people who could actually use it and put it into practice. And that's part of the reason that I'm here at WiDS, to try to show people the useful things that I think I've come up with, but I'm also really excited to talk to people here and understand what gnarly problems they don't know how to solve yet. There's a lot of gnarly problems out there. I love that you brought that word up. But I'm just curious, before we go further: when you were studying mathematics, computational engineering, data science, did you understand at that point the other important skills of collaboration and communication, or did you discover that along the way? And is that something that is taught today to those students, like these are the other things that we want to develop in you? Yeah, I think we barely teach those skills. Really? I think at the earliest level, there's a lot of focus on the technical skills and it's hard to see the other skills that are going to enable you to get from 90 to 100%. But that 90 to 100% is the most important part. And if you can't communicate your results back, then it doesn't do so much good to have produced the results in the first place. But really a lot of the education right now at most universities is focused on the technical core, and you can see that in the way that we evaluate students. We evaluate them on their homeworks, which are supposed to be individual, and on their test performance. Maybe there are projects, and the projects I think are much better at helping them develop these skills of communication and teamwork. But that's not included in most courses because it's frankly hard to do. It's hard to teach students how to work on projects. It's hard to give them topics. It's hard to evaluate the results on the projects. It's hard to give them time to present it to a group. But I think these are critical skills, right?
The project work is much more like what work becomes after they finish their studies. As you've been in the STEM fields for quite a while and gone so far in your academic career, tell me about the changes that you've seen in the curriculum, and do you think that you're going to have a chance to influence some of those other skills, like communication? When I was in grad school studying biology, a long time ago, communication was actually part of it for a semester. But I'm just wondering, do you think that this is something that a movement like WiDS could help inspire? I think it's important to help people see what the skills are that they're going to need to use down the line. The thing is, I think that the technical foundation is really important. And I think that doubling down on that, particularly when you're young and you can concentrate on the nitty gritty details, is actually something that becomes harder as you get older. And so focusing on that for people in their undergrad and early PhD, I think that actually makes sense. But you want them to see what the final result is, right? You want them to see what the career is and how that is different from what they're doing right now. And so I think events like WiDS are really great for showcasing that. But I would also like to pull that project work forward to the extent possible with the skills that the students have at any point in their curriculum. So in the class that I teach, particularly in big messy data, the capstone of the course is a class project where the students tackle a big messy data set that they find on their own, they define the problems, and the form of what they're supposed to produce is a report to their manager, right? The project proposal says, manager, this is why I should be allowed to work on this project for the next month, because it's so important. It's really going to drive growth in our business.
It's going to open up new markets. And they're supposed to describe it in industry terms, not just in academic terms, right? Then they try to figure out actually how to solve the problem. And at the end they're supposed to once again write a report that describes how what they found will help and impact the business. That element of persuasion is always key. So last thing here as we wrap up, this is the fourth annual Women in Data Science Conference, as I mentioned in the opening. The impact and the expansion that they have been able to drive in such a short period of time is something that I always love seeing every year. There's 150 plus regional events going on. They're expected to reach 100,000 people. What excites you about the opportunity that you have to present here at Stanford later today? I think it's amazing that there's so many people excited about WiDS. I mean, I can't travel to 150 locations, certainly not this year, not in many, many years. So the ability to be in touch with so many people in so many different places is really exciting to me. I hope that they'll be in touch with me too. That direction is a little bit harder with current technology. But I wanna learn from them as well as teaching them. Awesome, well, Madeleine, thank you so much for sharing some of your time with me this morning on theCUBE. We appreciate that and wish you good luck on your WiDS presentation this afternoon. It was really fun to talk with you. Thank you for having me here. My pleasure, we wanna thank you. You're watching theCUBE live from the fourth annual Women in Data Science Conference, or WiDS, here at Stanford. I'm Lisa Martin. Stick around, I'll be right back after a break with my next guest.
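
The conversation above mentions a table with many missing entries and a talk titled Filling in Missing Data with Low-Rank Models. As a rough illustration of that general idea only, here is a minimal Python sketch. It is not the method presented in the talk: it assumes every column has already been encoded as a number, uses plain squared error, and fills gaps by repeatedly projecting onto a truncated SVD. The function name `impute_low_rank`, the synthetic data, and the chosen rank are all hypothetical.

```python
import numpy as np

def impute_low_rank(X, rank=2, n_iters=200, tol=1e-8):
    """Fill NaN entries of X by iterating a rank-`rank` truncated SVD.

    Simplified stand-in: squared loss on numeric columns only.
    Booleans, categoricals and ordinals would need an encoding first.
    """
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)
    # Start by filling each gap with the mean of its column.
    col_means = np.nanmean(X, axis=0)
    filled = np.where(observed, X, col_means)
    for _ in range(n_iters):
        # Best rank-`rank` approximation of the current filled matrix.
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # Keep observed entries fixed; update only the missing ones.
        new_filled = np.where(observed, X, low_rank)
        change = np.linalg.norm(new_filled - filled)
        filled = new_filled
        if change < tol:
            break
    return filled

# Synthetic example: a 100 x 10 table that is exactly rank 2,
# with roughly 15% of its entries hidden at random.
rng = np.random.default_rng(0)
truth = rng.normal(size=(100, 2)) @ rng.normal(size=(2, 10))
mask = rng.random(truth.shape) < 0.15
X = truth.copy()
X[mask] = np.nan

completed = impute_low_rank(X, rank=2)
mean_filled = np.where(mask, np.nanmean(X, axis=0), X)

# Compare errors on the hidden entries only.
err_low_rank = np.linalg.norm(completed[mask] - truth[mask])
err_mean = np.linalg.norm(mean_filled[mask] - truth[mask])
print("low-rank fill error:", err_low_rank)
print("column-mean fill error:", err_mean)
```

The sketch works on a small dense array for readability. A voter file of the size described in the interview would need a sparse, out-of-core approach, and mixed column types would call for losses other than squared error.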