Jeff Frick: Here with theCUBE. We're at Stanford University at the Arrillaga Alumni Center at a really interesting conference. It's the very first year of Women in Data Science, with about 500 women here for a full day of discussions, panels, and breakout sessions. So we wanted to come over, as we like to do, find the smartest people we could find, and bring them to you. We're excited to have Monica Rogati. She's an independent data science advisor. Welcome.

Monica Rogati: It's good to be here.

Jeff Frick: Absolutely. So you were just on an interesting panel, four or five women up there talking about data science and its evolution. One of the topics that came up, which is very appropriate here at Stanford, is how to train better data scientists, how academia can turn out better, more well-prepared people. I know you have some opinions on that you want to share.

Monica Rogati: Yeah. One of the most impactful things academia can do to produce better data scientists is to offer project-based courses: lots of real data, lots of real projects, having to deal with messy data, not trusting the data, being data detectives, and so on. That goes a long way toward preparing data scientists for industry, so it's something I'd like to see more of.

Jeff Frick: Yeah, dirty data and exploring the data is an interesting real-world exercise, because it doesn't always come out great; you don't always have what you want. And then there are always trade-offs. There was a lot of discussion earlier today about different types of algorithms producing different kinds of results depending on what you're trying to optimize, because you might apply completely different algorithms and objectives to the same set of data, and there are always trade-offs.
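Monica's "data detective" point, not trusting the data before you model on it, can be sketched as a quick audit pass over a dataset. This is a minimal sketch; the field names, record shape, and valid range here are hypothetical, purely for illustration.

```python
# A minimal "data detective" pass over a toy dataset: before any modeling,
# count missing values, duplicate records, and out-of-range readings.
# The fields ("user_id", "steps") and the valid range are hypothetical.

def audit(rows, field, low, high):
    """Return counts of missing, out-of-range, and duplicate values."""
    report = {"missing": 0, "out_of_range": 0, "duplicates": 0}
    seen = set()
    for row in rows:
        value = row.get(field)
        if value is None:
            report["missing"] += 1
            continue
        if not (low <= value <= high):
            report["out_of_range"] += 1
        key = (row.get("user_id"), value)
        if key in seen:
            report["duplicates"] += 1
        seen.add(key)
    return report

rows = [
    {"user_id": 1, "steps": 4200},
    {"user_id": 2, "steps": None},   # missing reading
    {"user_id": 3, "steps": -50},    # physically impossible value
    {"user_id": 1, "steps": 4200},   # duplicate record
]
print(audit(rows, "steps", 0, 100000))
```

Running a pass like this before modeling surfaces exactly the kind of messiness that curated classroom datasets tend to hide.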
Monica Rogati: Yeah, there are a ton of trade-offs when it comes to algorithm selection, but my bias has always been toward bringing new, fresh signals into your problem: thinking about what other data you can collect and use to make your results and your user experience better. Sometimes that can have an orders-of-magnitude bigger impact than your algorithm selection. That's something else I'd like to see more of coming out of academia. Many times in academia, in classes, you're handed a dataset and you have to stay within those boundaries. In industry, you have to think outside the box about what other signals you can pull in. That's crucial, because I see people considering it cheating when you try to collect extra signals, when in fact that's the right, creative way to solve the problem.

Jeff Frick: Yeah, absolutely, as evidenced by IBM just spending $2 billion with the Weather Channel to get that data and start incorporating it into some of their offerings. So clearly that's the right way to go: think outside the box and bring in that extra signal. You had another great line on the panel about ROI, that it's really everything good over everything bad, which got a great rise from the crowd. But it makes a lot of sense, right? People sometimes just limit the scope of how they're evaluating things.

Monica Rogati: Yeah, so the question there was, how do you select what problem to work on, what questions to answer? The way I like to think about it is impact: think about the impact of what you're working on, and then think about the effort that goes into it.
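The impact-over-effort framing Monica describes can be sketched as a simple prioritization pass over candidate projects. The project names and scores below are entirely hypothetical, just to show the mechanic.

```python
# Rank candidate data science projects by expected impact per unit of effort,
# the "everything good over everything bad" framing. Names and scores are
# made up for illustration; in practice both are rough estimates.

def prioritize(projects):
    """Sort projects by impact/effort ratio, highest first."""
    return sorted(projects, key=lambda p: p["impact"] / p["effort"], reverse=True)

projects = [
    {"name": "tune ranking algorithm", "impact": 2, "effort": 4},
    {"name": "add weather signal",     "impact": 8, "effort": 3},
    {"name": "dashboard refresh",      "impact": 3, "effort": 1},
]

for p in prioritize(projects):
    print(p["name"], round(p["impact"] / p["effort"], 2))
```

Even with crude estimates, a ranking like this forces the impact-versus-effort conversation before any work starts, which is the point: resources are limited, so the frame matters more than the precision of the numbers.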
If you keep those two things balanced and think through the results before you even start working on something, that puts you in the right frame of mind to make those trade-offs, because resources in data science, as always, are very limited.

Jeff Frick: So one of the exciting things happening right now is the internet of things, right? Sensors everywhere. And you were at the cutting edge of that: you were at Jawbone before, and Jawbone and all these devices originally started out answering pretty simple questions, like how far did I walk today, but they have the potential to evolve in a number of different ways. Sensors everywhere, more computing power, more compute pushed to the edge. What are some of the things you're excited about that this next evolution of the internet of things will open up?

Monica Rogati: There are several things I'm excited about. Some are around being able to process time series data, crossing and intersecting all kinds of streams of data, understanding how they correlate with and influence each other, and then using that to build smarter products. So thinking about how we build a better user experience: how do we take all the data science that goes on behind the scenes and make it useful to people, so it makes their lives better?

Jeff Frick: When you're out consulting with companies, what are some of the typical pitfalls you find over and over again that you try to help people with? What are some common mistakes you see way too often?

Monica Rogati: I see a lot of data scientists being very excited about particular methods that are new and hot. That happens in engineering too, when you think about new technologies becoming popular: as a data scientist, as a technical person, you want to be able to say you're on the cutting edge.
So people focus a lot on the technologies or techniques themselves and don't really think about the ultimate goal: what am I trying to do? It's interesting to observe that happening, and it's always good to put it in perspective and say, okay, this is fun to think about, but let's also think about what we're trying to do here.

Jeff Frick: Right, and to be part of the broader discussion with the business people. That said, what advice do you give clients on finding early success? What's the best way for them to get started? They say, "I want to be a data-driven organization; we're kind of there, but not really there." What can they do to get started, besides spinning up a Hadoop cluster and starting to dump all their data into it?

Monica Rogati: Oh, that should definitely not be the first thing they do. One thing they need to do is think about what data they'd like to collect for the products they'll ultimately want to build. That's where talking to a data scientist early on helps, even if you don't have the resources to fully involve one at that stage: being able to record the right data and anticipate the use cases you're eventually going to want to cover. Starting on that early, and recording the data early so you have it, is something I've found very useful.

Jeff Frick: And then the last part of the equation: we always talk about people, process, and technology. There's the tech part, which is great and evolves a lot; there are the processes; but then there are those pesky people who get in the way. So again, when you're advising clients, what's the secret sauce to getting the people on board, whether from senior management down?
What's your experience with the crotchety old guy in the corner who says, "We've always done it this way, and I've got 30 years of intuition"? How do you help companies transform the people side to become more data-centric, more of a data-driven organization?

Monica Rogati: As with any new thing, you have to show value. You have to show that you're contributing something, that you're improving some metric. With a careful choice of metric, things get easier, because you can actually run tests and experiments (we saw a great talk about experimentation today) and see how they influence the metric you're trying to track. That's when a lot of people come on board and really understand the power of data science.

Jeff Frick: All right, last question before I let you go back to the conference: what are you excited about for 2016? 2016 is going to be the year of what for big data?

Monica Rogati: Sensors, I would say, and smart products. I see this growth of what I like to call data natives: people who expect their lives to be improved by smart products and everything to adapt to their tastes and habits. It's a parallel to what we saw with digital natives. Now people really expect everything to just work, to be smart and adaptive, and they get frustrated when it doesn't. So I think a lot of this new data we're recording, and all the data that's already around us, is going to contribute to making products smarter and better.

Jeff Frick: All right, well, thanks, Monica, for taking a few minutes.

Monica Rogati: Great to be here.

Jeff Frick: Absolutely. I'm Jeff Frick. We're at Stanford University at the Arrillaga Alumni Center at the Women in Data Science conference, the first ever; we'll be here next year. Thanks for watching.