Live from Stanford University, it's theCUBE, covering the Women in Data Science Conference 2017.

Hi, welcome back to theCUBE. I'm Lisa Martin, and we are live at Stanford University at the second annual Women in Data Science one-day tech conference. We are joined by one of the speakers for the event today, Claudia Perlich, the Chief Scientist at Dstillery. Claudia, welcome to theCUBE.

Well, thank you so much for having me. It's great. It is exciting.

It's great to have you here. You are quite the prolific author. You've won data mining competitions and awards. You speak at conferences all around the world. Talk to us about what you're currently doing as the Chief Scientist for Dstillery. Who's Dstillery? What's the Chief Scientist role? And how are you really leveraging data and science to be a change agent for your clients?

So I joined Dstillery when it was still called Media6Degrees, as a very small startup in the New York ad tech space. It was very exciting. I came out of the IBM Watson Research Lab and really found this a new kind of challenging application area for my skills. And what does a Chief Scientist do? It's a good question. I think it actually took the CEO about two years to finally give me a job description. And the conclusion at that point was something like, okay, there is technical contribution. So I sit down and actually code things and I build prototypes and I play around with data. I also do what is referred to as intellectual leadership. So I work a lot with the teams, just kind of scoping problems, brainstorming what may work or doesn't. And finally, and that's what I'm here for today, there is what they consider being an ambassador for the company. So being the face to talk about the more scientific aspects of what's happening now in ad tech, which brings me to what we actually do, right?
So one of the things that happened over the recent past in advertising is it became an incredible playground for data science, because the available data is incomparable to many other fields that I have seen. And so Dstillery was a pioneer in that space, starting to look at initially social data, things that people shared, but over the years it has really grown into getting a sense of the digital footprint of what people do. And our primary business model was to bring this to marketers to help them, on a much more individualized basis, identify who their current as well as future customers are, and really get a very different understanding than these kind of broad "middle-aged soccer mom" categories, to honor the individual tastes and preferences and actions that really truly reflect the variety of what people do. I'm many things. As you mentioned, I publish, but I'm also a mom and I have a horse. And so there are many different parts to it. I don't think any single description fully captures that. And we felt that advertising is a great space to explore how you can translate that and kind of help both sides, the people that are being interacted with as well as the brands that want to make sure that they reach the right individuals.

Very interesting. As the buyer's journey has changed to mostly online, it's an incredibly rich opportunity for companies to harness more of that behavioral information and probably see things that they wouldn't have predicted. We were talking to Walmart Labs earlier and one of the interesting insights that they shared was that, especially in Silicon Valley, where people spend too much time in the car commuting, and you have a long commute as well by train, you'd think that people would want, I want my groceries to show up on my doorstep.
I don't want to have to go into a store. And they actually found the opposite, that people in such a cosmopolitan area as Silicon Valley actually want to go into the store and pick up their groceries. So it's very interesting how the data actually can sometimes really change. It's really the scientific method on a very different scale, but really using the behavior insights to change the shopping experience but also to change the experience of companies that are looking to sell their products.

And I think that the last part to that puzzle is the question is no longer like, what is the right video for the Super Bowl? I mean, we have the Super Bowl coming up, right? And I did a study like, when do people pay attention to the Super Bowl? You can actually tell, because you know what people don't do when they pay attention to the Super Bowl? They're not playing around with their phones. They're actually not playing Candy Crush and all of these things. So what we see in the ad tech environment, we actually see that the demand for digital ads goes down when people really focus on what's going on on the big screen. But that was a diversion.

It's very interesting though, because it's something that's very tangible, it's a real-world application. So, a question for you about data science and your background. You mentioned that you worked with IBM Watson. Forbes has just said that data scientist is the best job to apply for in 2017. What is your vision? Talk to us about your team, how you've grown that up, how you're using big data and science to really optimize the products that you deliver to your customers.

So data science has really many, many different flavors, and in some sense I became a data scientist long before the term really existed. Back then I was just a particular weird kind of geek, but all of a sudden it's-

Now it has a name.

Now it has a name, right? And the reputation to be fun.
And so you see really many different application areas demanding very different skill sets. The original focus of our company has always been around, can we predict what people are going to do? That was always the primary focus. And now you see, and it's very nicely reflected at the event too, all of a sudden communicating this becomes a much bigger part of the puzzle, where people say, okay, I realize you're really good at predicting, but can you tell me why? What is it, like these nuggets of insight that you mentioned? Can you visualize what's going on? And so we grew a team, initially from a small group of really focused machine learning predictive skills over to the broader, can you communicate it? Can you explain to the customer and our brands what happened here? Can you visualize data? So that's kind of the broader shift. And I think the most challenging part that I can tell in the broader picture, where there is a bit of a shortcoming in skill set: we have a lot of people who are really good today at analyzing data and coding. So that part has caught up. There are so many data science programs. What I still am looking for is, how do you bring management and corporate culture to the place where they can truly take advantage of it? This kind of disconnect that we still have. How do we educate the management level to be comfortable evaluating what their data science group actually did? Whether they're working on the right problems that really ultimately will have impact. So I think that layer of education needs to receive a lot more emphasis compared to what we already see in terms of this increased skill set on just the sheer technical side.

So you mentioned, before we went live here, that you teach at NYU, but you're also teaching data science to the business folks. So I'd love for you to expand a little bit more upon that, and how are you helping to educate these people to understand the impact.
Because that's really a kind of a change agent within a company. That's a cultural change, which is really challenging.

Very much so.

What's their perception? What's their interest in understanding how this can really drive value?

So what you see, I've been teaching this course for almost six years now. And originally it was really kind of the hardcore coders, who also happened to get a PhD on the side, who came to the course. And now you increasingly have a very broad collection of business-minded people. I typically teach the part-timers, meaning they all have day jobs. And they've realized in their day jobs, I need this. I need that skill, that knowledge. And we're trying to get on a ground where, without having to teach them Python and R and whatever the new toys are there, how can you identify opportunities? How do you know which of the many different flavors of data science, from prediction towards visualization to just analyzing historical data to maybe even causality, which of these tools is appropriate for the task at hand? And then being able to evaluate whether the level of support that machine learning can bring is even sufficient. Because often, just because you can analyze data doesn't mean that the reliability of the model is truly sufficient to support a downstream business project. And being able to really understand those trade-offs without necessarily being able to sit down and code it yourself, that knowledge has become a lot more valuable. And I really enjoy the kind of brainstorming when we're just trying to scope a project, when they come with problems from their day job and say, hey, we're trying to do that. And saying, well, are you really trying to do that? What are you actually able to execute? What kind of decisions can you make? And this is almost like the brainstorming in my own company, now brought out to a much broader group: people working in hospitals, people working in banking.
So I get exposed to all of these kinds of problem sets and that makes it really exciting for me.

Interesting. When Dstillery is talking to customers or prospective customers, is this now something that you're finding is a board-level conversation within businesses?

No, I never get bored of that. So there is a part of the business that is pretty well understood and executed, and that's: you come to us, you give us money and we will execute a digital campaign, either on mobile phones or on video, and you tell me what it is that you want me to optimize for. Do you want people to click on your ad? Please don't say yes. That's kind of the worst possible thing you may ask me to do. But let's talk about what you're going to measure, whether you want people to show up in your store, whether you really care about signing up for a test drive, and then the system automatically will build all the models that then do all the real-time bidding. So advertising, I'm not sure how many people are aware, as your New York Times page loads, every single ad slot on that site is sold in a real-time auction. About 50 billion times a day, we receive a request whether we want to bid on the opportunity to show somebody an ad. So that piece, I can't make 50 billion decisions a day. It is entirely automated. This is fully automated machine learning that just serves that purpose. What makes it interesting for me now is that this is kind of standard fare, and you want to move over into the more interesting parts. Well, can you, for instance, predict which of the 15 different creatives I have for Giovanni I should show you? The one with the woman running or the one with the kid opening? So there are new nuances to it, and exploring these new challenges or going into totally new areas, talking about, for instance, churn prediction. I know an awful lot about people. I can predict very many things, and a lot of them go far beyond just how you interact with ads.
It's kind of almost the most boring part. We can see people researching diabetes. We can provide snapshots to pharma, telling them here's really where we see a rise of activity on a certain topic, and maybe this is something of interest, to understand which population really is driving those changes. And so these kinds of conversations really make it exciting for me, to bring the knowledge of what I see back to many different constituents and see what kind of problems we can possibly support with that.

It's interesting too. It sounds like more, not just providing ad technology to customers. You're really helping them understand where they should be looking to drive value for their businesses.

That really has been the focus increasingly, and I enjoy that a lot.

I can imagine that's quite interesting. I want to ask you a little bit, before we wrap up here, about your talk today. I was looking at the title of your abstract and it's "Beware What You Ask For: The Secret Life of Predictive Models." Talk to us about some of the lessons you've learned when things have gone a little bit, "Huh, I didn't expect that."

I mean, I'm a huge fan of predictive modeling. I love the capabilities and what the technology can do. So this being said, it's the collection of aha moments where you look at this and think, this doesn't really smell right. So to give you an example from ad tech, and I alluded to this, when people say, okay, we want a high click-through rate. Yes, I mean, that means I have to predict who will click on an ad. And then you realize that no matter what the campaign, no matter what the product, the model always chooses to show the ad on the flashlight app. Yeah, because that's when people fumble in the dark. The model is really, really good at predicting when people are likely to click on an ad, except that's really not what you intended when you asked me to do that.
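
The flashlight-app anecdote can be illustrated with a small Python sketch. This is only an illustration under stated assumptions: the placement names and every rate below are invented, and it is not Dstillery's system or data. It shows how selecting a placement by click rate and selecting it by the outcome the advertiser actually cares about (here, a hypothetical store-visit rate) can give different answers.

```python
# Hypothetical per-placement rates; every number here is invented for illustration.
placements = {
    "flashlight_app": {"click_rate": 0.040, "store_visit_rate": 0.0005},
    "news_site": {"click_rate": 0.004, "store_visit_rate": 0.0030},
    "recipe_app": {"click_rate": 0.006, "store_visit_rate": 0.0045},
}


def best_placement(metric):
    """Return the name of the placement that maximizes the given metric."""
    return max(placements, key=lambda name: placements[name][metric])


# Optimizing for clicks rewards accidental taps in the dark.
best_for_clicks = best_placement("click_rate")
# Optimizing for the intended outcome picks a different placement.
best_for_visits = best_placement("store_visit_rate")

print(best_for_clicks)
print(best_for_visits)
```

In this toy setup the click-optimized choice is the flashlight app, while the visit-optimized choice is a different placement, which is the gap between what was asked for and what was intended.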
So it's almost that they're so powerful that they move off into a sidetrack direction that you didn't even know existed. Something similar happened with one of these competitions that I won, for Siemens Medical, where you had to identify in fMRI images of breasts which of these regions are most likely benign and which ones have cancer. And we built models, we did really, really well, all was good, until we realized that the patient ID was by far the most predictive feature. Now, this really shouldn't happen. Your social security number shouldn't be able to predict anything. It wasn't a social security number, but when we started looking a little bit deeper, we realized what had happened is the data set was assembled from different sources. One was a treatment center and one was a screening center, and they had certain ranges of patient IDs. And so the model had learned where the machine stood, not what the image actually contained about the probability of having cancer. And whoever assembled the data set possibly didn't think about the downstream effect this can have on modeling, which brings us back to the data science skill as really comprehensive, starting all the way from the beginning of where the data is collected, all the way down to being extremely skeptical about your own work and really making sure that it truly reflects what you wanted to do. So you asked earlier, what makes a really good data scientist? It is the intuition to feel when something is wrong, and to be able to pinpoint it and trace it back, with the curiosity of really needing to understand everything about the whole process.

And also not only being able to communicate it, but probably being willing to fail.

That is really the number one requirement. If you want to have a data-driven culture, you have to embrace failure, because otherwise you will fail.

How do you find the reception to that fact by your business students?
Is that something that they're used to hearing, or does it sound like a foreign language to them?

I think the majority of them are in junior enough positions that they truly embrace that. And if anything, they have come across the fact that they weren't allowed to fail as often as they had wanted to. I think once you go into the higher levels of conversations, and we see that a lot in the ad tech industry, where you have incentive problems. We see a lot of fraud being targeted, because at the end of the day, the ad agency doesn't want to confess to the client that, yeah, they just wasted $5 million of ad spend on bots. And even the CMO may not be feeling very comfortable confessing that to the CEO. So being willing to truly face up to the truth that data sometimes forces into your face, that can be quite difficult for a company or even an industry.

Yes, it can. It's quite revolutionary, as is this event. So, Claudia Perlich, we thank you so much for joining us on theCUBE today, and we know that you're going to be mentoring a lot of people that are here. We thank you for watching theCUBE. We are live at Stanford University from the Women in Data Science Conference. I'm Lisa Martin, we'll be right back.

A 180 from what you weren't even thinking about. That's just such an interesting area because it impacts health, everything. It's, but I like the fact that you...
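
The patient-ID story from the interview can be sketched as a quick leakage check in Python. This is a toy sketch under stated assumptions: the two sources, their ID ranges, and the malignancy rates are all invented, and it does not reproduce the Siemens Medical competition data. It shows how a "model" that looks only at the ID range, and never at an image, can score far above the base rate when a data set is merged from a screening source and a treatment source.

```python
def make_records():
    """Build a toy data set merged from two hypothetical sources."""
    records = []
    # Hypothetical screening center: low ID range, 1 in 20 cases malignant.
    for i in range(200):
        records.append({"patient_id": i, "malignant": i % 20 == 0})
    # Hypothetical treatment center: high ID range, 19 in 20 cases malignant.
    for i in range(200):
        records.append({"patient_id": 100_000 + i, "malignant": i % 20 != 0})
    return records


def id_only_accuracy(records, threshold):
    """Accuracy of predicting 'malignant' purely from the patient ID range."""
    correct = sum((r["patient_id"] >= threshold) == r["malignant"] for r in records)
    return correct / len(records)


records = make_records()
# Share of malignant cases overall: what guessing one class would achieve.
base_rate = sum(r["malignant"] for r in records) / len(records)
# Accuracy of the ID-range rule, which never looks at any image content.
accuracy = id_only_accuracy(records, threshold=100_000)

print(f"base rate: {base_rate:.2f}, ID-only accuracy: {accuracy:.2f}")
```

When an identifier that should carry no signal beats the base rate by this much, it usually points to how the data was assembled rather than to anything in the images, which is the kind of skepticism about one's own results described in the interview.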