Live from Stanford University, it's theCUBE, covering the Women in Data Science Conference 2017.

Hi, welcome back to theCUBE. I'm Lisa Martin, and we are live at Stanford University at the second annual Women in Data Science Technical Conference. It's a one-day event, and we've had an incredibly inspiring morning. We're joined by Janet George, who is the Chief Data Scientist at Western Digital. Janet, welcome to the show.

Thank you very much. We're very happy to be here.

We're very happy to have you. You're a speaker at this event, and we want to talk about what you're going to be talking about: industrialized data science. What is that?

So, industrial data science is mostly about how data science is applied in the industry. It's less about research work and more about the practical application of industry use cases in which we actually apply machine learning and artificial intelligence.

What are some of the use cases at Western Digital for that application?

So, one of our use cases is that we are in the business of creating new technology nodes. And in creating new technology nodes, we actually create a lot of data. And with that data, we ask: can we do pattern recognition at very large scale? We are talking millions of wafers. Can we understand memory holes? The shape, the type, the curvature, circularity, radius. Can we detect these patterns at scale? And then how can we detect if a memory hole is warped or deformed? And how can we have machine learning do that for us? We also look at things like correlations during the manufacturing process. Strong correlations, weak correlations, and we try to figure out interactions between different correlations.

Fantastic. So, if we look at big data, it's probably applicable across every industry. How has it helped to transform Western Digital, which has been an institution here in Silicon Valley for a while?

You know, at Western Digital, we move mountains of data.
That's just part of our job, right? And so we are the leaders in storage technology. People store data in Western Digital products. And so data is inherently very familiar to us. We actually deal with data on a regular basis. And now we've started confronting our data with data science. And we started confronting our data with machine learning because we're very aware that artificial intelligence and machine learning can bring a different value to that data, right? We can look at the insights. We can develop intelligence about how we build our storage products. What do we do with our storage? Failure analysis is a huge area for us. So we're really tapping into our data to figure out how we can make artificial intelligence and machine learning ingrained in the way we do work.

So from a cultural perspective, you've really done a lot to evolve the culture of Western Digital to apply the learnings to improve the value that you deliver to all of your customers.

Yes, believe it or not, we've become a data-driven company. And that's amazing, because we've invested in our own data. And we've said, hey, if we're going to store the world's data, we need to lead from a data perspective. And so we've sort of embraced machine learning and artificial intelligence. We've embraced new algorithms, technologies that are out there that we can tap into to look at our data.

So from a machine learning versus human perspective, in storage manufacturing, is there still a dependence on human insight where storage manufacturing devices are concerned, or are you saying that machine learning really can, in this case, take more of a lead?

No, I think humans play a huge role, right? Because these are domain experts. We're talking about PhDs in the materials science and device physics areas. So what I see is the augmentation between machine learning and humans, the domain experts. Domain experts will not be able to scale when the scale of wafer production becomes very large.
So let's talk about three million wafers, right? How is a human going to physically look at all the failure patterns on those wafers? We're not going to be able to scale just having domain expertise. But taking our core domain expertise and using that as training data to build intelligence models that can inform the domain expert and be smart and come up with all the ideas, that's where we want to be.

Excellent. So you talked a little bit about the manufacturing process. Who are some of the other constituents that you collaborate with as Chief Data Scientist at Western Digital that are demanding access to data? Marketing, et cetera? Who are some of those key collaborators for you?

Our marketing department, as well as our customer service department. We also have collaborations going on with universities. But one of the things we found out was that when a drive fails and it goes to our customer, it's much better for us to figure out the failure. So we've started modeling all the customer returns that we've received, and we look at that and say, how can we predict the life cycle of our storage, and get to those return possibilities or potential issues before it lands in the hands of the customers? So that's one area that we've been focusing on quite a bit, to look at the whole life cycle of failures.

You also talked about collaborating with universities. Share a little bit about that. Is there a program for internships, for example? How are you helping to shape the next generation of computer scientists?

We are very strongly embedded in universities. We usually have a very good internship program. Six to eight weeks, up to 12 weeks, in the summer, the interns come in. Ours is a little different in that we treat our interns as a real value add. They come in and they're given a hypothesis or a problem domain that they need to go after within six to eight weeks, and they have access to a tremendous amount of data.
So they get to play with all this industry data that they would otherwise never get to play with, right? And then they can quickly bring their academic background or their academic learning to that data. We also take really hard, research-oriented problems, or further-out problems, and we collaborate with universities on those, especially Stanford University. We've been doing great collaborations with them. I'm super encouraged by Fei-Fei Li's work on computer vision, and we've been looking into things around deep neural networks. This is an area of great passion for me. I think the cognitive computing space has just started to open up, and we have a lot to learn from neural networks and how they work and where the value can be added.

I just want to explore the internship topic for a second. We're at the second annual Women in Data Science Conference. There are a lot of young minds here, not just here in person, but in many cities across the globe. What are you seeing with some of the interns that come in? Are they confident enough to say, I'm getting access to real-world data I wouldn't have access to in school? Are they confident to play around with that, test out a hypothesis and fail? Or do they fear, I need to get this right, right away, this is my career at stake?

Yeah, it's an interesting dichotomy because they have a very short time frame, right? So that's an issue, because of the time frame, and they have to discover quickly. But failing fast and learning fast is part of data science. And I really think that we have to get to that point where we are very comfortable with failure and the learning we get from the failure. Remember, the light bulb was invented with 99% negative knowledge, right? And so we have to get to that negative knowledge and treat it as learning, and quickly. So we encourage a culture, we encourage a style, of different learning cycles. So we say, what did we learn in the first learning cycle?
What discoveries, what hypotheses did we figure out in the first learning cycle that will then propel our second learning cycle? And we don't see it as a one-stop effort, but rather a more iterative form of work. And also with the internships, I think it's really essential to have critical thinking. And so the interns get that environment to learn critical thinking in the industry space.

Tell us about this from a skills perspective. These are presumably young people studying computer science, maybe engineering topics. What are some of the traditional data science skills that you think are still absolutely there? Maybe it's a hybrid of a hacker and someone who's got a great statistics background. What about the creative side and the ability to communicate? What's your ideal data scientist today? What do they embody?

So this is a fantastic question, because I've been thinking about this a lot. I think the ideal data scientist is at the intersection of three circles. The first circle is really somebody who's very comfortable with data, mathematics, statistics, machine learning, that sort of thing. The second circle is around implementation: engineering, computer science, electrical engineering, those backgrounds where they've had discipline, and they understand that they can take complex math or complex algorithms and then actually implement them to get business value out of them. And the third circle is around business acumen, program management, critical thinking: really going deeper, asking the questions, explaining the results of very complex charts, the ability to visualize that data and understand the trends in that data. So it's the intersection of these very, very diverse disciplines, and somebody who has deep critical thinking and never gives up.

That's a great point, that "never gives up." But looking at it in that way, we're really here at a revolution, right?
Have you seen that traditional data scientist role evolve into the intersection of these three elements?

Yeah. Traditionally, if you did a lot of computer science or you did a lot of math, you'd be considered a great data scientist. But if you don't have that business acumen, how do you look at the critical problems? How do you communicate what you found? How would you communicate that what you found actually matters in the scheme of things, right? Sometimes people talk about anomalies, and I always say, is the anomaly structured enough that I need to care about it? Is it systematic? Why should I care about this anomaly? Why is it different from an alert, right? So you have modeled all the behaviors, and you understand that this is a different anomaly than you've normally seen, and you must care about it. So you need to have that business acumen to ask the right business questions and understand why that matters.

So your background is in computer science? Your bachelor's, PhD?

Bachelor's and master's in computer science, mathematics and statistics. So I've got a combination of all of those. And then my business experience comes from being in the field.

I was going to ask you that. How did you get that business acumen? It sounds like it was by in-field training, basically on the job.

It was in the industry, it was on the job. I put myself in positions where I've had great opportunities and tackled great business problems that I had to go out and solve. A very unique set of business problems where I had to dig deep into figuring out what the solutions were, and I gained the experience from that.

So, going back to Western Digital and how you're leveraging data science to really evolve the company. You talked about the cultural evolution there, which, as we were both mentioning off camera, is quite a feat because it's very challenging. Data, from many angles, security, usage, is a board-level, boardroom conversation.
You also talked about collaboration, so I'd love to understand: talk to us a little bit about some of the tangible ways that data science and your team have helped evolve Western Digital, improving products, improving services, improving revenue.

So I think of it this way: when an algorithm or a machine learning model is smart, it cannot be a threat. You see, there's a difference between being smart and being a threat. It's smart when it actually provides value. It's a threat when it takes away or does something you would be wanting to do. And here I see that initially there's a lot of fear in the industry, and I think the fear is related to, oh, here's a new technology, and we've seen technologies come in and disrupt in a major way. And machine learning will make a lot of disruptions in the industry, for sure, but I think that will cause a shift or a change. Look at our phone industry and how much the phone industry has gone through. We never complain that the smartphone is smarter than us. We just love the fact that the smartphone can show us maps and it can send us in the right direction. Of course it sends us in the wrong direction sometimes, but most of the time it's pretty good, and we've grown to rely on our cell phones. We've grown to rely on the smartness. So as I look at it, when technology becomes your partner, when technology becomes your ally, when it actually becomes useful to you, there's a shift in culture. And so we start by saying, how do we earn the value of the humans? How can machine learning, how can the algorithms we build, actually show you the difference? How can it come up with things you didn't see? How can it discover new things for you that will create a wow factor for you? And when it does create a wow factor for you, you will want more of it. So to me, it's more an intent-based progress in terms of the culture change. Because you can't push any new technology on people. People will be reluctant to adopt.
People adopt new technologies only when they see the value of the technology instantly, right? And then they become believers. And so it's a very grassroots-level change, if you will.

So you see, for the foreseeable future, from a fear perspective in terms of maybe job security, that at least in the storage and manufacturing industry people aren't going to be replaced by machines. Do you think they're going to live together for a very long, long time?

I totally agree. I think that it's going to augment the humans for a long, long time. I think that we will get over our fear. I think humans are incredibly powerful. We give way too little credit to ourselves. I think we have huge creative capacity. Machines do have processing capacity. They have very large-scale processing capacity, and humans and machines can augment each other. I do believe that, like the time when we got computers and came to rely on computers for data processing, we're going to rely on computers for machine learning. We're going to get smarter. So we don't have to do all the automation and the daily grind of stuff, right? If you can predict, and that prediction can help you, then you can feed that prediction model some learning mechanism by reinforcement learning or weighting or ranking. Look at the spam industry, right? We just taught the spam engines to become so good at catching spam, and we don't worry about the fact that they do the cleansing of that level of data for us, right? And so we'll get to that stage first, and then we'll get better and better and better. But I think humans have a natural tendency to step up. They always do. Through many generations, we have always stepped up higher than where we were before, right? So this is going to make us step up further. We're going to demand more. We're going to invent more. We're going to create more.
But I don't see it as a real threat. The place where I see it as a threat is when the data has bias or when the data is manipulated, which exists even without machine learning.

Right, absolutely. I love, though, the analogy that you're making: as technology is evolving, it's kind of a natural catalyst for us humans to evolve and learn and progress, and that's a great cycle that you're describing.

Imagine how we did farming 10 years ago or 20 years ago, right? Imagine how differently we drive our cars today than we did many years ago. Imagine the role of maps in our lives. Imagine the role of autonomous cars, right? So this is a natural progression of the human race. That's how I see it. And you can see with young people now, technology is so natural for them. They can tweet and swipe, and that's the natural progression of the human race. So I don't think we can stop that. I think we have to embrace that. It's a gift.

That's a great message, embracing it. It is a gift. Well, we wish you the best of luck this year for Western Digital, and thank you for inspiring us and probably many that are here and those that are watching the live stream. Janet George, thanks so much for being on theCUBE.

Thank you.

Thank you for watching theCUBE. We are again live from the second annual Women in Data Science Conference at Stanford. I'm Lisa Martin. Don't go away. We'll be right back.