Live from San Jose, California, it's theCUBE, covering Innovating to Fuel the Next Decade of Big Data. Brought to you by Western Digital.

Hey, welcome back everybody. Jeff Frick here with theCUBE. We're at Western Digital at their global headquarters in San Jose, California. It's the Alameda campus. This campus has a long history of innovation, and we're excited to be here and probably have the smartest person in the building, if not the county, area code, and zip code. I love to embarrass her. Janet George, she is Fellow and Chief Data Scientist for Western Digital. We saw you at Women in Data Science, you were just at Grace Hopper, you're everywhere, and it's great to get a chance to sit down again.

Thank you, Jeff. I appreciate it very much.

So as a data scientist, today's announcement about MAMR, how does that make you feel? Why is this exciting? How is this going to make you more successful in your job and, more importantly, in the areas that you study?

So today's announcement is actually a breakthrough announcement, both in the field of machine learning and in AI, because we've been on this data journey, and we have been very selectively storing data on our storage devices. And the selection is actually coming from the pre-constructed queries that we do with business data. Now we no longer have to pre-construct these queries. We can store the data at scale in raw form. We don't even have to worry about the format or the schema of the data. We can look at the schema dynamically as the data grows within the storage and within the applications.

Right, because there have been two things, right? Before, data was bad because it was expensive to store. Now suddenly we want to store it because we know data is good, but even then it can still be expensive. And we've got this concept of data lakes and data swamps and data oceans, pick your favorite metaphor, but we want the data because we're not really sure what we're going to do with it. I think what's interesting that you said earlier today is that it was schema on write, right? Then we evolved to schema on read, which was all the rage with Hadoop a couple of years ago. But you're talking about the whole next generation, which is an evolving, dynamic schema based on whatever happens to drive that query at the time.

Exactly, exactly. So as we go through this journey, we are now getting independent of schema. We are decoupled from schema. And what we are finding out is we can capture data in its raw form and we can do the learning on the raw form, without human interference in terms of transformation of the data and assigning a schema to that data. We've got to understand the fidelity of the data, but we can train at scale from that data. So with massive amounts of training, the models already know how to train themselves from raw data. So now we are only talking about incremental learning as a trained model goes out into the field in production and actually performs. Now we are talking about how does the model learn, and this is where fast data plays a very big role.
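To make that schema-on-read idea concrete, here is a minimal sketch in Python, assuming a hypothetical raw store held as a JSON-lines file: records are written exactly as they arrive, and a schema is only derived dynamically when the data is read. The file name and fields are illustrative, not anything specific to Western Digital's systems.

```python
# A minimal sketch of "schema on read": raw records are stored as-is,
# and a schema is derived dynamically at query time instead of being
# fixed up front. File name and fields are hypothetical examples.
import json
from collections import defaultdict

RAW_STORE = "sensor_events.jsonl"  # hypothetical raw data-lake file

def append_raw(record: dict) -> None:
    """Store the record exactly as it arrives; no schema enforced on write."""
    with open(RAW_STORE, "a") as f:
        f.write(json.dumps(record) + "\n")

def infer_schema() -> dict:
    """Scan the raw store and report, per field, the value types observed."""
    schema = defaultdict(set)
    with open(RAW_STORE) as f:
        for line in f:
            for key, value in json.loads(line).items():
                schema[key].add(type(value).__name__)
    return {k: sorted(v) for k, v in schema.items()}

# Records with different shapes can coexist; the schema emerges on read.
append_raw({"device": "drive-001", "temp_c": 41.7})
append_raw({"device": "drive-002", "temp_c": 39.2, "vibration": 0.03})
print(infer_schema())  # e.g. {'device': ['str'], 'temp_c': ['float'], 'vibration': ['float']}
```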
So it's interesting, because you talked about that also earlier in your part of the presentation, this idea of fast data versus big data, which kind of maps to flash versus hard drive. And the two are not, it's not either-or, it's really both. Because within the storage of the big data you build the base foundations of the models, and then you can adapt, learn, and change with the fast data, with the streaming data on the front end. And then you get a whole new world.

Exactly. So the fast data actually helps us after the training phase, right? And these are evolving architectures. This is part of your journey; as you come through the big data journey, you experience this. For fast data, what we are seeing is that architectures like Lambda and Kappa are evolving. The Lambda architecture is very interesting because it allows for batch processing of historical data, and then it allows for what we call a low-latency layer, or a speed layer, where this data can then be promoted up the stack for serving purposes. And then the Kappa architecture is where the data is being streamed in near real time, bounded and unbounded streams of data. This is, again, very important when we build machine learning and AI applications, because evolution is happening on the fly, learning is happening on the fly. Also, if you think about the learning, we are mimicking more and more how humans learn. We don't really learn with very large chunks of data all at once, right? That's important for initial model training and model learning, but on a regular basis we are learning with small chunks of data that are streamed to us in near real time.

Right, learning on the delta.

Learning on the delta.

So what is bounded versus unbounded? Unpack that a little bit.

Bounded is basically saying, hey, we are going to get certain amounts of data. So you're sizing the data, for example. Unbounded is infinite streams of data coming to you, right? And so if your architecture can absorb infinite streams of data, like, for example, sensors constantly transmitting data to you, at that point you're not worried about whether you can store that data, you're simply worried about the fidelity of that data. But bounded would be saying, I'm going to send the data in chunks. You could also do bounded where you basically say, I'm going to pre-process the data a little bit just to see if the data is healthy or if there is signal in the data, right? You don't want to find that out later, during training. So you're trying to figure that out up front.
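That "learning on the delta" pattern can be sketched in a few lines. Below is a minimal, hypothetical example using synthetic data and scikit-learn's partial_fit: a model is first trained on a bounded historical batch, then updated incrementally from small streamed chunks, standing in for the batch and speed layers described above. It is one common way to do incremental learning, not necessarily how Western Digital implements it.

```python
# A minimal sketch of "learning on the delta", assuming synthetic data:
# train once on a bounded historical batch, then update incrementally
# from small chunks that stand in for a near-real-time stream.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)

def make_chunk(n):
    """Synthetic two-class data standing in for streamed sensor readings."""
    X = rng.normal(size=(n, 4))
    y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
    return X, y

model = SGDClassifier(random_state=0)

# Batch layer: initial training on a large, bounded historical set.
X_hist, y_hist = make_chunk(5000)
model.partial_fit(X_hist, y_hist, classes=np.array([0, 1]))

# Speed layer: small deltas arriving near real time update the trained model.
for _ in range(20):
    X_delta, y_delta = make_chunk(50)  # an unbounded stream would never stop
    model.partial_fit(X_delta, y_delta)

X_test, y_test = make_chunk(1000)
print("accuracy after incremental updates:", model.score(X_test, y_test))
```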
But it's funny, everything is ultimately bounded; it just depends on how you define the unit of time, right? Because if you take that unit of time down toward zero, everything is frozen. But I love the example of the autonomous cars. We were at an event just talking about navigation for autonomous cars, and Goldman Sachs says it's going to be a $7 billion industry. And it's a great example of these two systems working well together, right? Because is it the car's sensors or is it the map? And they said, well, you want to use the map and the data from the map as much as you can to set the stage for the car driving down the road, to give it some level of intelligence. But if today we happen to be paving lane number two on 101 and there are cones, now it's the real-time data that's going to train the system. The two have to work together; they're not autonomous and really can't work independently of each other.

Yes, it makes perfect sense, right? And why it makes perfect sense is because first the autonomous cars have to learn to drive, and then the autonomous cars have to become experienced drivers. And that experience cannot simply be taught; it comes on the road. So one of the things I was watching was how insurance companies were doing testing on these cars, and they had a human-driven car and an autonomous car. And the autonomous car, with its sensors, was predicting the behavior: every permutation and combination of how a bicycle would react to that car. It was almost predicting what the human on the bicycle would do, even jumping in front of the car, and it got it right in 80% of the cases. But for a human driving a car, we're not sure how the bicycle is going to behave. We don't have that peripheral vision and we can't predict how the bicycle is going to behave, so we get it wrong. And we can't transmit that knowledge. If I'm a driver and I've just encountered a bicycle, I can't transmit that knowledge to you. But a driverless car can learn, it can predict the behavior of the bicycle, and then it can transfer that information to a fleet of cars. So it's very powerful how the learning can scale.

It's such a big part of the autonomous vehicle story that most people don't understand: not only is the car driving down the road, it's constantly measuring and modeling everything that's happening around it, including bikes, pedestrians, everything else. And whether it gets in a crash or not, it's still gathering that data, building the models, and advancing the models. I think people just don't talk about that enough. I want to follow up on another topic. We were both at Grace Hopper last week, which is a phenomenal experience; if you haven't been, go. I'll just leave it at that. But Dr. Fei-Fei Li gave one of the keynotes, and she made a really deep statement at the end of her keynote, which we both talked about on camera: there's no question that AI is going to change the world, and it is changing the world today. The real question is, who are the people that are going to build the algorithms that train the AI? So when you sit in your position here, with the power both in the data and in the tools and compute that are available today, and this brand-new world of AI and ML, how do you think about that? How does that make you feel about the opportunity to define these systems that drive the cars, et cetera?

I think not just the diversity in the data, but the diversity in the representation behind that data, is equally powerful. We need both, right? Because we cannot tackle diverse data and diverse experiences with only a single representation. We need multiple representations to be able to tackle that data, and this is how we will overcome bias of every sort. So it's not a question of whether the AI models will be built, because the AI models are already being built. It is a question of who is going to build those models. And some of the models have biases built into them from a lack of representation, from who's building the model, right? So I think it's very important, and I think we have a powerful moment in history to change that, to make real impact.

Because the trick is, we all have bias. You can't do anything about it. We grew up in the world we grew up in, we saw what we saw, we went to our schools, we had our family relationships, et cetera. So everyone is locked into who they are. That's not the problem. The problem is the acceptance of bringing in somebody else's perspective.

That's right, that's right. And the combination will provide better outcomes. It's a proven scientific fact.

I very much agree with that. I also think that having the freedom, having the choice to hear another person's conditioning, another person's experiences, is very powerful, right? Because that enriches our own experiences.
Even if we are constrained, even if we are like that storage that has been structured and processed, we know that there is this other storage, and we can figure out how to get the freedom between the two points of view, right? And we have the freedom to choose. So that's very, very powerful, just having that freedom.

So as we get ready to turn the calendar on 2017, which is hard to imagine, but it's true. If you look to 2018, what are some of your personal and professional priorities? What are you looking forward to? What are you working on? What's top of mind for Janet George?

Right now I'm thinking about genetic algorithms, genetic machine learning algorithms. These have been around for a while, but I'll tell you where the power of genetic algorithms is, especially when you are creating a new-technology memory cell. When you start out trying to create a new-technology memory cell, you have materials, material deformations, you have process, you have 100 permutations and combinations, right? With genetic algorithms, we can quickly assign a cost function and apply survival of the fittest: all the candidates that don't fit, we can kill off, arriving at the new technology node fastest. And then from there we can scale that into mass production. So we can use the survival-of-the-fittest mechanisms that evolution has used for a long period of time; this is biology-inspired. Using a cost function, we can figure out how to get the best of every process, every technology, all the coupling effects, the effects of introducing a program voltage on a particular cell or reducing the program voltage on a particular cell, resetting and setting, and the neighboring effects. We can pull all of that together. So those 600, 700 permutations and combinations that we've been struggling with, trying to figure out how to quickly narrow down to that perfect cell, which is the new technology node that we can then scale out to tens of millions of wafers, we can get to that spot.

You're going to have to get me on the whiteboard on that one, Janet. That is amazing. Smart lady. Thank you, thanks for taking a few minutes out of your time. Always great to catch up, and it was terrific to see you at Grace Hopper as well.

Thank you. I really appreciate it. I appreciate it very much.

All right, Janet George. I'm Jeff Frick. You are watching theCUBE. We're at Western Digital headquarters, at Innovating to Fuel the Next Decade of Big Data. Thanks for watching.
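To make the survival-of-the-fittest idea from that last answer concrete, here is a toy, purely illustrative sketch in Python: candidate "cell recipes" (parameter combinations) are scored by a cost function, the least-fit candidates are killed off each generation, and the survivors are mutated until a good recipe emerges. The parameter names and the cost function are hypothetical stand-ins, not a real process model.

```python
# Toy genetic-algorithm sketch: score candidates with a cost function,
# keep the fittest, mutate the survivors, repeat. Everything here is a
# hypothetical stand-in for a real memory-cell process search.
import random

random.seed(42)

PARAM_RANGES = {            # hypothetical knobs for a memory-cell recipe
    "program_voltage": (1.0, 5.0),
    "pulse_width_ns":  (10.0, 200.0),
    "anneal_temp_c":   (200.0, 500.0),
}

def random_candidate():
    return {k: random.uniform(lo, hi) for k, (lo, hi) in PARAM_RANGES.items()}

def cost(c):
    """Illustrative cost: pretend the ideal recipe is a known sweet spot."""
    target = {"program_voltage": 3.2, "pulse_width_ns": 90.0, "anneal_temp_c": 350.0}
    return sum((c[k] - target[k]) ** 2 / (hi - lo) ** 2
               for k, (lo, hi) in PARAM_RANGES.items())

def mutate(c):
    """Perturb one knob of a surviving candidate, clamped to its range."""
    child = dict(c)
    k = random.choice(list(PARAM_RANGES))
    lo, hi = PARAM_RANGES[k]
    child[k] = min(hi, max(lo, child[k] + random.gauss(0, (hi - lo) * 0.05)))
    return child

population = [random_candidate() for _ in range(100)]
for generation in range(50):
    population.sort(key=cost)            # fittest (lowest cost) first
    survivors = population[:20]          # kill off everything that doesn't fit
    population = survivors + [mutate(random.choice(survivors)) for _ in range(80)]

best = min(population, key=cost)
print("best recipe:", {k: round(v, 2) for k, v in best.items()},
      "cost:", round(cost(best), 4))
```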