Live from San Jose, it's theCUBE, presenting Big Data Silicon Valley. Brought to you by SiliconANGLE Media and its ecosystem partners.
Welcome back to theCUBE's continuing coverage of our own event, Big Data SV. I'm Lisa Martin with my co-host, Dave Vellante. We're in downtown San Jose. This is a really cool place, Forager Eatery. Come by, check us out. We're here tomorrow as well. We're joined next by one of our CUBE alumni, Seth Dobrin, the Vice President and Chief Data Officer at IBM Analytics. Hey Seth, welcome back to theCUBE.
Hey, thanks for having me again. Always fun being with you guys.
Good to see you, Seth.
Good to see you.
Yeah, so the last time you were chatting with Dave and company was in the fall, at the Chief Data Officer Summit. What's new with you and IBM Analytics since then?
Yeah, so at the Chief Data Officer Summit, I was talking with one of the data governance people from TD Bank, and we spent a lot of time talking about governance. I'm still doing a lot with governance, especially with GDPR coming up, but I've really started to ramp up my team to focus on data science and machine learning. How do you do data science in the enterprise? How is it different from doing a Kaggle competition or someone getting their PhD or master's in data science?
Just quickly, what is your team composed of in IBM Analytics?
So IBM Analytics represents, think of it as our software umbrella. So it's everything that's not pure cloud or Watson or services. So it's all of our software franchise.
But in terms of roles and responsibilities, data scientists, analysts, what's the mixture?
Yeah, so on my team, I have a small group of people that do governance, and so they're really managing our GDPR readiness inside of IBM in our business unit. And then the rest of my team is really focused on this data science space.
And so this is set up from the perspective that we have machine learning engineers, we have predictive analytics engineers, we have data engineers, and we have data journalists. And that's really focused on helping IBM and other companies do data science in the enterprise.
So what's the dynamic amongst those roles that you just mentioned? Is it really a team sport? I mean, initially it was like the data scientist on a pedestal. Have you been able to attack that problem?
So I know a total of two people that can do all of that themselves. So I think it absolutely is a team sport. And it really takes a data engineer, or someone with deep expertise in there, who also understands machine learning, to really build out the data assets, engineer the features appropriately, provide access to the model and ultimately to what you're going to deploy, right? Because the way you do it as a research project or an activity is different than using it in real life, right? And so you need to make sure the data pipes are there. And when I look for people, I actually look for differentiation between machine learning engineers and optimization. I don't even post for data scientists, because then you get a lot of data scientists, right? People who aren't really data scientists. And so if you're specific and ask for machine learning engineers or decision optimization, OR-type people, you really get a whole different crowd. But the interplay is really important because, in most machine learning use cases, you want to be able to give information about what you should do next. What's the next best action? And to do that, you need decision optimization.
So, in the early days, I mean, data science has been around forever, right? We always hear that. But in the more modern use of the term, you never heard much about machine learning.
It was more like stats, math, some programming, data hacking, creativity. And now machine learning sounds fundamental. Is that a new skill set that the data scientists had to learn? Do they get that from other parts of the organization?
I mean, when we talk about math and stats, what we call machine learning today is what we've been doing in statistics for years, right? A lot of the same things we apply in what we call machine learning today, I did during my PhD 20 years ago, right? It was just with a different perspective. And you applied those types of, they were more static, right? So I would build a model to predict something, and it was only for that. You really didn't apply it beyond. So it was very static. Now when we're talking about machine learning, I want to understand Dave, right? And I want to be able to predict Dave's behavior in the future and learn how you're changing your behavior over time, right? So one of the things that a lot of people don't realize, especially senior executives, is that machine learning creates a self-fulfilling prophecy. You're going to drive a behavior, so your data is going to change, right? So your model needs to change. And so that's really the difference between what you think of as stats and what we think of as machine learning today. So what we were looking for years ago is all the same, we just describe it a little differently.
So how fine is the line between a statistician and a data scientist?
I think any good statistician can really become a data scientist. There are some issues around data engineering and things like that. But if it's a team sport, I think any really good pure mathematician or statistician could certainly become a data scientist. Or a machine learning engineer, sorry.
I may just sit in from a skill set standpoint. You were saying how you're advertising to bring on these roles.
I was at the Women in Data Science Conference with theCUBE just a couple of days ago, and we heard so much excitement about the role of data scientists. It's so horizontal. People have the opportunity to make an impact in policy change, healthcare, et cetera. So, the hard skills and soft skills: beyond the mathematician, what are some of the other elements that you would look for? Or that companies, enterprises that need to learn how to embrace data science, should look for: someone that's not just a mathematician, but someone that has communication skills, collaboration, empathy. What are some of those openness to not lead data down necessarily? What do you see as the right mix there in a data scientist?
Yeah, so I think that's a really good point, right? It's not just the hard skills. When my team goes out, because part of what we do is go out and sit with clients and teach them our philosophy on how you should integrate data science in the enterprise, a good part of that is sitting down and understanding the use case and working with people to tease out how you get to this ultimate use case, because any problem we're solving is not one model. Any use case is not one model. It's many models. How do you work with the people in the business to understand, okay, what's the most important thing for us to deliver first? And it's almost a negotiation, right? Talking them back: okay, we can't solve the whole problem. We need to break it down into discrete pieces. Even when we break it down into discrete pieces, there's going to be a series of sprints to deliver that, right? And so it takes having these soft skills to be able to tease that out, to really help people understand that their way of thinking about this may or may not be right, and to do that in a way that's not offensive. And there are a lot of really smart people that can say that, but they can come across as being offensive, so those soft skills are really important.
I'm going to talk about GDPR in the time we have remaining. We've talked about it in the past: the clock's ticking, and in May the fines go into effect. The relationship between data science, machine learning, and GDPR: is it going to help us solve this problem? It's a nightmare for people, and many organizations aren't ready. Your thoughts?
Yeah, so I think there are some aspects that we've talked about before, how important it's going to be to apply machine learning to your data to get ready for GDPR. But I think there are some aspects that we haven't talked about before here, and that's around what impact GDPR has on being able to do data science and being able to implement data science. So one of the aspects of the GDPR is this concept of consent, right? It really requires consent to be understandable and very explicit, and it allows people to retract that consent at any time. And so what does that mean when you build a model that's trained on someone's data? If you haven't anonymized it properly, do I have to rebuild the model without their data? And then it also brings up some points around explainability. So you need to be able to explain your decision, how you used analytics, how you got to that decision, to someone if they request it, to an auditor if they request it. With traditional machine learning that's not too much of a problem: you can look at the features and say, this feature contributed 20%, this one contributed 50%. But as you get into things like deep learning, this concept of explainable AI, or XAI, becomes really, really important. And there were some talks earlier today at Strata about how you apply traditional machine learning to interpret your deep learning or black box AI. So those two things are really going to be important in terms of how they affect data science.
Well, you mentioned the black box.
I mean, do you think we'll ever resolve the black box challenge, or is it really that people are just going to be comfortable that what happens inside the box, how you got to that decision, is okay?
So I'm inherently both cynical and optimistic. But I think there are a lot of things we looked at five years ago and said there's no way we'll ever be able to do them, that we can do today. And so while I don't know how we're going to get to be able to explain this black box, this XAI, I'm fairly confident that in five years this won't even be a conversation.
Yeah, I kind of agree. I mean, somebody said to me, it's really hard to explain how you know it's a dog, right? But you know it's a dog. Well, you know it's a dog. And so we'll get over this anyway.
I love that you just brought up dogs as we're ending. That's my favorite thing in the world. Thank you. Yes, you knew that. Well, Seth, I wish we had more time. Thanks so much for stopping by theCUBE and sharing some of your insights. Look forward to the next update in the next few months from you.
Yeah, thanks for having me. Good seeing you again. Nice meeting you.
Likewise. We want to thank you for watching theCUBE live from our event, Big Data SV, down the street from the Strata Data Conference. I'm Lisa Martin for Dave Vellante. Thanks for watching. Stick around. We'll be right back after a short break.