Welcome back to EMC World and the Data Science Summit here in beautiful Las Vegas. I'm Jeff Kelly with Wikibon, covering big data and business analytics, and we're continuing our coverage of big data and data science here in our spotlight segment. In this segment, we're going to talk to two data scientists from EMC. We've got Noelle Sio as well as Derek Lin joining us. Welcome, first time on theCUBE.

First time here.

Great to have you. So I thought maybe a good way to start was to just tell us a little bit about yourselves and your role inside of EMC.

So my name is Derek Lin. I'm the principal scientist here at Greenplum. Before I came to Greenplum, I was with RSA doing a lot of security and analytics work. And prior to that, I was working in computer speech and language recognition processing research. So overall, I think I have more than 10 years under my belt. And this is what I'm doing. This is what I enjoy the most, doing security.

I'm a senior data scientist here at Greenplum. I started in grad school as an applied mathematician and decided to explore what kind of interesting problems there were in the industry. So I did a few years at Fox Interactive Media, which actually was a Greenplum customer at the time, and we were ingesting petabytes of data over there, and then I moved over to eHarmony to do some marketing research there as well. And so mostly my focus here at Greenplum has been on the digital media space.

Great, so I thought a good way to start was to define data scientist, because we hear that term used a lot, and we hear a lot of people want to call themselves data scientists because it's an in-demand position. But help us understand, really, what is a data scientist? What are the core functions? What is a data scientist to you?
Yeah, so to me, a data scientist is someone who works with data every day on problems like prediction, classification, clustering, all these kinds of problems, using a principled mathematical way of attacking the problem with a tool that is appropriate to the data size. That would be my definition. What do you think?

So I think people talk about data scientists not only in terms of the hard skills but also the soft skills as well, right? So it's a given, I think, that data scientists are mathematicians and statisticians and that there's a certain level of technical expertise that they have as well. For us coming from Greenplum, which is a database company, people know SQL, and people coming from other technologies are Java programmers, et cetera. So there's that real familiarity in terms of just having technical expertise, but there's also the ability to communicate those ideas, right? And so the difference between a data scientist and a researcher who is really siloed in a corner and just tinkers with interesting problems is that our focus primarily is on real big business problems, big data problems: really being able to translate what a business problem is, trying to find what the mathematical problem is underneath, doing the math, which is interesting to us, the big puzzle that really drives us and gets us all excited, and then translating it back to the business so that it sees the light of day and is really actionable.

Right, yeah, so communication is a big part of the job, it sounds like. So you've got to communicate, as you said, on the front end, understanding what the business problem is, and then once you've done your analysis, communicating and telling a story to the business so they can understand. How do you go about facilitating that exchange of information between the business and the data science team?

From what point of view?
Well, from your point of view. If you were to talk to the business side and perhaps you're not getting the kind of feedback you want, how do you have that conversation and then help them understand your role and the information you need to do your job better?

I think for me, sometimes people come in and they think that they know what solution they want before they even ask us the right question. And so sometimes for us it's really, rather than "what is it you want us to build for you," it's "where are your pain points?" So what problems you have, not how you think you should solve them already, but really what problems you have, and really getting to the base of how those problems manifest themselves so that we can, as you said, bring it to the mathematical part of it and really understand in our terms how we can go ahead and solve them.

Great, so why don't we back up a little bit and tell us a little bit about how you guys got into the business of data science, because we've got a lot of perhaps aspiring data scientists watching today. We'd love to learn from your experience how you got into this business and what brought you here.

Yeah, I think there's no predefined or standard way of becoming a data scientist, right? Because there's no such thing as a certified data scientist. There's no formal education or training. But I feel anyone who has a good, strong theoretical understanding of some of the mathematical disciplines could become a data scientist. So for myself, I started out with a master's degree in electrical engineering, specializing in signal and image processing, doing image reconstruction back then. And after I graduated, I became a speech scientist working on speech signals, processing human speech to facilitate the speech recognition problem. And then I got into the fraud area, fraud detection, trying to detect fraud in the financial services sector.
And then after that, today I've moved on to Greenplum, doing big data analytics over many vertical domains. So to me, I mean, I didn't start out my career thinking I'm going to become a data scientist, but rather it's a progression of events that got me where I am today. How about yourself?

Mine, I feel like it was almost an accident, actually. I was in grad school about five or six years ago pursuing a degree in mathematics. And there was this path of: do I want to stay in research, or do I want to go into industry? And being an applied mathematician, I was encouraged: you should really go into industry and see what's out there. And so I asked, what can I do? And the option they gave me was, well, you could be an actuary. And that was the only option I was actually given. And so I remember being a grad student and going to technical job fairs and handing my resume to every single company. And I said, I'm a mathematician. What could I do for you? And I basically got shot down entirely by most companies, which was really interesting. And they said, you know, I don't know what we could do with you. Can you code? Are you a coder? Do you have a CS degree? And I'm like, no, I'm a mathematician. What's interesting is I was actually an intern at Fox Interactive Media over in the quality assurance department. And they said, you know, we really like your skills. You should look into the internet. We have really big problems, and even in QA, I think we could use your skill set. And it was by chance that they were starting a research group, and the research group had found out about me, and they said, hey, your skills are exactly what we need. And now that you have seen the whole software life cycle and you understand what goes on in all parts of this organization, this is exactly the perfect place for you. And so it was kind of in a random way that I got there.
But I think now that the need for data scientists has grown so much, as you said, people in grad school now won't have the problem I had then. And so to students who are in grad school now, you can say, you can be a data scientist. You can be an analyst here. Those are all the keywords that you should be able to go and search and find.

Yeah, and again, I think the key thing is to have that strong and broad understanding of mathematics, probability theory, machine learning and all that. And that's your foundation to become a data scientist.

What about the mindset in terms of being exploratory, being willing to experiment, being willing to fail even when you're doing your analysis? Because it is a very iterative process. Talk about the kind of personality type that you think makes a good data scientist.

Yeah, I think each one of us, a data scientist is very curious, persistent when it comes to the problem, and tends to be very creative too. Because the problem we are working on is a new problem. It's not a problem where we can go to Google and search and find a ready solution, but rather something we have to create on our own.

Yeah, I think for me, all data scientists have a puzzle junkie in them somewhere, right? So I'm the kind of person that if you give me something, it's like a challenge. You issue a challenge and the gauntlet is thrown, and all data scientists would immediately just jump on it and say, I don't know how to solve that immediately. If you knew how to solve it immediately when you saw it, it wouldn't be fun, right? And so that's part of the challenge, and as Derek was saying, it's things that haven't been solved before. So it's interesting and it's neat. And so it's somebody for whom it's not just a nine-to-five where at five I can turn my brain off and say I'll just deal with it tomorrow.
It's one of those things where it bothers you, it nags you until you figure out what you can do, or at least get to that next step. And the persistence, and the sense of accomplishment when you actually finish the solution, is so interesting. The payoff is so great once you can finally achieve that. And I think it takes a specific personality type who really craves and desires difficult things and challenging things, and wants to learn something every time they do something.

Right, there are some aspects of becoming a data scientist that it seems to me you can train for, but there are others that are part of your personal makeup. Does that sound accurate?

I would agree with that, yes, yes.

So give some advice to some of the BI professionals and data warehousing professionals out there who are hearing a lot about data scientists. Some of them might be a little concerned, like, okay, do I need to hone my skills here? Is this a threat to my job, essentially? Are data scientists going to overtake what we do? Or are they looking to maybe take that leap themselves? What kind of advice can you give them as they look to expand their skills? Because we'd also like to talk a little bit about the difference between traditional BI and data science, where you're not looking back, you're trying to predict things and look forward. So what's some advice you might have for BI pros?

So BI is more about descriptive statistics, and in the role we play here, we deal with problems like predictive modeling, that sort of thing. And for that, again, like I was saying earlier, you need a very strong foundation in linear algebra, statistics, probability theory. Having this core will allow you to clearly see the problem and the nature of the problem and propose a solution accordingly, right?

So the advice I would give is, one, don't worry, your job is safe. I don't think the emergence of data science is at all negating the need for business intelligence.
You know, business intelligence answers the seminal question of what happened. You can't start planning for what will happen if you don't already know what happened. So the need for reporting, and the really smart reporting that's emerging out of all the innovation from data science, is not going away, and I think it's expanding. So one piece of advice: don't worry. The second is to find what about BI is interesting to you, right? If it's really the solving of the business problem, maybe start thinking about expanding your skills there, or if you really like the mathematics, start expanding your skills there. If you really like getting into the code that's behind it and really understanding the algorithms, move there, and really find the parts that are adjacent to your skill sets. And I think that would be kind of the gateway for them to really enter into the field of data science.

Well, we talked in our intro, Dave Vellante and I, about the lack of data scientists out there right now. How do you think we're going to fill that gap? Is it going to be training people to become data scientists, or is it also improving the technology, making it easier, abstracting away some of the complexity of the tools so that you lower the barrier to entry, or a little bit of both? I mean, what is it going to take to fill that need that we're seeing right now?

All of the above. I think all of those things will ease the gap that we have right now. Think about your smartphone: there are so many things that you can do on it now, and you're benefiting from somebody else's innovation on something that was painful before. So in talking about lowering the barrier to entry, certainly there are things, especially as we said earlier with BI, making better BI and smarter BI, that will be one aspect there.
I think the people who have been hiding in the crevices of organizations who are secretly data scientists, certainly we would encourage those to come out of the woodwork. But my personal focus would be on educating the next generation of data scientists, really reinforcing that you need to be a multifaceted kind of person and talent in terms of all the skills that you need, and that school is the best place for you to learn and fail. Right, it's learning to fail in school, in terms of iterative processes and understanding that clean data sets do not exist in the real world. So if you try to run any of these algorithms on something real, it won't work. And so learning all of those skill sets and how to communicate, as you said, across different teams, and really emphasizing that in school, would be a really great place to start.

But Derek, I mean, what do you think we need to do to fill this gap? Does that...

I think there should be a stronger push in terms of engineering education, especially in this country.

Yeah, bridging the educational gap with a heavier emphasis on engineering, computer science and math, that's certainly a requirement. Would you like to see some more formal data science programs at the university level, both the undergraduate and graduate level? I mean, is that something that's called for, or do you think it's just a matter of more people getting into the field of mathematics and statistics and then letting their career path take them down that data scientist route? Would you like to see every major university have a program to actually train data scientists and make that a formal kind of program?

Yeah, why not, right?

Why not, yeah, that's the dream. It's always good.

Great, all right, great. Well, thanks so much for joining us today. I think we learned a lot, a lot of great advice for data scientists out there. We'll be right back in just a few minutes.
We're going to continue the data science spotlight here at EMC World. I'm going to be joined by Dave Vellante and we've got several more segments coming up for you, so stay tuned.