 in the business, and we are talking about big data and data science. Hal Varian said a couple of years ago that the statisticians, the math geeks, Jeff, are going to be the hot new job in the coming decade. That was prescient; it seems to be coming true. This is the Data Science Spotlight, sponsored by EMC's Greenplum division. Now, EMC has a data science summit as an adjunct to this show. The first year was last year, and it was very successful. A lot of good data science content, very thought-leading and thought-provoking. So, Jeff, we're going to talk about data science, we're going to talk about the market for big data. I want to start by just setting up the discussion that we're going to have. Mark Hopkins has a few slides that we're going to go through. So why don't you kick it off?
We've done some pretty extensive research in this area. We sized the market and did a forecast going out about five years. So we're showing that here, what I like to call an S-curve, or what some call an ogive curve, basically showing that the market today is about $5 billion and it's growing to over $50 billion by 2017. So that's some significant growth, and you'll see in the next couple of years there's a real spike in growth. And that's for very good reasons. Big data really cuts across vertical industries and use cases. So the idea is that big data is going to be adopted by all types of industries, and the key really is you need data scientists, a new breed of analytic professional, to really take advantage of that data. There's the infrastructure layer, where we're talking about Hadoop and the new databases we're seeing hit the market, like Greenplum, like Vertica, and others. And then from there, you've got to actually take the data and perform analytics on it. So that's where this new breed of professional, the data scientist, fits in.
Yeah, so if we go to the next slide, we talk about data science, data scientists.
One of the big themes of this event is transformation: transform IT, business, and yourself. And one of the vectors for "yourself," as an option for people, is data science. EMC's put a lot of emphasis on training for data science, and they've got data science coursework you can go through. Actually, you were going to go through it, but you had to travel.
I did. Still hoping to go through that at some point.
So talk a little bit about what a data scientist is. You interviewed a couple today.
Sure did.
We've had Hilary Mason on before, from bitly. Talk about data scientists. What are they all about?
Well, data scientists are all about answering questions based on data and essentially going where the data takes them. Compare that to the traditional world of BI, where BI and data warehousing professionals might spend months talking to the business, trying to understand what questions they want answered, and then they'll model a warehouse and load the warehouse to answer those questions. A data scientist kind of takes the opposite approach. They don't know the questions they're going to ask. They don't know what they're looking for. They're explorers, really. And that's one of the key skills, of course: being willing to experiment with data and go wherever it takes you. The key, though, is that there are a lot of new technologies coming on board that make big data possible, and they're not always the easiest to use. So data scientists really have to have a mix of skills. And I think we've got the slide up now.
Let's do the Larry Ellison thing, since we have these fancy mics on. Next slide, please. That's good. You know what I'm talking about, Mark? The way he does that is beautiful, right? Next slide. Okay, go ahead. So what are the skill sets? We've got this mash-up of skill sets going on. Talk about that one.
Well, there are a few different skill areas. One of them, of course, is technical.
You've got to know how to use the tools. You've got to have a solid background in statistics, math, and computer science. That's pretty much a prerequisite. But beyond that, to be really successful, data scientists have to have some level of domain knowledge. They have to understand the business issues, what the business is trying to achieve, and what its priorities are. So that's critically important. So you've got the technical skills and the business skills, and then communication. One of the key jobs of a data scientist is storytelling. You've got to come up with the insights, but then you've got to translate them into a story that the business can understand, and pass that on to workers who can actually take action based on your insights. And then from there, of course, and we've talked about it a little bit, there's the personality. You've got to be inquisitive and persistent. We've heard over and over again that you've got to be willing to fail, because in fact much of what data scientists do is fail. They ask a question. They don't get quite the right answer, but it leads to a new question. It's a very iterative approach, and you've got to be willing to take that approach and embrace it.
Okay, so next slide, please, if you would, Mark. And so, Jeff, talk about the skills gap. You're saying here 140,000 to 190,000 more data scientists are needed. That's a McKinsey data point. Needed for what? And where is that, globally? And in what context?
Exactly. So the idea here is that we've got the tools. The tools are being developed. It's still an immature market, but it's developing rapidly. But once you've got the infrastructure, the plumbing, in, you've got Hadoop in, you've got your MPP database in. You're loading it up with data from all different sources, some internal, a lot external. Great, so now you've got a good, solid infrastructure. From there, what do you do?
Well, you've got to do the analysis, and without data scientists with that mix of skills we just talked about, you've invested in this platform that's got all this data in it, but you can't really do much with it. So that's what we're talking about here, and I think the question is, how do we fill that gap? Some people say it's all about training. It might be about training current business intelligence professionals to make that leap to data science. I think it's definitely that. It's also about creating tools that are easier to use, that abstract away some of the complexity of data science, to lower the barrier to entry and make it easier to manipulate data and do the job of a data scientist without having the level of technical skills you need now. But any way you slice it, there's huge opportunity for anybody in this field. If you are a true data scientist, you've got a lot of opportunity right now and you are in high demand. And if you're not, you can go acquire the skills to become one, because there's clearly more demand than supply.
So on the next slide we talk about some of the rock stars. You know all these people?
A little tongue-in-cheek, but they really are. At the conferences we've been to, Strata and others, and Hadoop World, the data scientists are really mobbed. So there's Jeremy Howard of Kaggle; we've had him on. Chief scientist at Kaggle. Hilary Mason. Jeff Hammerbacher, who arguably invented the modern-day concept. And DJ Patil, who also helped co-invent the terms data science and data scientist. In the middle there's Hadley Wickham, who's a university researcher who's done a lot in the space with R analytics and is now starting to do some work in the big data space. So yeah, there's a lot of opportunity. These guys are really the new rock stars in the industry, and they're going to be in high demand for a long time.
So I'm interested to see the EMC Data Science Summit, the EMC Greenplum Data Science Summit. You weren't there last year, I don't believe.
No, I wasn't.
And you're going to attend this year. I attended last year. It was very good, extremely good content. Very high quality, kind of O'Reilly-like, I thought. They put a good program together. I would imagine the Data Science Summit is going to attract a lot of data scientists.
I would think so, and would-be data scientists.
Right, so the question is, do you think that the traditional BI and DW guys will actually wander over there, or are they still a little fearful?
I think there might be a little bit of fear there.
Or is it competition, you know, or is this a bunch of nonsense?
Well, I think ultimately data science, big data analytics, has to live side by side with business intelligence. The need for BI, for looking back and doing historical analysis to see why something happened, that's still needed and that's not going away. Reports that the CEO wants on his desk every morning answering specific questions, that's not going away. So some BI professionals may view it as a bit of a threat, but I don't think you need to view it that way. As for whether BI pros can make that move to data scientist, I think it can be done. We've talked a lot with Bill Schmarzo at EMC Consulting about that, and some of his thoughts on how BI pros can make that leap. But it is a different mindset. It's very much an exploratory mindset, versus dealing with a structured, well-known data set and answering questions where you've known well in advance what answers you want to get to. So it's a very different mindset. It's not simply adding a few skills around technology or tools. You've really got to change your mindset.
So let's say I want to get started in this business. Say I really believe all this big data hype. I think it's real.
My gut is telling me, hey, there's something here. If I can figure out a way to package this stuff and add new value, how do I get started?
Well, that's a great question, and we're going to be talking with some data scientists in the next segment about just that question: how they got started in the business and what advice they might have for others looking to make that leap. That could be a BI professional, or it could even be a line-of-business worker. Every marketing department has one member who's just a whiz with analytics and Excel, and sometimes it might even be that person who makes a good data scientist. They don't necessarily have the real technical background, but you can learn some of those things. So there are many different paths, as far as I can tell, to becoming a data scientist.
So this is the data scientist summit, where we're discussing data science: what it is, why it's important, and what kind of skill sets you need to really become a data scientist. And it really is a very wide definition. I mean, if you need 140,000 to 190,000 people, there's that much of a skills gap. There's a real opportunity for people who are mathematically inclined, who can do some development work, who love to munch up data. So it's all part of this whole transformation theme. EMC's leading with that thought leadership, so we'll be following it here. Jeff, thanks very much for coming on theCUBE and setting up this spotlight. Keep it right there, because we're going to bring on some data scientists. We're going to talk more about how to get started, what kind of skill sets are required, and what kind of packaging of information and data they're doing. Our surveys show that that's the number one challenge people have: how to actually package and monetize that information to get value out of it. So we're going to talk to some people who are practitioners in data science, who are actually doing that. So keep it right there.
This is SiliconANGLE.tv's continuous coverage, live at EMC World 2012. We'll be right back.