Live from the Fairmont Hotel in San Jose, California, it's theCUBE at Big Data SV 2015. Hey, welcome back everyone. We are here live in Silicon Valley, wrapping up day two, getting down to the wire here for three days of wall-to-wall coverage in Silicon Valley for Big Data SV. Big Data SV is in conjunction with Strata Conference and Hadoop World, all part of the Big Data Week we're putting on here with the community. I'm John Furrier with SiliconANGLE's theCUBE, our flagship program. We go out to the events and extract the signal from the noise. I'm with my co-host Jeff Kelly, Chief Big Data Analyst at Wikibon. Our next guest is David Smith, who's the Chief Community Officer at Revolution Analytics, now with Microsoft. They recently joined with Microsoft in a big announcement, the acquisition of Revolution Analytics. Congratulations, welcome back. Thank you. Changing jerseys, got the same number, name on the back. I'm not officially a Microsoft employee yet, it's coming up very soon. Yeah, I'm really looking forward to it. Technically, it's got to close. Yeah, all those details. Congratulations. Thank you. It's so funny, I was so excited when I saw the news, I wrote a post as fast as I could write. I usually don't do that much anymore, but I'm like, oh, why are they buying R? And then the R community said R is not part of Revolution Analytics, so that's been clarified. So that's out of the way, but the success of analytics in R has been tremendous. We've talked many times on theCUBE here. So what's the update, give us the update. So how did it all go down? Obviously, with R, you guys have been successful, big growth. Did they just come in saying, hey, we're going to take you off the table? What happened, give us the update. Well, I think everybody at Microsoft just must have seen the rapid growth of R over the last five or six years.
It's gone from being a language that was really only known in the academic sector to today being the lingua franca of data scientists everywhere. And you've just seen the rise of data science. I mean, data science, that has been the theme of the conference this year. Every year, it's the year of the data scientist, right, every year. Yeah, but now it's about the applications and the impact. I mean, we have a chief data scientist for the US government now, and we're talking about data science applications for fashion, for manufacturing, you name it. We've gone from talking about the algorithms and the data to actually how they're applied. And for me, that's the big change. And I think companies like Microsoft are recognizing that this is the way that companies are competing today and really want to be a part of that. And you think about the incumbent kind of company that Microsoft is, and certainly the US government. You have DJ Patil, you have Megan Smith, you have the VMware guy over there now. We've got a tech crew coming in. You know, Obama on the big screen here at the event, at Big Data SV and Strata Conference, going, hey, you know, this is a new era. I got to say that, for me personally, was so cool. Like, you know, I grew up being a statistician, where you tell people you're a statistician at a party and they back away. And now to have the president of the United States coming up and giving a shout-out to data scientists. I mean, that's just so awesome, it really is. It really exposes some opportunities to create some efficiencies where it was hard to do before. Certainly cloud, mobile, social, a lot of stuff is happening. But analytics really is about harnessing value, delivering value. Obama also was quoted just last week when he was in Silicon Valley: all kids should code. We had one person come on theCUBE and say, hey, you know, I'm telling my kids, you know, get your hands on R or Python if they're really techy.
So data, this is going to be a new tool, like a hammer and nails to build out things. It's a key tool, like we used to use spreadsheets and whatnot. So it's pretty cool. The other thing that's interesting, besides the government being efficient and more transparent using the data, is Microsoft. Satya Nadella has come on, made some big moves. We covered the Open Compute environment. They donated a bunch of stuff to open source. A lot of moves around Azure in the cloud. You've seen some very positive movement there, and now they're embracing R. So I got to ask you about the new Microsoft, if you will, under Satya Nadella. What's your experience been? And I see they're embracing it through the acquisition, but what's the vibe there? I mean, is it like amazing? Is it like total transformation? I got to tell you, it's like nothing I expected. I've got to confess a little bit. I was a little bit skeptical when I heard that Microsoft was interested in us, an open source company. But as I dived in a little more closely, it's just been really incredible what Microsoft has done, just even over the last year, when it comes to open source and innovation. When you have a look at open sourcing .NET, they had already integrated R into the Azure framework, supporting Python, you name it. And then you put on top things like putting Office on iOS or the HoloLens goggles. I think Microsoft's really got its mojo back. So I'm really excited. I think the team, it seems to me like it's all been like, okay, this little transition's over, Ballmer steps aside, now, okay, everyone's been itching to get out and be competitive. Microsoft's culture is pretty hardcore. You look at the old DNA, so it seems like that's back. So that's cool. So with all that, I got to ask you another question. The big news this week is the Open Data Platform, where a lot of people are getting behind this. Community has been a big part of open source. So I've got to get your take. What is going on with community?
We're seeing and we're evaluating and analyzing and doing research on this new era of community governance and value creation. Open source is now, it's kind of like, whatever, nth generation. Us old guys remember the value in the first couple of generations, but now it's like full-on mainstream. Everything's out in the open. OpenStack's being successful. Even Cloud Foundry was successful. And now this ODP. What's the new governance model? What is the new community model? Is there anything you see that you can share with us? Like, why is this working differently? Everyone's been, you know, naysayers, saying it's never gonna work, but yeah, it's working. What's the new thing? I think what the big players have now recognized is that when it comes to open source especially, community is the value. You know, when we've talked about open source in the past, it's really been about the technology, about the innovation, about the development, about its freeness. But really, and you know, I've said this a lot when it comes to R, it's the community itself that provides a significant portion of the value when it comes to open source software. Not just in terms of the social connection, the networking, the marketing, the free marketing essentially that you get through communities. But the communities themselves drive the value. They contribute code. They contribute documentation. They contribute expertise. And most importantly, they provide, let me put it as a labor pool. You know, these are people from the community that you can recruit to help you develop and promote your own applications. And you know, we've seen that as one of the biggest drivers for R itself, you know. It's not just that it's an amazing language for developing data science applications. It's that when companies are looking at putting together a data science project, they look at where are we going to find the people to actually implement this? And where do they look to?
They look to the community. What has the biggest community when it comes to data science? It's R, so it's really a no-brainer. How's that transferring over into the mainstream? Obviously you mentioned academic, the value's there. Any bumps in the road, any revelations, any magnified learnings on your part on things that you've observed and how it's taking off now? I think experience is one part of it. I mean, especially when it comes to open source languages, which are by definition young, you're looking at younger members of the community that are going to be part of that ecosystem. What are they like? What are the young guns like? I mean, what's the psychology like? They don't have what the gray beards like me have around the IT experience and all those lessons learned over the years, but what they do have is a fresh set of values when it comes to working in an IT infrastructure. It's just natural for these young guns to take an open source program here and another package here and a data stream here and connect those all together. Whereas I think in the past, there was this expectation that you do all that within a single technological stack. And it's just a very different way of doing things. Are the young guns also like, what? Provisioning servers? Yeah, exactly. About a week? IT is giving us a problem. Let's go around IT and do it in the cloud. All those things like that. But what that means is that companies have recognized that by freeing up some of these constraints, people can get so much more done, so much faster, and be able to innovate in ways that simply weren't possible without being able to work in this type of open source realm. I want to go back to something you mentioned earlier, this being kind of the year of data science, where data science is getting a little bit more attention this year versus the past years.
And I think you're right, because the conversation was much more around kind of the plumbing and the infrastructure, and that conversation is now moving forward and now you're getting to the data science. What are we going to actually do with all this data to drive some value? Was it frustrating for you as a member of the R community that the larger open source big data community wasn't moving faster and we were spending too much time on the plumbing? What was that like? Were you champing at the bit, like, hey guys, let's talk about what the real value is, and that's the data science? I think I was even seeing it the other way. I think I spent such a long time working in sort of the trenches of statistics and data science, doing all this really cool stuff that was in a lot of cases only recognized in the back office and was only exposed through several layers of my team, so nobody even realized that stats or advanced analysis was part of the picture. So I've been looking at it a different way, and just seeing the growing exposure of data science and the value it provides over the last five years has just been so tremendous, and just generating that recognition. One of my stories, one of the things I'm most proud of as I've worked as the community officer at Revolution Analytics for R, is that the work we've done to expose the benefits of R and data science in general has raised the salaries of people like me 20 years ago, who were doing stats degrees or computer science degrees, and now it's one of the most sought-after careers out there today. So I think it's just a great recognition that companies... You want to rewind the clock and go back to that time? Oh yeah, like if I'd had... You could walk into the parties and be like the guy. Yeah, if I'd had Silicon Valley salaries when I was back at school, this would have been a very different conversation.
But it's just a real recognition of the value that companies are now putting into getting information out of data and the investments they're making in the workforce to do that. Talk about the salaries, it's interesting. We had this conversation earlier with one of the PhDs up here, from Simply Hired, and there's this kind of polarization going on between the purists, let's call them the purists, and people who are recycling themselves as data scientists. Talk about that, it's a natural progression. Certainly people are retooling. It came up yesterday at our panel at our event last night. The big companies are laying off but re-hiring younger and getting leaner for this new era. So we do need more data scientists. Talk about the definitions there and clarify that, because people don't know if they're really looking at a data scientist. Is there a test? Is there a degree? Is there a look? Is there a beard? I mean, come on, what is it? I mean, there's definitely been scope creep when it comes to resumes and that data scientist title. I mean, for me personally, I think it combines, you definitely got to have the maths and stats. Being able to apply these advanced statistical methods to data is a core part of being a data scientist. There's also the computer science chops. It's not just about having the math, you've got to be able to implement it in a computer system and do that successfully. And then you've also got to have some level of business acumen. You've got to know what is the problem you're trying to solve and what data and algorithms can be applied to that problem in a way that's really... Some more integrated discipline mindset. Exactly. Now, that's kind of the data scientist unicorn. It's hard to find all of those skills in one person. These days, people tend to be putting together teams, but for somebody who perhaps only knows SQL and is calling themselves a data scientist, that's a bit of a stretch for me personally.
But I think where we're going on that front is not so much these days about hiring data scientists. Companies have kind of figured out that that's a core part of the institution. I think the next stage is going to be for companies to figure out how to actually take the research and the implementations that those data scientists are doing in the lab and effectively operationalize them throughout the organization. How can we make data science for the masses? Now, data science for the masses doesn't mean that an MBA is going to be implementing a generalized boosting algorithm on a data set. But what it means is being able to expose the company-specific applications that data scientists develop and make them available to the MBAs. They can apply them to the data sets of their choice in a nice interactive environment. Absolutely. But you mentioned a moment ago the whole idea of this team concept in data science. We had on the chief data scientist from Simply Hired earlier today and he was talking about the same thing, that the idea of finding that unicorn is challenging. And so the method that they've chosen is really this more team approach. Is that something you're seeing proliferate? All the time, all the time. It's a matter of matching the skills of the different people around the stats, around the computer science, around the business side, and having them work together. And that in itself is something that's relatively new, because these were operations that were institutionalized in entirely different departments at one time. But now by bringing those skills together, people can understand the problems they need to solve, apply these advanced methods to them, and then deploy them into the enterprise. And in terms of operationalizing data science, which I think is a great way to look at it, because ultimately, that's where a lot of the value's going to be created. You've got to find the insights, but then you've got to apply them in the real world, right?
So what are some of the ways that you see people doing that, the kind of cutting-edge approaches people are taking to actually moving data science from the back room to out front, whether that's to a customer or employees or whoever the case might be? Yeah. Well, a lot of that has been through production operationalization of data science algorithms. You have a bunch of data scientists out there in the back office, they're working with big data, in Hadoop, from the databases, in a database, they generate a predictive model, and then that gets run in real time in some kind of operational system. So you're kind of just going right from the developers, the data scientists, right to the consumer. I think where I see things going is that there's going to be more of an interim step there, in that the data scientists can develop modules. For example, they have the expertise around doing predictive maintenance for a big manufacturing company, but that manufacturing company has lots of different products in the field that are going to break at some point. So it's being able to deploy those algorithms into an application that an operations manager can then use to apply it to the aircraft engines and the aircraft wheels and the tarmac and everything else, so that they're not a data scientist themselves, but they can lean on the expertise of those data scientists. But do it in a flexible way, where you're not building siloed apps one after another. Yeah, so a really cool app that I've been taking a look at recently, obviously, has been the Azure ML Studio framework. It's cloud-based, it's a nice drag-and-drop type of environment where data scientists can publish algorithms written in R or Python or anything else, and then somebody who can just work in a drag-and-drop environment can apply those to the data sets that they select from the environment.
I think that's a nice hybrid way of doing that, exposing the expertise of those deep data scientists to the people that need to actually apply those models on the ground. So talk about the competition now. Honestly, you're at Microsoft, you're at the big company. We were talking earlier about the difference between systems of engagement and systems of record, all the stuff that's going on, and in big data there's a lot of land grabbing going on. The middle layer is going kind of open, you're seeing that with these announcements. But underneath the covers, converged infrastructure, cloud, and then the apps all kind of make sense. The apps have clear visibility, converged infrastructure is making sense. R is not owned by Revolution Analytics. It's an open source framework, all that good stuff. We heard Oracle saying, hey, we're getting behind R. So everyone's getting behind R. IBM's getting behind R. R is R. R is a beautiful thing. So what does that mean for Microsoft? What edge do they have with you guys? Can you share that? Because some people think Cloudera is Apache. Hortonworks would differ, but it's open. So everything's happening in the open. Is R the same way? Are there any nuances that we need to know about, and how does Microsoft compete with you guys? Yeah, I think it goes back to community. I mean, R is not just the bits and bytes. It's not just the language engine. It's the community as well. And I think that's what Microsoft is looking to Revolution Analytics for, that community aspect, being able to bring the R community in general into these other domains where R has been embedded into other applications, and giving the R community members a venue for exposing the applications and the code that they develop in more and more frameworks. And any changes you see coming with all this market movement? I see the community is still growing. There's no real problems that we see out there.
I mean, the community has been growing so strongly. And in fact, R is even increasing in popularity. It's always been hard to measure the popularity of open source applications, but I've seen at least three surveys in the last six months that rank R as the 12th most popular language among all programming languages, including general purpose. I'd love to see that survey, with R visualizing it for us, that brings it home. Most of them do. I don't know how to compliment him. Hope it's a good survey. What about the innovation side of it? Okay, so what's coming on? What do you see on the innovation side in the community? What's on the radar? What do you see as opportunities, white spaces, share your perspective there? Yeah, well, I think more and more of the sort of deep stats algorithms that previously existed only on the research side. What we today talk about as deep learning is really just statistical algorithms that we've known about for a long time applied to new domains. The technical aspect of that, of course, is applying these algorithms to these big data sources. There's some challenges there in being able to implement them. So I see a lot of research in those areas about taking these algorithms that we've known about for a long time and retooling them to run in Hadoop or in databases. And I think that's definitely going to continue. So Jeff, I've got to ask you, what do you think about this? You're the analyst. I mean, Microsoft, I mean, we were talking about this on the news. I mean, big move, what's your take? In your research yesterday you reported kind of like the rich getting richer. Is it? Well, yeah, we're seeing acquisitions. Revolution Analytics is an example of that. I mean, I think this is an illustration that the big companies out there, the Microsofts, IBMs, SAPs of the world, recognize that big data is going to be critical to their future success.
They have to align with the tools and approaches like R that are gaining popularity with data scientists. These are the people that are gonna drive this shift from kind of, you know, I don't call them dumb applications, but to intelligent and automated, intelligently automated applications using things like machine learning algorithms built with R, et cetera. So, you know, I think you're probably gonna see more acquisitions in this space, because the other thing is that it's part of a larger stack, right? So it's not just the algorithms, not just the languages. It's the underlying infrastructure. It's the way you actually can consume that in the enterprise, whether that's cloud, and then ultimately you're building applications that you're gonna roll out on your mobile device. So when you look at mobile, analytics, cloud, and social, those are the three, those are the big four areas, right? I think the big companies are recognizing that. The challenge, of course, is, you know, it's gonna be a painful transition for some of them. I mean, we've heard about, you know, potentially pretty massive layoffs maybe coming at IBM. So they're kind of going through this transition of going from the legacy world to this new world, and it could be a painful transition for some of these big companies, but for the ones that can come out the other side and make the smart moves going through that transition, I think there's massive opportunity. Yeah, and I think IBM gets beaten up a bit on Wall Street, because obviously their performance has been down with this transition, but in the long game, they're tooling in the right area in my mind. Cloud, I mean, they have this work on Bluemix, certainly they are running, but their analytics vision is pretty solid. I mean, I like insights. In fact, we interviewed Michael Jordan from Berkeley recently on the ground segment of theCUBE in San Francisco.
I asked him about the difference between math and computer science, and he says, oh, it's like, it's that one. Exactly what you said, you know, there's a creative aspect of it, that you got to put in that third dimension, but yeah, certainly, if you got computer science chops and math and you got some sort of creative visibility, you're good. But he says there's a big difference between confidence, you know, sentiment bar charts and like feeling good about stuff, versus getting answers to things. And I think that's the focus we're seeing, so I want to get your comment on that. These answers, this insight engine, is the holy grail, which is using predictive analytics, all kinds of analytics, to get what you want. Talk about that. So I think what we'll see is a move from exploration into applications. So, you know, a lot of what we've been doing with big data has been just really about getting access to data, exploring that data, visualizing it. Just as you said, even being able to look at big data has been a challenge, both technically and kind of philosophically. What's the right way to look at and explore data? And I think we've learned a lot of great lessons through that process, and now we're ready to actually turn those observations into applications. And I think what we'll start to see is you start getting that magic factor at the consumer level. I saw a presentation this morning from a guy at Toyota talking about a very near future when your car will tell you before it has a fault and be able to get these things corrected before things actually happen. And from the point of view of the consumer, you're not going to understand all the sensor data that led to that or the models that led to that prediction, but when you can actually avoid a breakdown, that's going to make a big kind of difference. What I think is interesting about Microsoft is, obviously, Microsoft is strong in applications.
So they can bring in some of that big data magic, if you will, and infuse that in some of the business applications that people are accustomed to working with. There's a huge opportunity there. Exactly. And what I like about what Nadella is doing is really, he's really adopting what we talked about earlier today with Bill Schmarzo around open business models. You've got to open up to the rest of the ecosystem. There are examples of that working now: bringing in Revolution Analytics, an open source company, making Office available on other platforms, those kinds of things. When you start to see that, combined with the acknowledgement and the understanding that big data is going to play an important role, I think there's a huge opportunity for Microsoft. How about security now? Because this is something that's come up. I mean, I know you're not a security expert, but in visualization, that's been a big part of seeing that kind of data set. So security brings up the notion of, hey, I want to see things. That's the instrumentation of applications. That's been hyped. Have you seen anything there in the R community with security? Are there any updates there? That just recently came up in our last segment with our data scientist. Not so much on the security side, but I think it's related to a similar topic, which is around data ethics. Especially once you start working with very, very fine-grained data sets, it does require a certain amount of care and knowledge, even when it comes to visualization, to not expose personal information, for example. This is something that, again, going back to statistics, the Census Bureau has learned about over the years. This is why you can't drill down into individual zip codes in census data.
And I think this is something that, again, data scientists learn as part of their education: how to ethically and constructively handle data like that in a way that you don't expose ethical, confidentiality and security issues. And that's why I think the expertise of data scientists is so valued by companies, specifically to avoid those types of issues. Well, I think one of the things that comes up is this: are we in a bubble or a wave? And it always comes up, certainly the valuations are out of control. We had a VC panel yesterday. And I argue that it's bubble-ish, but it's not a super-bubble like dot-com. I mean, actually, with the dot-com bubble, everything that they talked about happened, just at different valuations. But there's waves of other stuff coming. So I want to get your perspective as someone who's been around the block a few times and seen some waves. What waves are coming? You mentioned discovery, or exploration to discovery into applications. Okay, what other waves are coming in big data? Hadoop seems to be a done deal. We're like, okay, we're done. We agree. Hadoop's good, now move on, move faster, hence the consortium or whatnot. But what's the next wave? Is it Internet of Things? Are there other things? What's your take? I mean, just on that point of bubble versus wave, I've been through a couple of bubbles. And this feels very different to me. And the difference is that companies are really using the stuff that the vendors are talking about, to solve real problems. They're not just pretending. So that's kind of point number one. I think just in terms of where that next wave comes from, I don't really see a big turning point here. I think it's a continuation of what we've seen. I think your point is correct that we've really sorted out what happens at the data level. I think we've established now that data science is the way to make use of that data and exploit that data, if you will.
And now I see, as I said earlier, the next sort of evolution there being actual applications that hit the consumers, that really exploit the data science and the data in ways that we are just starting to see now. Well, thanks for coming on theCUBE. Really appreciate it. Great to see the new success with the acquisition, changing jerseys, keeping the same mission. What's the objective? What do you got on your to-do list? Obviously, now more resources, certainly, with Microsoft. Good budget behind you, good resources. Maybe some bloatedness, if you will, but they're working on that. What's the mandate? What's the to-do list? I mean, on a personal level, my mission is to continue to support the success of the R community, which is growing so rapidly, and the R project itself, and to bring R into even more enterprises as the language for implementing these data science applications. David Smith, CUBE alumni, Chief Community Officer at Revolution Analytics, now part of Microsoft, not officially, but soon will be. It's just dot the i's and cross the t's. Congratulations, great to see you. We're here live in Silicon Valley for Big Data SV, part of Strata Conference and Hadoop World. We're excited to be part of Big Data Week here in Silicon Valley. This is theCUBE, we'll be right back after this short break.