Big Data SV 2014 is brought to you by headline sponsors WANdisco, "We Make Hadoop Invincible," and Actian, accelerating Big Data 2.0.

Okay, welcome back, everyone. We're here live in Silicon Valley for Big Data Silicon Valley, Big Data SV. That's hashtag #BigDataSV. We just had Big Data NYC in New York a few months ago, and we're covering all the action here in Silicon Valley, including the Strata Conference, which is winding down right now. We're in day three of theCUBE coverage, live, wall to wall. I'm John Furrier, the founder of SiliconANGLE, here with Jeff Kelly, my co-host today from Wikibon.org. And this is theCUBE. We are here with David Smith from Revolution Analytics. Welcome back to theCUBE. Good to see you, and thanks for coming on.

Thanks, it's a real pleasure to be here again.

So Revolution Analytics, you guys have a great brand name, you've been growing, and you're part of the revolution, so to speak, pun intended. Talk about your view here, because everyone is talking about how we've now transitioned to full adoption of Hadoop. The scale piece is here, and we're moving to data science, into the full throttle of analytics, insight, and data. We just had Joe Hellerstein on from Trifacta, talking about how data is at the center of computer science. We've had Avi Mehta on talking about data at the center of business. Data is at the center of everything, right?

Absolutely.

So you guys are in the analytics business. What's your take on that? You agree?

Absolutely, I agree. I think it's really the next phase. We've been talking about big data for so many years, but now a lot of the big companies especially have really got it figured out. They've got their big data infrastructure in place. They've got their Hadoop cluster online. They've figured out the process of how to ingest and amass and aggregate all of this data.
And the question they're now turning to is, how do we get value out of this data? And the way you get value out of big data is with data science. For us, this has been great, because the lingua franca of data science is the R programming language. So we have companies coming to us and saying, how can you help us, help our data scientists, make sense of the big data we're working with? Figure out what we should be doing in terms of our new products or our new systems, or where we can eliminate costs. All of this comes out of data scientists, and it comes out of the R language. So we've just been run off our feet the last few days.

Well, yeah, I wonder, as you've been watching this evolve, have you guys just been licking your chops, waiting for the infrastructure to harden a little bit so that this conversation does up-level to the data science level?

Exactly right. I mean, we couldn't have this conversation until people had the data in place. But now they've actually got their hands around the data stores themselves. They've hired the data scientists, and the data scientists are working with us to say, how can we bring our analytics to the data in Hadoop? And that's something we've been doing through our product line with our in-Hadoop integration.

Well, we've had you on before, but I think it's important to reiterate where your value proposition is in the larger world of R. R, as you mentioned, is very popular with data scientists and analysts. So talk about the things that Revolution specifically does to make R ready for enterprise-grade deployments.

Absolutely. So we make R work in production. That means a lot of things. Number one is simply providing technical support and consulting services, so companies can feel comfortable using R and integrating it into their systems.
But it especially matters when it comes to real-time production applications, where performance is key and where scalability, especially with big data, is paramount. These are the things we add to open source R in Revolution R Enterprise, so that companies can be confident their real-time systems are going to turn around in the millisecond time frames they need for their online systems, and that as their data sizes grow and grow, they're not going to hit a bottleneck at some point and find that R can no longer cope.

Well, talk about that a little bit more, because R really was not designed with big data in mind, and one of the things Revolution is doing is making it scale, or helping it scale. Talk a little bit about how you've approached that and the kind of success you've had.

Yeah. First of all, R is just an amazing framework for doing R&D and development. It's really what's made R so successful in the academic sector, because everybody is doing research in R. When a new statistical technique or a new machine learning technique comes out, it comes out in R, because the paper was published with the R code, and there's associated software you can download in R to do it. But designing a system around that flexibility and innovation has some counterpoints in terms of performance and scalability. R was designed to be an in-memory application, which is great when you're doing R&D, because all the data is right there in memory to access. But when you scale out to bigger data volumes, you can no longer fit the data into the memory of the server you're working on. The way we've addressed that is to introduce a new data type into R. We call it XDF. Rather than the traditional data frame, the data type R works with that lives entirely in memory, our XDF object sits out on disk, and so we can write algorithms that work with that data on disk.
Rather than having to bring all that data into memory at once, we can just stream it in row by row. And our algorithms have been very carefully designed to be able to update a logistic regression or a tree model or any of these machine learning techniques on an incremental basis. So it seems like you're working with a big data set in memory, but in fact it scales to as much data as you can hold on disk.

And that's gonna be critical as we move to production workloads. We're getting to the tipping point where a lot of the early adopters are looking to scale out and put these workloads into production, and that's gonna be an area where it's critical to support that kind of scale.

Exactly, and especially in Hadoop environments. When there's so much data sitting out there on distributed disks in HDFS, the previous way of dealing with it was to extract the data from Hadoop, bring it into a separate analytics environment, and analyze it there. But the problems were, first of all, the amount of time it would take to simply move that much data could be hours, even days in some situations, which didn't correspond to the turnaround cycles a data scientist needs. And equally importantly, you could only use the computing power available in that separate server. We've recognized the reality of data gravity. Data is enormous, it attracts more data, and it makes a lot more sense to bring the analytics to the data, just as Joe was saying a minute ago, than it does to take the data to the analytics.

Well, really, one of the core tenets of big data is that big data doesn't like to be moved around; it likes to sit where it is. So, related to that, you've got an announcement this week about making Revolution available on AWS, which is interesting. We had Treasure Data on earlier, a big data warehousing service on AWS, and we're increasingly seeing AWS specifically.
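The stream-it-in-row-by-row, incremental updating David describes can be sketched generically. The following is a hypothetical Python illustration of the out-of-core idea, fitting a simple linear model by accumulating its sufficient statistics while scanning a file on disk; it is not Revolution's actual XDF implementation, and the file layout here is an assumption for the demo:

```python
import csv
import os
import tempfile

# Build a small dataset on disk: y = 2*x + 1, one (x, y) pair per row.
path = os.path.join(tempfile.mkdtemp(), "data.csv")
with open(path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows((x, 2 * x + 1) for x in range(1000))

# Stream the file row by row, updating the model's sufficient
# statistics incrementally. Memory use stays constant no matter
# how large the file on disk grows.
n = sx = sy = sxx = sxy = 0.0
with open(path, newline="") as f:
    for x_str, y_str in csv.reader(f):
        x, y = float(x_str), float(y_str)
        n += 1
        sx += x
        sy += y
        sxx += x * x
        sxy += x * y

# Solve the least-squares fit from the accumulated statistics alone;
# the raw rows were never held in memory at once.
slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
intercept = (sy - slope * sx) / n
print(slope, intercept)  # → 2.0 1.0
```

The same accumulate-then-solve structure is what makes techniques like logistic regression or tree models updatable chunk by chunk, and it also parallelizes naturally, since per-chunk statistics can be combined across nodes.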
It's an area where big data workloads are at least starting, with maybe a lot of experimentation. The question for AWS, I think, is that a lot of those workloads are going to move back on-premise when they go to production, but that's a little bit of a different topic. Tell us about this announcement and why you've decided to make Revolution available on AWS.

Yeah, there are a couple of things here. First of all, from the data gravity point of view, we were increasingly coming across clients that were born in the cloud. Their data is already sitting up there in S3, and they were saying to us, how can we actually do analytics on that data? The answer we had before this was, you can extract it from Amazon down to a server and analyze it there, with exactly the same inherent issues as the Hadoop use case I was just talking about. By making Revolution R Enterprise available on the AWS Marketplace, the data is already up there in the cloud, and the analytics are now up there in the cloud as well. It's just a very natural fit for those clients that are born or running in the cloud. I think the point you just made about moving from cloud environments to on-premises production environments applies there as well. But the nice thing about the cloud, of course, is that it's a great way to do prototyping and proofs of concept with the data that's in the cloud environment. When you do the long-term cost analysis, you may well find another architectural option makes sense, but at least you can get going in the cloud setting.

Yeah, those are decisions you make later down the road.

Exactly. Absolutely. I mean, we were at AWS re:Invent, and it's just amazing the number of innovative startups working on the platform. It reduces the barrier to entry for a new company significantly.
And from an enterprise software company's perspective, it's the place you need to be. I think that barrier to entry is a really good point, because from a business perspective, we've mainly focused on the Global 2000 market, these big enterprises that have the Hadoop infrastructure or the big EDWs and have figured out the data collection problem. The really nice thing now about having our software available in the cloud is that it makes it available in a cost-effective fashion to these startups that were born in the cloud, and to SMBs that see the cloud as a cost-effective alternative, so they can get their hands on advanced analytics in a way that just wasn't approachable before.

Absolutely. One of the things we hear all the time is the humanization of data. I want to ask you this question because you've seen this with R and enabled a lot of success. What are the things people need to understand about visualization and playing with the data that they may or may not be seeing? Obviously the early adopters are getting their hands dirty with this all the time. What should the mainstream start thinking about?

In terms of...?

Just how to visualize. I mean, how to have the mindset, the mindset and tooling.

Yeah, I think that's an educational process that's been going on for a long, long time. We see the public in general, and business users specifically, getting a lot more comfortable with sophisticated visualizations and analyses of data. You can even see this in the New York Times. Five years ago, the most sophisticated visualization you might have seen there was a pie chart or a bar chart.
Now you open up the New York Times and you see these amazingly detailed interactive visualizations, diving down into issues around income and poverty and scale and the environment in a way we'd never seen before. People in general are becoming much more accustomed to, and indeed expectant of, seeing these in-depth analyses and visualizations of data. By the way, a lot of those visualizations you see in the Times are done in the R language, because it's a great environment for exploring and creating new things. And I think that's also becoming reflected in business user tools. Tableau is a great example, really providing users these amazing interactive ways of exploring and seeing data. The next stage I see is bringing in the analytics, the predictions, the inferences from data scientists, and putting those in the hands of business users. I think it's a two-step process. You have data scientists in an organization working with these massive data stores in Hadoop and EDWs, producing, for example, a prediction for a sale based on their own unique data, their own unique analysis. Now, we'd never expect a business user to do that kind of analysis themselves, but putting the results in their hands interactively, in a graphical environment, is a great way of replicating the skills of those data scientists across the organization.

Yeah, that's awesome. I want to ask you something that's more of a cultural question, for the company, and maybe a market dynamic converging with it. In markets like this, the pressure to compete and produce is always there, because the growth is there, right? So what pressure are you getting that's forcing you to do new things faster? What innovations are coming out that force you to be better?
When you have that pressure for growth, and the market is obviously growing very fast right now, analytics is hot, data science is hot, what's pushing you guys right now? What's the pressure for Revolution?

Honestly, for us, it's really more about riding the wave of R, because R has just been growing so explosively over the last few years. There was a Dice.com survey that came out just last week showing that R is now the number one highest-paid IT skill. So you can see, just on the demand side, huge demand for R. It's the fastest-

I'll repeat that. Come again, R is what?

R is the highest-paid IT skill today, according to the Dice.com survey.

Folks out there watching need to listen and take note. Data science, R, you know, cha-ching, lucrative new job.

Yeah. So it just shows the demand.

It's like a commercial, you know.

Yeah, so it shows the demand for R. And on the supply side, it's the fastest-growing language for data scientists, and the most used language behind SQL, and everybody uses SQL. So we're riding that wave: everybody's using R, companies are bringing in data scientists who know R, and we're helping them do that and continuing to grow.

Are there other integrations, like with Python and other languages, that we're seeing out there? You see hardcore programming, and then you see the language for analysts. Do you see a level of simplification on top of R coming?

We actually see it the other way around. In the same way that SQL is the lingua franca of DBAs, R is the lingua franca of data scientists. What we see is bringing more of the innovations from the open source community at large into the R environment. To give you an example, our engineers right now are looking carefully at Spark and YARN. We already have an R interface that enables a data scientist to do analytics with the computations being done back on Hadoop.
Now our engineers are working to say, how can we take advantage of the capabilities coming with YARN on the infrastructure layer and with Spark on the analytics layer, and bring those into the same R interface? It's not really going to change anything from the point of view of the R user. They're still doing linear modeling, trees, and so forth, but now taking advantage of all these technologies that have been developed in the open source community.

That's really smart. If you think about it, that's a smart way to go, because there aren't a lot of bets there. You're letting the marketplace fill in the complementary growth for you.

Exactly, and it happens in the other direction as well, with the business layer, because so many vendors have introduced connections between R and their software. Tableau, QlikView, Alteryx, hundreds more. Again, we're riding that wave, helping people who are using R in those environments connect it to these third-party vendors, providing support for the people using it, and providing the additional scalability and performance that come with Revolution R Enterprise.

Jeff mentioned Amazon before. Obviously cloud and DevOps are on a collision course with the data side of it. The data is at the center, and so is the cloud; up and down the stack, data seems to be the center of everything. How does the cloud ecosystem help you? Obviously computation is huge. Does it make things easier? What's the cloud dynamic?

It really makes things easier, because, as you know, it's so easy to spin up an instance in the cloud that's pre-configured with all of the software, the analytics, the data connections, as opposed to having to replicate that same configuration on hundreds of nodes in a cluster, for example. So it's just a much lower barrier to entry, as we were saying before.
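The design David sketches, one unchanged analyst-facing interface with interchangeable compute engines underneath, is a general pattern that can be illustrated in a few lines. This is a hypothetical Python sketch; the class names and the toy map-reduce mean are illustrative assumptions, not Revolution's actual architecture:

```python
from abc import ABC, abstractmethod

class ComputeBackend(ABC):
    """Where the work runs; the analyst-facing call never changes."""
    @abstractmethod
    def mean(self, partitions: list[list[float]]) -> float:
        ...

class LocalBackend(ComputeBackend):
    # Single-machine path: pull everything together and compute.
    def mean(self, partitions):
        data = [x for part in partitions for x in part]
        return sum(data) / len(data)

class MapReduceBackend(ComputeBackend):
    # Stands in for a cluster engine: each partition yields a small
    # (sum, count) pair where the data lives, and only those pairs
    # are combined -- the analytics go to the data, not vice versa.
    def mean(self, partitions):
        pairs = [(sum(p), len(p)) for p in partitions]  # map, per node
        total, count = (sum(vals) for vals in zip(*pairs))  # reduce
        return total / count

def analyze(partitions, backend: ComputeBackend) -> float:
    # The user's entry point is identical regardless of backend,
    # just as the R user keeps writing the same model code.
    return backend.mean(partitions)

parts = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
print(analyze(parts, LocalBackend()))      # → 5.0
print(analyze(parts, MapReduceBackend()))  # → 5.0
```

Swapping in a new engine, say one backed by Spark or YARN, means adding another backend class, while every existing analysis script keeps working unchanged.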
And Amazon provides all the DevOps, basically eliminates DevOps, in this case.

Yeah, exactly right. And for people exploring this area who haven't really gotten into analytics with R before, it's just a really simple, easy way to go to the AWS Marketplace, use the 14-day free trial, and start analyzing the data you already have in S3.

I think it's a home run with Amazon and the cloud, obviously, and the stuff you do in open source is great. My final question for you: describe the moment in time right now in the industry. Big Data SV, we're in Silicon Valley, the Strata Conference, all the stuff going on behind us here across the way at Strata. A lot of things going on, a lot of new startups, a lot of weirdness, a lot of people growing, high valuations, just a great growth opportunity. But share with the folks out there the state of the market in this moment, the key news and the story. What's the big story in this moment?

I think it's maturity. It's the recognition that getting value out of data through analytics is the way to solve some of the key business problems that are out there today. How do we make customers happier? How do we reduce costs? How do we increase profitability? How do we increase revenues? All of it is driven by what we can get out of the data we've already collected, and all the pieces have now fallen into place to make that vision a reality.

David, thanks so much. This is theCUBE with Revolution Analytics. The revolution is continuing here, with data at the center of the value proposition. I'm John Furrier with Jeff Kelly. Stay tuned, keep watching. We have a few more interviews, and we're going to close it out strong here. Day three of Big Data SV, that's Big Data Silicon Valley, hashtag #BigDataSV. We'll be right back.