 long side or next to architecture, which means that they've got an analytic server sitting outside of Hadoop. They've got data sitting inside of Hadoop, and what they do is they use some kind of connector, whether it's ODBC or a high-speed connector. We support that type of architecture as well. So you extract data out into a local data mart to build a model, and then once you have your predictive model, you score it back in Hadoop. That's traditionally how you do that. Last year we introduced support for Hadoop in an open-source package called RHadoop, and what we provided at that point was native support for extracting and importing data into HDFS, as well as HBase, so you had native support as far as data connectors. The second piece that we introduced at that time was the ability for an R programmer, an R data scientist, to be able to develop mappers and reducers inside R, and then run them streaming inside of Hadoop. That was a very powerful capability. It's the third most active download on GitHub today, and the reason that it's so popular is because, as John was saying just a few moments ago, people are looking to do their data distillation in Hadoop and not have to extract data out, and this gave them the ability to do that very easily inside of Hadoop. So that's really a beside architecture. Now what we're introducing, and you can come by our booth to see a demonstration of it, is an inside architecture where what we're doing is we're bringing both the data and the analytics layer together so that people can build models at scale inside of Hadoop without having to extract the data. So it's sitting there and it's residing. You can build your model and deploy it at the same time inside of Hadoop. Very powerful capability. So Michelle, you guys are playing with all the different vendors. Obviously Hortonworks, Cloudera, Greenplum, and obviously there's a little bit of a scuffle going on right now. 
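The mapper-and-reducer pattern described above can be sketched as follows. This is a minimal illustration in Python, not R, and it does not use the RHadoop API; the function names `mapper`, `reducer`, and `run_local` are hypothetical. It simulates the sort-and-shuffle step locally, whereas under Hadoop Streaming the mapper and reducer would be separate scripts that read standard input and write tab-separated key/value lines.

```python
from itertools import groupby
from typing import Dict, Iterable, Iterator, Tuple


def mapper(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1


def reducer(pairs: Iterable[Tuple[str, int]]) -> Iterator[Tuple[str, int]]:
    """Sum the counts per key. Assumes pairs arrive sorted by key,
    which is what the framework's shuffle phase guarantees."""
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield key, sum(count for _, count in group)


def run_local(lines: Iterable[str]) -> Dict[str, int]:
    """Stand-in for the cluster: map, sort by key (the 'shuffle'), reduce."""
    shuffled = sorted(mapper(lines), key=lambda kv: kv[0])
    return dict(reducer(shuffled))


if __name__ == "__main__":
    print(run_local(["Hadoop stores data", "R models data"]))
```

The point of the pattern is that the mapper and reducer only ever see a slice of the data, so the same two functions work whether the input is a few lines or a whole cluster's worth.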
Greenplum had the big announcement yesterday. I saw you guys were on the ecosystem slide at Greenplum, and I've got to say, I was very impressed with Greenplum's announcement. It was impressive. It was a great show. That's Jeremy Burton's DNA, and let's see if they can carry that out as a startup, but I think they will. I mean, obviously, Paul Maritz is there. They have a lot of chops, so they are flaunting their show and they're flexing their muscle. I want to ask you about some specific things. The data warehousing market, there's a camp that's kind of throwing Hadoop against it. I mean, my only criticism of Greenplum is that it feels like a cheap data warehouse, but I like their software approach. There's no more appliance. I like that. So a different approach. We'll find out when the world plays out if it's going to work or not, but there's a different approach. So I want to ask you specifically, like, how do you guys deal with the different approaches? I mean, obviously, data warehousing is changing. How far along are the customers? Do they really understand what's going on between EMC and Hadoop? Or is it too early to tell? I mean, it's hard to tell. So what I see out in the marketplace when I talk with customers is any enterprise shop has a data warehouse and they have to have that coexist. They're not going to be able to throw that out immediately. So what they're doing is they're introducing Hadoop into that environment and what they're looking for are a class of tools where they can write it once and they can deploy it no matter where their data happens to reside. Today, you don't see that. That is what Revolution Analytics is really bringing to the table. People can write an application, an R-based predictive analytics application, once. They can deploy it in a Hadoop environment, whatever their favorite flavor is. They can deploy it inside the database. They can have it out running on a server. 
You'll see in the future, there'll be cloud implementations of this as well because people want to have elastic bandwidth for computationally intensive problems. So I always talk about, you know, relative to the markets, you know, the horse and buggy and then the car, the evolution of the car. And, you know, we're, I think, a little bit further along, but, you know, in a way, the IT world has the old ways. SQL and Hadoop, like what Greenplum was doing, was interesting, but it seems like it's making something faster that might be older. What Charles was talking about is much more interesting, this notion of enabling new use cases that don't exist yet. So I want to ask you, because this gets down into the weeds around people's claims, especially Greenplum's, for instance. I'm going to drill down on their benchmarks on day three, hopefully with them here in theCUBE. But, you know, when you move stuff around the network and you transform data, move it around, there's issues there, right? So there's trade-offs, right? So can you talk about what that means, moving data around and the notion of transformation? Meaning you've got to pull it out, make it work, and there's a lot of structure involved, overhead. Can you break that down for us? Yeah, so it depends on what your use case is and the scale of your particular data, right? And then you have to look at whatever that particular architecture is to determine where is the best place to actually do the transformations, right? And so what we're trying to do is enable that, no matter what the use case, right? Whether it's small enough data that it fits in memory and you want to be able to do your transformations there. Let's not forget, the reason why Hadoop vendors are supporting SQL interfaces is because there's some things that SQL is incredibly good at doing, okay? It was designed to manipulate data, and so when you look at just data transformation, if you have a structure, right? 
And you know the, well, it's not even just that, but think about it from an analytics perspective. If you are going to be doing transformations, those are known transformations that you're going to be doing. You can run them very easily in SQL, where it takes many, many, many lines of code to do it in many other things, right? Now, within our platform, we also have DataStep, and DataStep is a very nice capability, okay, to do your transformations in place so that you don't have to use ETL tools, or you can use them in coordination with the R language inside of DataStep. So that's a huge differentiator. Now, I want to come back to the news a little bit. So we talked about the in-Hadoop processing. Also, in the Intel and Greenplum announcements, they really push an ecosystem, obviously, with the space that you're in, and you've got also partnerships with Hortonworks and Cloudera, is that correct? That's correct. So Hadoop is obviously where the world is going, and with YARN, you see that there's going to be multiple processing frameworks, right? And so what I see happening in the marketplace is a convergence towards a coexisting strategy between SQL, MapReduce, real-time analytics. All of that's going to have to coexist. And so part of our long-term strategy is to enable our enterprise customers, and it's too early to say who's the clear winner in the Hadoop world, and everybody has a different strategy that they're going after in terms of the Hadoop world. And so we're supporting multiple Hadoop implementations as well as the enterprise database vendors, because that's the reality of where our customers are. So what you're saying is you have no choice, you have to sort of place your bets across all the horses, but there's a lot of Hadoop distributions coming out. Does the world really need another Hadoop distribution? What's your angle on that? So I think that there's different angles. 
Like if you look at the Intel distribution of Hadoop, what they've done is something I think is very radical and was a very bold move, which is they've actually tied the Hadoop distribution down at the silicon layer. So they're going to get specific performance increases. For some customers, that is going to be enough, okay, that they're going to choose to use the Intel distribution, even though it doesn't have all the other value-added components that Cloudera has around it. So it's open source, it's open source Apache, massive distribution, maybe clearly better performance if it's at the silicon level, and then you've got Intel's management capability on top, but you don't have to use Intel's management capability, obviously. So that is pretty radical. Okay, so they're adding value. All right, good, and then I also wanted to just talk about R a little bit, and who uses R? I mean, it's obviously popular with data scientists, academics. Talk about the base, the users. So there's over two million users that use R today. So if you are at a university, whether you're in an MBA program, or whether you're in a stats or a math program, if you're in an applied program, I mean, one of the guys that works for me, his son is studying to be a neuroscientist, he loves R. I mean, it is the de facto modern statistical language today, and that is what everybody that's coming out of school is learning, and it's not just a language, it's a different approach. It's a modern-day approach to actually building predictive models. So it has foundations in statistics, but it's very heavily weighted towards machine learning as well. People tend to approach the problems very differently than they have in the past, where if you look at legacy systems, they tend to be much more statistically oriented. So there's a synchronization between the mindsets of people using R and the mindsets of the Hadoop world, even though R predates Hadoop, right? 
So talk about what you're doing to make R, to increase its affinity with huge data volumes. Yeah, so with R today, there's over 4,000 open source packages. That's analytic functions. There's tens of thousands of those that are available out in the CRAN repository. And what Revolution R Enterprise does is we provide a distribution, and we start really at the base. So we actually rebuild the R distribution using some performance-enhanced libraries that get us anywhere from five to 50 times performance increase, depending on what the analytic application is, just at the base core platform. Then we support the open source packages. In addition to that, we provide the high-speed connectors that I mentioned in the enterprise landscape. We also provide our own package called ScaleR. And that is a set of parallelized external memory algorithms. And so it's a library of about 50 analytics today that are all parallelized. And what that does is it's completely linearly scalable, no matter what data you have. So you don't have to, R is a language that forces all the data to be in memory in order to process it. And so what we've done is we've built this architecture, a PEMA architecture, that allows our algorithms to linearly scale no matter what size data it is. So when I talk with customers, I say it's essentially a fancy queuing system. If you really want to think about it, it's a streaming in-memory analytics architecture that allows our algorithms to process bits of the analytic, then combine all the interim results correctly. So it's completely distributed. And so the customers don't have to do the heavy lifting of writing their own algorithms in a parallel architecture, because that's hard work to do. So we're taking that burden off of them. So R is the kind of cool new system, right? It's the hip platform. 
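The idea of processing the data in pieces and then combining the interim results can be sketched as follows. This is a minimal illustration in Python under stated assumptions, not ScaleR or any Revolution Analytics code: the names `Partial`, `summarize`, `chunks`, and `fit_line` are hypothetical, and simple one-variable least squares stands in for the roughly 50 analytics mentioned above. Each chunk is reduced to a few running sums, and the sums are added together, so no more than one chunk needs to be in memory at a time.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[float, float]  # one (x, y) observation


@dataclass
class Partial:
    """Sufficient statistics for simple linear regression on one chunk."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0

    def combine(self, other: "Partial") -> "Partial":
        # Interim results merge by plain addition, in any order.
        return Partial(self.n + other.n, self.sx + other.sx,
                       self.sy + other.sy, self.sxx + other.sxx,
                       self.sxy + other.sxy)


def summarize(chunk: Iterable[Row]) -> Partial:
    """Reduce one chunk of rows to its running sums."""
    p = Partial()
    for x, y in chunk:
        p.n += 1
        p.sx += x
        p.sy += y
        p.sxx += x * x
        p.sxy += x * y
    return p


def chunks(rows: Iterable[Row], size: int) -> Iterator[List[Row]]:
    """Yield successive lists of at most `size` rows from a stream."""
    buf: List[Row] = []
    for row in rows:
        buf.append(row)
        if len(buf) == size:
            yield buf
            buf = []
    if buf:
        yield buf


def fit_line(rows: Iterable[Row], chunk_size: int = 1000) -> Tuple[float, float]:
    """Return (slope, intercept), holding only one chunk in memory at a time."""
    total = Partial()
    for chunk in chunks(rows, chunk_size):
        total = total.combine(summarize(chunk))
    denom = total.n * total.sxx - total.sx ** 2
    slope = (total.n * total.sxy - total.sx * total.sy) / denom
    intercept = (total.sy - slope * total.sx) / total.n
    return slope, intercept
```

Because `combine` is just addition, the per-chunk summaries could equally be computed on different cores or nodes and merged afterwards, which is what lets this style of algorithm scale with data size.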
But at the same time, you're competing with guys like SAS and SPSS, who are maybe not as hip but have a huge, huge install base, massive support infrastructure, with SPSS now at IBM and obviously SAS has been around since I was a little boy. And so how do you compete with such established franchises other than the coolness of the platform? So there's a couple of ways that we compete with them. In terms of performance, our high-performance analytics outperform most of the analytics that you see in SAS. So we could go head to head with SAS HPA and we're going to win, okay? So from a performance perspective, we win in those kinds of battles. We have some data that we could share with you that shows us doing that. And it isn't really even about that. In the performance benchmark we have, that's a logistic regression, we happen to be twice as fast. But it's 80 seconds versus 44 seconds, who cares, right? So you were twice as fast. The big difference is if you look at the underlying architecture, the amount of memory, the amount of cores that they had to use, it's about a $2 million system. We're running on commodity hardware that was a fraction of that, that's more in the 200K kind of range. It's so much more efficient use of resources. So my final question, Michelle, is going to be around what is your take on the preferred approach for analytics? And I know you're kind of straddling a lot of political lines here and you're kind of Switzerland in this situation. So it's a hard answer, I think, for you. But you've been around the block, you've been at IBM, you know the industry, you've seen the old way and you're now pioneering the new way. It's almost that clear, old way, new way. With some migration, that was not a throwaway, SQL. I mean, Greenplum has a good approach. Get a beachhead, it's their fourth try. Get a market and win there, right? I mean, they can get a nice little cottage industry in the SQL business there. 
What's your take on the analytics, the preferred analytics on the data platform? What is your take on that? So I'm going to be politically correct again, John. You're not going to like that, but what you're going to see is that packaged applications is the way that's going to lead us out of this, okay? So packaged applications is all about domain expertise. The estimate from Gartner is that 65% of the packaged applications that are coming out will include Hadoop, okay, as part of their predictive analytic packages. You will see R embedded inside of that. When you go to that game, game's over, okay? Because that's where- That's like a graphics card in a PC. Everyone has it built in. Exactly. When you run your business on an application, which is what you do with applications, okay? You become dependent upon that. And the legacy environments' pricing models don't lend themselves very well to those environments, nor do they do that from an intellectual property perspective. When you're looking at open source R, it's a radical shift for those, and it will fuel that. Okay, let me try to be politically incorrect and translate that. Very good answer. We serve all masters, but really the game hasn't been determined. You know, the gladiators are in the Coliseum. You know, the horses are on the track. As Dave says, I mean, we're going to see a competitive marketplace. I mean, we've seen this movie before, an emerging marketplace, some consolidation, winners, losers, a lot of ones coming out with their own little fork or version of a distribution. I mean, it's a community here. I mean, we'll see, right? I mean, it's one of those things. Yeah, well, we'll see. Come on, say it. I have my bets. I've placed my bets. Yeah, what do you think? You have to place your bets, right? Yeah, what's your take? I mean, obviously, you know, we've all been around the block. We've been in many cycles, you know, going back to mainframe, client-server, up through today. 
I mean, you know, it's a good battle. I mean, people aren't necessarily, you know, I mean, people are happy about Greenplum coming to the marketplace like this. I think, John, it's like our friend Jim Long describes it. You've got a lot of innovation going on. You've got the top down and bottom up. And the top down is the vision. And the bottom up is where the rubber meets the road and people are actually deploying and delivering business value. And when those two meet, that's who wins. And so I think that to evaluate, you know, the horses on the track, you've got to look at who's the visionary, who's actually executing on that in a way that is actually going to deliver business value, and they will get critical mass first. And Greenplum did say on the network last night on Twitter, yeah, we have a different business model. They're not hiding anymore. They're not trying to play nice. Hey, we have a different business model. We're going in this direction. So I guess my take is this. I mean, I look at a company like MapR and I really like the fact that they've said, you know what, we're just going to go add value. Damn the torpedoes. Yeah, the open source thing. We'll play where it makes sense, but we're going to do business. I think that that's a great strategy to build value, but I do think that if Cloudera doesn't get taken out, you know, if they stick to the knitting, if they really stick to the vision, that has greater long-term potential, but those are big ifs. And scalability. Yeah, but yes, but you know, again, if Oracle buys Cloudera, then all bets are off. That will change dramatically. I love what Hadapt is doing, but I think Hadapt's total available market is not as potentially large as Cloudera's is, even though I think that they're going to be able to develop a tremendous amount of value. Well, we know data visualization is dependent upon a platform to enable it. 
It's not, you know, it's decoupled in a way, but it's being more tightly integrated for enablement. So, you know, these platform conversations matter and it impacts visualization. Absolutely. Visualization is the piece that is, I think the whole data artist piece is totally underplayed today. Yes, I agree. Because understanding the data in a way that makes sense, that's consumable, it's a totally different way of thinking about it. Really, today, it's in an artisan camp where people are doing one-offs, okay? Nobody has yet figured out how to mainline that and you're going to start to see that it has to come about, right? It's art and science, that's part of our segment. You know, data visualization, data-driven visuals, is art and science. And I think that, you know, there's been a big emphasis on designers in the consumer web 2.0 that has not translated yet to visualization. You know, I mean, whether you're talking about MicroStrategy or R or whatever tools are out there, Tableau, I mean, it's sexy looking, but it's better than what it was. Well, yeah, the last thing I want to add on the horses on the track is, you know, we just did this study, I was talking about it earlier, an 11.4 billion dollar market, growing to 50 billion. Half of it is services. And so there's still a huge market for services. That's really where there's still tons of money being made and we live in a world of whales. It's IBM, HP, Oracle, guys like EMC are going to be buying up all these companies and incorporating them in and trying to figure out how to make money at open source, and services is a way to do that. IBM has proven it, it endorsed open source long ago, so I think that's another space to watch. It's sort of boring but important. 
Yeah, so you're actually playing into, so my first book with my co-authors is out and I have a second book coming out and it's going to talk about exactly that, because what you see out in the marketplace when people do their custom applications today, analytic applications, is, I say that people keep doing the same old things. They don't do anything really innovative with their analytics, right? And it ties back to their business strategy, and you have to tie it back to your business strategy so that you're enabling your bottom line or your top line revenues or ideally both of those. And most organizations don't have a methodical way of being able to identify what those real gems are in their organization, where they can really exploit the power of analytics and how they can do it in a way that is really modern, because modern analytics is more like an organism, okay? It's going to grow and change and shift. And today, the way we think about predictive analytics is very much like a closed system where you do it once and it doesn't change very frequently. And that is radically different than it's going to be 10 years from now. Doesn't cut it in this world, does it? No, it doesn't cut it. All right, good. Well listen, we're way over the time mark. We need what they do with the Oscars. They blast in the music and they drown out the guests. So, John, we've got to roll. Okay, we'll be right back with our next guest right after this short break. It's SiliconANGLE.com's exclusive coverage of the O'Reilly Strata Conference in Silicon Valley, Santa Clara, California, where all the action's happening right now this week. We'll be right back with our next guest.