Today, March 20th, 2012, EMC Greenplum drew back the curtain on Chorus, its collaborative analytic platform for data scientists. Importantly, the company announced a new open source business model for Chorus. As well, EMC acquired Pivotal Labs, a software development consultancy. Hello, everybody. This is Dave Vellante of Wikibon.org, and I'm here with my colleague Jeff Kelly, Wikibon's big data analyst, and on the phone, we have David Floyer, CTO of Wikibon. We're going to discuss: what did EMC Greenplum announce exactly? What does it mean to customers in the industry, and why does it matter? Gentlemen, thanks for joining me. Great to be here. So, Jeff, why don't we start with you? Thank you, David. David, remote from Mountain View, as always. Jeff, kick it off here. What did EMC Greenplum specifically announce? Three things. First, Chorus, as you mentioned, their social collaborative analytic platform that they've been talking about now for about a year, is going to go GA on Friday, March 23rd. Second, they're going to open source Chorus in the second half of this year, although some details on that are still a little bit vague. And third, they're acquiring Pivotal Labs, a consultancy that helps organizations build big data applications, and also the company that worked with EMC to develop Chorus. So those are the three main news points that were announced. So talk a little bit more about Chorus. Where does Chorus fit? Peel the onion a little bit. A little bit more detail on what it is and where it fits into the whole Greenplum platform. Sure. So Greenplum's Chorus platform is, as we mentioned, a collaborative analytic platform, which means it's designed for data scientists to manipulate and otherwise do their day-to-day work with data using the tools of their choice, and it places that in a collaborative environment.
So it makes it easier to connect with colleagues, collaborate on research, and find new data sources that might be strewn about the organization. Essentially, it embraces the collaborative nature of data scientists, who we know from surveys are among the most collaborative IT workers or business analysts. So essentially, it's a layer that surrounds the analytics tools to enable collaboration. Chorus is part of what Greenplum calls UAP. What is that, universal analytic? Unified analytic platform. Unified analytic platform. And so where does Chorus fit and what else is in there? UAP has essentially got three main components. It's the Greenplum analytic database and the Greenplum Hadoop distribution, sitting side by side. And you can picture Chorus as the third component layered on top of the two, essentially allowing you to access both systems to integrate the data into Chorus and otherwise manipulate it with your tools. And the Hadoop, is that Apache Hadoop or some other distribution? Yes, it's their Apache-based distribution as opposed to their MapR-based distribution. Okay, so the UAP does not include the MapR piece? That's correct. Okay. And Pivotal Labs, you mentioned they're a software development consultancy. I know they're focused on Agile, rapid development. They were sort of early proponents of that whole methodology. What else do you know about them? Well, you know, they've done a lot of work building web-based applications, and they've developed something called Pivotal Tracker, which is essentially their own application framework for Agile software development. So they're pretty well recognized in the software development industry. And as I mentioned, they were working closely with EMC in developing Chorus.
And over the course of that relationship, EMC came to the conclusion that Pivotal Labs could really help their clients build big data applications on top of UAP, hence the acquisition today. So, Jeff, let's talk a little bit about what this means. And then I want to bring David Floyer into the discussion. So EMC Greenplum announced that it's open sourcing Chorus. Exactly what does that mean? Because, you know, people talk about open source. There are different gradations of open and free, so clarify exactly what this is about. Well, EMC was a little bit vague on the details, but they have committed to open sourcing this in the second half of the year. And they talk mostly about open sourcing it in the sense of making the APIs available to developers. I'm not even quite sure if they're going to allow developers to fiddle with the actual code of the Chorus platform itself. But the idea here is to make Chorus easily available to application developers so they can plug Chorus directly into their application development processes. So what's vague? I thought that they had used the metaphor of Android versus Linux, for example. So Android is essentially controlled by Google, right? People make contributions, but they've got to go through Google. Java is sort of a similar type of model: they've got to go through what is now Oracle, was Sun. Well, what I took away from the announcement is that they don't believe there's a lot of work that needs to be done on the Chorus platform. It's not like the Hadoop model, where people need to make the platform more robust. It's a lot of white space. Right, they believe the platform's enterprise ready, ready to go, hence the GA later this week. The idea of the open source component is to make it easy and to attract developers to build applications on top.
And then in terms of how much they're going to control that, they mentioned it's still a work in progress how tightly they're going to control the applications that are built on top. But they have stated that they're committed to keeping this an open environment and trying to make it available to as many people as possible. David Floyer, let's bring you into the conversation. What's your take on all this? Well, for Greenplum customers, the addition of Chorus on top enhances its value. Essentially, it seems a better way of being able to populate the Greenplum database from multiple sources, to pull those in more easily from Hadoop sources or from other database sources around the organization. And that would allow a degree of automation in keeping that up to date. And that essentially is going to allow some automation of what the data scientists are doing and make it much more production friendly. In terms of unique capabilities in the industry, there are other ways of combining these databases. Quest, for example, has a product called Toad, which is a truly open source product that's very popular in the development industry. So there are other approaches around. But allowing automation and automatic updating of data in a production setting is a good thing and will allow more robust implementations of this as a large centralized database. And for people who want to take that approach, and obviously it's a cheaper approach than taking a large Oracle centralized database, there are some attractions. So it seems like there's always a trade-off here with open, right? So you've got the totally open, open source, free. Anybody can contribute. And there's some kind of body that adjudicates what gets delivered. And it's sort of quirky. There's obviously risks involved.
It's a lot slower, actually, versus the single vendor controlled. Like Android, and I would put Java in that category, where you've got an overall authority. The platform's open. The API is open. The exits and the entries into the platform are well understood. It's about growing an ecosystem. But the core platform is controlled by a single vendor. And then you've got the other end of the spectrum, which is proprietary, which still could have sets of APIs. But it's not considered open source in the sense that we're talking about here. So from a customer standpoint, when should you go with which model? Is that a question for me? So my input would be that this is much closer to the last model that you mentioned. It's an API which is essentially free to use. But you have to obviously take the core software from EMC. And you have to take the Greenplum database, et cetera. So it's an ability to use the API and build applications on top of that particular platform. So for ISVs it's of some value, and it's possible that end users themselves would like to use those APIs. But in terms of the traditional open definition, I think it's pretty close to a closed system and should be regarded as such by end users. That doesn't mean to say it's bad at all. I mean, Greenplum is an excellent database. And obviously, time will tell about the core, et cetera, and how effective it is compared with other products. But given that those are good products, then there's a good, strong reason for taking this approach. Didn't EMC Greenplum announce as part of the announcement that it would enable other databases besides the Greenplum database? Or am I mistaken about that? They have the ability to pull into the Greenplum database pieces from other databases and keep track of that. So it's a way of creating a bigger database from multiple sources, including Hadoop, as part of the data science project part of this, if you like, the automation of that. Is that it?
I thought there was more of an indication that they would, over time, open it up to them. Right. Well, I mean, they said, on the one hand, they're absolutely committed to enabling integration with other databases. On the other hand, they didn't say exactly when or how far along they are in that process. So for now, we're starting here. It's really targeted at the existing base of customers. Absolutely. And then, over time, we'll see. OK. Why does all this matter, Jeff? Well, for a few reasons. I mean, really, the last mile of big data is the applications. So any platform or service that makes it easier to build applications, and to do the analytics and build the models that feed those applications, is a good thing. So in that sense, this is important. I think data scientists working with this platform are going to embrace Chorus, at least initially. As David said, time will tell how effective it truly is. But EMC is definitely on the right track in terms of making it collaborative, allowing data scientists to use whatever tools they want. For instance, EMC is not embedding a particular existing analytic technology or tool into the platform. You can bring SAS Data Miner into it if you want. Or I believe they also mentioned Alpine Data Labs, which they're working with. So the idea that you can bring in the tool you're most comfortable with, I think, was also a very important approach for them to take. But yeah, when you combine the social with the analytic capabilities, at least from a messaging or vision standpoint, I think they're going down the right track. So you said the last mile's applications. Applications are hard to build on top of Hadoop. We know that. So you have predicted that this is the year when we're going to see a lot of application activity. Is this an example? Are we on track for that prediction?
Yeah, I think what needs to happen is that more platforms that enable the building of applications have to emerge before we actually see a plethora of new applications. So this is definitely a move in the right direction there. But it is difficult to write these applications in parallel environments such as Hadoop. So we're hoping to see more platforms like this that essentially abstract away some of that complexity in terms of integrating Hadoop and other data sources into your application development processes. So what about Pivotal Labs? I mean, I know they're a big Ruby on Rails shop. They hopped on that trend. My guess is they're going to hop on Node.js and any hot emerging language; a company like Pivotal Labs is going to be there. Very early edge, leading edge development company. What does that acquisition mean for customers in the industry? Well, again, it's critical to think about an EMC customer that's invested in UAP. So they've spent some money on Greenplum. They've got a Hadoop infrastructure that they're now supporting. Now they've got Chorus, where their data scientists can be more productive. But what's next? You need to make that data, those analytics, available to the application so business users can actually make decisions with it. Or perhaps you could automate business processes. So I think this is key. This really gives EMC the full stack, really, from the storage and processing, up through the analytics, to building the application layer. All right, David, anything you would add? Why does it matter? Yes, I think Jeff has summed it up very well. From Greenplum's point of view, and for the ability of Greenplum to compete in the marketplace, this is an important capability. The ability to orchestrate all of the data sources and manage those effectively and make it into a production workflow is critical.
Because the applications built on top of that have to be able to rely on the quality of the data and ensure that it's the right versions, et cetera. And the idea of this, obviously, is to make it a lot simpler and a lot more automated to actually do those operational processes. So it's a prerequisite for being in this marketplace. Obviously, time will tell on the quality of it and how easy it is to implement, but it's really a prerequisite to be effective operationally. But it doesn't add, in essence, anything dramatically new in terms of capability. It's stuff that could be done, but would have to be done manually. This is EMC at its best, which is putting in the framework and the automation around it. All right, Jeff Kelly, we'll give you the final word. Closing thoughts. Well, like I said, it's big data applications. It's kind of the last mile here. You need to build applications ultimately to derive real value from big data. So I think it's certainly a welcome move. We're hoping to see more vendors embrace this kind of social collaboration model for data analytics and data science. Of course, as David said, time will tell if this particular product meets all the expectations, but I think this is a good sign that we're moving in the right direction. Okay, so data science, collaboration, openness, applications really are the key. Guys, thanks very much for helping us break down this announcement. And you guys have a great day. Thanks for watching, everybody, and we'll see you next time. Okay, bye.