Live from the Fairmont Hotel in San Jose, California, it's theCUBE at Big Data SV 2015. Welcome back, this is theCUBE. We are live at Big Data SV in San Jose. I'm Jeff Kelly with Wikibon and I'm joined by my co-host Jeff Frick. We've got another great guest. The great guests just keep on coming here at Big Data SV. A frequent guest as well: Mani Chhabra, CEO of Cloudwick, friend of theCUBE. Welcome back. Thanks, thanks a lot for having me, guys. Absolutely, great to have you, and obviously we're here at Big Data Week. We've got Strata + Hadoop World going on. You've probably been over at the show. What are your impressions this week? What's the vibe like? Oh, I think 2015 seems to be where the buildup is complete. I think the industry has matured. You can ingest every kind of data you have. You can have real-time feeds. You have streaming. You have enterprise data. So every aspect of data can be ingested, transformed and visualized. I think one of the big things is Spark. I mean, this is the first time we are seeing in the big data analytics market that you can visualize something. It is the data scientist's market now. The job IT has been doing for the last five years is complete. Starting in 2010, the whole stack has been built, and now big data is available for the data scientist, I would say. It's interesting you mention that. We heard something very similar; we had the chief data scientist from Simply Hired on yesterday. We're hearing a lot more about data science. He was at the Data Science Day on Wednesday, and he said something similar: the infrastructure is kind of hardening, and now it's about what we are going to do with all that data, the analytics and the data science. Yeah, I mean, for the last four or five years we have been on this journey, and it has all been building. We were never really able to look at the data in the big data kind of scenario.
But what Spark does is it allows you to basically feel the data now. And that, I think, is sort of the end of the journey as far as the infrastructure buildup for the big data stack goes. That's how I think the feeling is this year. So you had a few announcements this week. Tell us about that. You've got some new announcements around Hortonworks, Cloudera and DataStax. Yeah, so we launched managed services for all three of these platforms: Cloudera and Hortonworks on the Hadoop side, and Cassandra on the DataStax side. We feel the market has matured to where the companies that have had the infrastructure for the last three to four years have come to the point where it's becoming pretty stable, and they need to basically outsource it to the professionals. So that's what we're launching at Strata. We are also going to launch, in the coming months, a sort of starter package, a big data starter package. Whether you want to use Cassandra and Spark, or Hadoop and Spark, or Hadoop, Cassandra and Spark, with all the ingestion aspects, we can manage everything for you, starting from the platform side to the data development side to the analytics side. Yeah, walk us through that a little bit. What is a typical journey for an enterprise getting started with big data in terms of the first steps you have to take, whether it's planning or finding use cases? What does that journey look like? Typically what is happening now in the industry is that you have this four-week initial package, which basically goes in to install a platform, ingest a couple of data sets, and show them the value. Once they get psyched on that, then you start building the data pipelines. So for ingestion, you can take the existing enterprise data in through Sqoop.
You can now start taking the social data through the Flumes of the world, and then you have the streaming data with Spark Streaming, Storm and Kafka as ingestion mechanisms. Once it comes in, the data is available on the platform, where you can join these things, you can analyze, you can transform, and then take it to the visualization level. So the whole stack is kind of built right now: 30 days for the first, I would say, ingest, the initial aspect, then I think six to nine months for the development of the data pipelines, and I think another three months until it's done to the point where the data scientist can start playing. It's a year-long journey. And are there some patterns emerging as to where people like to start in terms of the applications, the business cases, the types of divisions within the company? Who either has the budget, the mojo, or is willing to take the risk to get it started? I think it is always the line of business which is pushing IT to do it. IT never came up and said, okay, we're going to do it along with the line of business. But this year we have seen things coming out of the Hortonworks and the Clouderas of the world: specific vertical use cases being developed. So now the discussion is reaching the line of business, the board-level guys, the IT guys. And that speaks to the maturation of the industry right now. But even in the line of business, can you give some examples? Not necessarily companies; you probably can't talk about companies. But the kind of specific projects that actually get the company to the tipping point where it says, okay, we've got to get involved in this, this is how we're going to start, and that really serve as the proof case for the rest of the company of why we need to be involved in big data analytics? I mean, if you look at retail, the 360-degree customer view, I think that gets everybody excited.
Every retail customer in the world basically wants to know where their customers are going, whether they're shopping online or shopping offline in the stores; they want to have one view of that. That's a use case you see across industries: whether you go to retail, telecom, or any other industry, you see that. So those kinds of use cases, I think, have been elevated to that level now. And we will see in 2015, from the major guys like Cloudera and Hortonworks, a lot of vertical use cases. I think the discussion where they were selling to IT is gone. Now it is the line of business, with specific guidelines on how to build these use cases using the tool sets. So talk a little bit about what's going on in the industry this week. Obviously we've had some big announcements from the technology supplier side with the establishment of the ODP, the Open Data Platform. I'm curious to get your opinion. I mean, you've been in this market now for a while. You're well informed. You've got a lot of relationships with the different players. What is your opinion of the Open Data Platform? Does the industry need such a consortium? I don't know. I don't think so. I mean, I think the tool sets are there. It looks like more of a marketing aspect. I think the tool sets which are emerging from Hortonworks, Cloudera, Spark, and on the NoSQL side basically do this thing. Now the question is how you approach the line of business to solve their business problem. So I think you don't need any other consortium. It always confuses the market. I mean, it's extremely difficult to navigate the big data journey now. There are so many components to it. So you put another one in the mix, and it'll confuse the message. Good point.
I mean, we were just talking to Sunny Madra from Pivotal, and one of the things he mentioned was that we're seeing these mini ecosystems emerge around the different distribution providers, where some of the applications and the tooling that might work with one do not work with the other, and that's one of the things they're trying to address. Do you find that? He's right in that aspect. There are certain things. Let's take an example. If you are using, let's say, high-frequency data coming out of the Internet of Things or something, then you need a sort of high-persistence engine, which NoSQL can provide, and you can analyze that through Spark. So you don't need Hadoop for that one. Hadoop sort of becomes the repository at the end of the day for that one. And you might never need that data. You just need the signals, the exceptional signals which are coming from the Internet of Things. So yes, the ecosystem for that one is different. It depends from use case to use case. But does it mean that, okay, we need to have a combination of all those things and confuse the market? I think it might not work. That's what my thought is. So tell us a little bit about, you mentioned DataStax and Cassandra, one of the databases you're working with now with your new offering. We're trying to parse through the NoSQL market. There are a lot of interesting things happening there. We saw MongoDB raise a bunch of money last month, but it was a Series G raise, which raises some questions. We're hearing DataStax and Cassandra are gaining momentum. But trying to build up a business for those companies in the NoSQL space is a little bit challenging. I'm curious, from your perspective, why did you, for this announcement in particular, choose to go with DataStax and Cassandra? Are you seeing particular momentum with Cassandra in the enterprise?
What's your take on that market? At the level of the Fortune 500, we see Cassandra has more momentum where the use cases are a little bit more complex and the scale is a little bit higher. When you say complex, what? I mean, complex as in time series data. You're trying to analyze it, and there's a real aspect to it, a business aspect to it. I think from the Mongo perspective, it's a departmental database. It's very easy to use, and they are doing their own managed services kind of components. I think what we feel, and what we've seen in the market from our customers, is that Cassandra is becoming a real replacement for the Oracle Exadatas of the world. We have seen it, we have implemented it; we currently have two implementations at Fortune 500 companies where Oracle Exadata is replaced by Cassandra. So there's real momentum for those guys. So, talking about fragmentation, there are obviously a lot of different NoSQL databases out there. Do you see one or another of those databases starting to emerge as more of a general-purpose NoSQL database that's going to be able to do things that previously required multiple different databases? Is Cassandra that database, or is something else emerging, or are we going to see different databases for different purposes? I think different databases for different use cases. I think that's going to happen. You have a lot of HBase in the Hadoop camp. But if you'd like to have high-frequency data independent of Hadoop, Cassandra does it. I think HBase made an announcement with Hortonworks about residing on YARN. I think that's a good one. But it'll be a different NoSQL database for a different use case and different data, and it also depends on what infrastructure the company has. If they already have Hadoop running, they would go for HBase.
If you don't have Hadoop to that extent and you want something lightweight, then you go to Cassandra. But we've seen some really significant, large Cassandra clusters. Well, I mean, it's an interesting sign that we're starting to see these workloads really scale. And as you mentioned, the infrastructure is kind of hardening and now we're moving to more of that data science phase. But does that create new challenges for the enterprise, where maybe they were previously focused more on the infrastructure? Okay, now that's starting to get solved. But the data science, that's certainly no easy problem. Again, data science is domain specific. I mean, there are enough intelligent people, I think, in those domains who will take advantage now that they have a platform to work on. We've seen what the Databricks guys are doing with Spark on their own platform, which they are building on AWS: a very powerful platform where you can really go and query the data in SQL or Python. You can build these notebooks, and you can pre-build them and put them together to form solutions. It's very, very easy to see and feel the data. Yeah. So you talked about how the infrastructure's done and we're on to the data scientist, which is great. But it's the lines of business that are now driving the purchase decision. When do we move beyond the data scientists and really start to push more of this down to the line-of-business people, so they can see the data, analyze the data and take action on the data? We had Bill Schmarzo on from EMC, and he was talking about hospital admittance cases where you can use the factors to determine whether somebody's got a higher risk of a problem or not, and potentially change the track that you get them on. What's it going to take to get to that next level of execution, to use this data in these ways?
I think you'll see that in 2015. We have great partnerships with both Cloudera and Hortonworks, and we see their focus on vertical industries, the vertical use cases. I mean, you can now see what kind of data you're going to ingest, what transformations you have to do for each vertical use case, and what visualizations are required. All those discussions and all those templates are being done right now, as we speak. So I think in 2015 you'll see the line of business feeling that they have a big data solution available for them, not for IT, but for them. Right. How do you feel, in terms of the business side of the house among your customers, about their understanding of the potential of big data and some of the nuances of how to apply it? Because as we move from the infrastructure to the data science, the business has to get more involved; as you said earlier, that's where the conversation should actually start. What is the level of education, generally speaking, out in the mainstream enterprise about the power and potential of big data and analytics? I think the Fortune 1000 gets it. The enterprise gets it because not using big data poses an existential threat to their business, to the core of their business, because you see a lot of upstarts in Silicon Valley using data as their core. And when you're competing with those companies, which capture each and every piece of data at every instant, it becomes very difficult if you don't have that insight. So I think they get it. I mean, it's been five years, so I'm pretty sure that they get it, right? Very good. And the challenge is beyond the Fortune 1000, I think. Yeah, well, that's a good question. I mean, it certainly falls in line with our research and what we're finding.
Specifically, if you're talking about Hadoop and some of the big data solutions, you're seeing the Fortune 1000, the Global 1000, they're all in. They get it; I agree with you on that point. Then on the other end of the spectrum, you're seeing these born data-driven startups. Uber is the poster child of that, but there are others; big data is kind of in their DNA. So those two ends of the spectrum get it, but there's this huge middle of the market, from still very large enterprises that are maybe not Fortune 1000 down to mid-sized companies and certainly small enterprises, that I think are still either very confused about it or maybe not even thinking about it yet. What's it going to take to break through that? Is this just a natural evolution of a market? It's going to take time. I think it's the natural evolution of the market, and the other thing is, I think the cloud is going to become very much a big data destination. We have seen huge workloads migrating to AWS this year and last year. We've seen that. I mean, something that's... So you're talking about Hadoop workloads and... Hadoop workloads, Cassandra workloads, with, you know, like, thousands of instances running. So huge workloads are going into the cloud. I think the enterprises are getting really, really comfortable with the cloud, in spite of all the security aspects, and they feel that the cloud is a better protector of the data than their IT is, because keeping things secure within IT is very, very difficult. Yeah, yeah. Well, it's interesting, because of course that was the knock on cloud for a while: oh, well, it's the security; you can't trust your data outside of your own four walls. But when you actually take a step back, AWS has got better security than the vast majority of enterprise data centers. Yeah, because they have the people that can do that.
Well, you don't have the disgruntled employee who works there and goes off with a laptop. And I think what we've heard is that the challenge is not necessarily the level of security, but whether their security framework fits with your enterprise security; that is more of the challenge. Yeah, and if you look at the number of components you have in the Hadoop and NoSQL stack, and you have to secure all those components, it's extremely challenging for any enterprise. It almost begs the question: as you said, the industry frameworks are now being defined and hardened. Will somebody like an Infor, or somebody that makes industry-specific applications for small and medium businesses, the sub-1,000, finally start to make big data applications around a specific industry vertical, because they've got pretty well-defined speeds and feeds and inputs that they can give to you? I am going to limit this thing. I think the way Databricks is going, they are trying to pull the data from any other source directly into their platform on AWS. You can see subscription-based analytical models for enterprises, where you can get the data from their core businesses onto the platform and the data scientist can do it, or you can rent a data scientist running a Databricks platform to do the analysis for a month. Every month you get a report: if you're a taxi driver, where are your taxis plying most? What is more profitable? Those kinds of things. I think it's going to happen. And that's how you open it up to more enterprises. More enterprises. Yeah, I think data science as a service. Yeah, I think that's a really interesting angle. So we're close on time. I want to give you the last word. So over the next six, 12 months, what's top of mind for you?
And what's on your roadmap? I think for us, it is the start of the solution roadmap in the first half of 2015. I think in the second half of 2015, you'll see us coming along with the big guys with some vendor-specific service offerings, and also vertical-specific service offerings on the use cases. I think that's what the market really needs. And this is where it really starts to get interesting, because now you're talking more about the business value, the applications, and this is where you're going to see industries starting to be really disrupted. I think so, yeah. It's going to be exciting. Well, Mani, thanks so much for joining us on theCUBE. It's always a great conversation. We'll have you on again soon, I'm sure. Thanks, everybody, for watching. Stick around. We'll be right back with our next segment after this. Thanks.