Live from the Fairmont Hotel in San Jose, California, it's theCUBE at Big Data SV 2015.

Hello everyone, welcome back. This is theCUBE, our flagship program. We go out to the events and extract the signal from the noise. I'm John Furrier. We're here with IBM to talk about big data and big data analytics, and we're doing a first-ever CrowdChat simulcast of the live feed with IBM. So guys, we're going to try this out. Go to crowdchat.net/hadoopnext and join the conversation. Our guests here are Rob Thomas, Vice President of Product Development, Big Data and Analytics at IBM, and Beth Smith, General Manager of IBM Analytics Platform. Guys, welcome to theCUBE.

Thank you.

Welcome back. So IBM, obviously we're super excited; next week is InterConnect, your big event. You guys mashed up three shows into one mega show, and Aerosmith's playing. I'm from the Boston area, so I'm really excited about Aerosmith and all the activities, the Social Lounge and whatnot. But we've been following you guys. The transformation of IBM is really impressive. You guys have certainly taken a lot of heat in the press on the performance side of the business, but it's pumping right now. You guys seem to have great positioning. The stories are hanging together. Huge customer base, huge services. So we're in the big data world, which has tended to be startup-driven over the past few years, in phase one. Then the big companies came in and started saying, hey, there's a big market here, our customers see demand in that. So I want to get your take as we're coming into InterConnect next week. What is the perspective on big data? Obviously, Watson has garnered headlines, from powering toys to Jeopardy to solving huge world problems. That's a big data problem. You guys are not new to big data. So when you look at this big data week here in Silicon Valley, what's the take?

Sure, so I'll start off and Beth can add in.
So our big focus is how we start to bring big data to the masses. And we start to think in terms of personas. Data science plays an increasingly important role around big data and how people are accessing that. There's the developer community, and then obviously the line-of-business community, which is the client set that IBM has been serving for years. But the announcements that we've made this week around Hadoop are really focused on the first two personas, in terms of data scientists and how they start to get better value out of Hadoop, leveraging different tools. So we'll talk about what some of those are. And so we're really starting to change what it's about: Hadoop is ultimately about insight. It's not about infrastructure. Infrastructure's interesting, but it's really about what you're getting out of it. And so that's why we're approaching it that way.

And what's your take?

Well, it ties naturally to IBM's strategy around data, cloud, and engagement. And data's really about using the insights, which, like Rob said, is about the value you can get from the data and how that can then be used to transform professions and industries. And I think when we bring it back to big data and the topic of Hadoop, frankly it has gotten to a point that clients are really beginning to say it's time to scale. They're seeing the value in the technology, what it can bring, how it gives them some diversity in their data and analytics platform, and they're ready to now scale their workloads as a part of it.

So the theme is Hadoop next, okay? So that takes us right to the next point, which is, okay, what's next? So phase one, okay, we've got some base position validation, okay? This new environment is what customers want, though. So what is next? I mean, we're hearing things like in-memory's hot; obviously Spark has proven that there's action in in-memory that kind of says, okay, analytics at the speed of business is something that's important.
You guys are all over that and we've heard some things from you guys. So how do we get to the next part, where we take Hadoop as an infrastructure opportunity and put it into practice for solutions? What are the key things that you guys see happening that must happen for the large customers to be successful?

So I think that actually ties into the announcements we made this week around the Open Data Platform, because that's about getting that core platform to ensure that there's standardization around it, there's interoperability around it, and then that's the base. And vendors and clients are coming together to do that and to really enable and facilitate the community to be able to standardize around that. Then it's about the value on top of that, around it, et cetera. It's about the workloads and what can be brought to bear to extend on that. How do you apply it to real-time streaming? How do you add things like machine learning? How do you deal with things like text analytics? I mean, we have a client situation where the client took four billion tweets and were able to analyze that to identify over 110 million profiles of individuals. And then by integrating and analyzing that data with the client's internal data sources, about seven or eight different data sources, they were able to narrow it to 1.7 million profiles that matched with at least 90% precision. Now they've got data that they can apply on buying patterns and stuff. It's about that. It's about going up the stack.

We're gonna talk for hours now, as my mind's exploding: privacy, creepy. I mean, a persona is relevant because now you're talking about personalization. I mean, collective intelligence has been an AI concept. We try not to be creepy. You know what I mean? The users go, okay, cool. But now, so that brings us to the next level. I mean, you guys talk about cognitive as a word you guys kick around. Also systems of engagement.
Systems of record is an old term that's been around from the old data warehouse days: fenced-off resources of disk and data. But now with systems of engagement, it's a real-time, in-the-moment, immersive experience, which is essentially the social and/or kind of mobile experience. What does that mean? I mean, how do you guys get there? How do you make it so it's better for the users, more secure? I mean, these are hot-button issues that kind of lead us right to that point.

So I'll take that a couple of ways. So first of all, your first question around Hadoop next. So Hadoop is no longer just an IT discussion. That's what I've seen change dramatically in the last six months. I was with the CEO of one of the world's largest banks just three days ago. And the CEO is asking about Hadoop. So there's a great interest in this topic. And so why? Why would a CEO even care? I think one is people are starting to understand the use cases that are in place. So Beth talked about entity extraction. So how do you start to look at customer records that you have internally, in your systems of record, to your point, John, and then how do you match that against what's happening in the social world, which is more of the engagement piece? So there's a clear use case around that that changes how clients work with their customers. So that's one reason. Second is huge momentum in this idea of a logical data warehouse. We no longer think of the data infrastructure as, oh, it's a warehouse or it's a database. It's logical, meaning not physically tied to something. Not tied to just a relational store. So you can have a warehouse, but you can scale in Hadoop. You can provision data back and forth. You can write queries from either side. That's what we're doing: we're enabling clients to modernize their infrastructure with this type of logical data warehouse approach.
When you take those kinds of use cases and then you put the data science tools on top of it, suddenly our customers can develop a different relationship with their customers and they can really start to change the way that they're doing business.

Beth, I want to get your comments. We have the CrowdChat, crowdchat.net/hadoopnext, with some commentary coming in. Obviously transforming industries; billion tweets, killer for customer experience. So customer experience, and then also the link about data science into high gear. So let's bring that now into the data science. So the logical stores, okay, it makes sense with virtualization. Things are moving around. You have some sort of cognitive engines out there that can overlay on top of that. Customer experience and data science: how are they interplaying? Because this came out at the retail event in New York City that happened last week. Point of purchase, personalization, customer experience, data science. It's all rolling together. What does that mean? Unpack that for us and simplify it if you can.

It's kind of, it's complex. It's a big topic, you know, it's a big topic. So a couple of different points. So first of all, I think it is about enabling the data scientists to be able to do what their specialty is, and the technologies have advanced to allow them to do that. And then it's about them having the data and the different forms of data and the analytics at their fingertips to be able to apply that. I think the other point in it, though, is that the lines are blurring between the person that is the data scientist and the business user that needs to worry about how they attract new customers, or how they create new business models, and what they use as a part of it. I do think we're also seeing that line blurring. One of the things that we're trying to do is help the industry around growing skills. So we actually have Big Data University.
We have, what, 230,000 participants in this free online education. And we're expanding that topic now to, again, go up the stack, to go into the things that data scientists want to deal with, like machine learning, and to go into things that the business user really wants to now be able to capture as a part of it.

So I'm going to ask you guys kind of more, because it could be a product question and/or kind of a market question. At IBM's TED at IBM event, in which you talked about a big medical example and one of her favorite use cases, she made a comment in there: active data. Active data is not a new term for the data geeks out there, but when we look at data science, lag is really important. Near real time is not going to make it for airplanes and people crossing the street with mobile devices. So real real time means, like, the second. Latency is really important, speed. So active data is a big part of that. So can you guys talk about passive and active data and how that relates to computing, because it's all kind of coming together. It's not an obvious thing, but she highlighted that in her presentation because, obviously, medical care is an urgent, in-the-moment kind of thing. So what does that all mean? I mean, is that something customers should be paying attention to? Is it viable? Is it doable?

So certainly viable. I mean, it's a huge opportunity, and I'd say probably the most famous story we have around that is the work that we did at the University of Toronto at the Hospital for Sick Children, where we were using real-time streaming algorithms and a real-time streaming engine to monitor infants in the neonatal care facility. This was a million data points coming off of a human body, monitored in real time. And so why is that relevant? I mean, it's pretty basic actually. If you extract the data, you ETL it somewhere, you load it in a warehouse, and then you start to say, well, what's going on? It's way too late.
We're talking about, at the moment, you need to know what's happening. And so it started, a lot of it, in the medical field with some of the examples that you mentioned, but real time is now going well beyond the medical field, in places from retail at the point of sale and how things are happening, to even things like farming. So real time is here to stay. We don't really view that as different from what I would describe as Hadoop next, because streaming to me is part of what we're doing with Hadoop and with Spark, which we'll talk about in a bit. So it certainly is a new paradigm for many clients, but it's going to be much more common.

Yeah, actually, if I can add, there's a client, North Carolina State University. It's where I went to school, so it's a client that I talk about a lot. But in addition to what they do with their students, they also work with a lot of businesses on different opportunities that they may have, and they have a big data and analytics sort of extended education, business education project. As a part of that, they are now prepared to be able to analyze one petabyte in near real time. So the examples that you and Rob talked about, of the real-world workloads that are gonna exist where real time matters, are there. There's no doubt about it, they're not going away, and the technology is prepared to be able to handle the massive amount of data and analytics that needs to happen right there in real time.

You know, that's a great point. I mean, these flagship examples are kind of like lighthouses for people to look at, and kind of the ships that come into the harbor, if you will, for other customers. Actually, you always have the early adopters. Can you guys talk about where the mainstream market is right now? Actually, from a services standpoint, you guys have great presence and a lot of accounts. Where are these ships coming into? Which harbor, where are the lighthouses?
Obviously medical, you mentioned; some of those examples are bringing in the main customers. Is it the new apps that are driving it? What are the innovations, what are the forces, and what are the customers doing in the mainstream right now? Where are they in the evolution of moving to these kinds of higher-end examples?

So I mean, so Hadoop, I'd say this is the year where clients have become serious about Hadoop. Like I said, it's now become a board-level topic. So it's at the forefront right now. I see clients being very aggressive about trying out new use cases. Everybody, really across every industry, is looking for one thing, which is growth. And the way that you get growth if you're a bank is, you're not really going to change your asset structure. What you're going to change is how you engage with clients and how you personalize offers. If you're a retailer, you're not going to grow by simply adding more stores. It might be a short-term growth impact, but you're going to change how you're engaging with clients. And so these use cases are very real and they're happening now.

So Hadoop is a boardroom discussion, or big data? I mean, I just can't see a boardroom saying, "I thought we should have more Hadoop." Do you see that?

I've seen it over and over again. I'll tell you where you see it a lot is companies that are private equity-owned. The private equity guys have figured out that there's savings and there's innovation here. Every company I've worked with that has private equity ownership, Hadoop is a boardroom discussion. And the idea is, how do we modernize the infrastructure? It's because of other forces, though. It's because of mobile. It's because of cloud that it comes to the forefront. So absolutely.

So let's take Hadoop. So Hadoop is great at batch. It's great. There's a lot of innovation going on there. Boardroom with these private equity firms because, one, they're cutting edge. Probably it's an investment they want to see and realize pretty quickly.
Speed is critical, right? I would infer that was coming from the private equity side. Speed is critical, right? So speed to value. What does that mean for IBM and your customers? How do you guys deliver the speed to value? Because that's one of the things that comes out in all the premises of all the conversations: hey, you can do things faster now. So value on the business side, where do you guys see that?

Sure. So there are a lot of different ways to approach that. So we believe that, as I said before, it's not just about the infrastructure. It's about the insight. We've built a lot of analytic capabilities into what we're doing around Hadoop and Spark so that clients can get the answers faster. So one thing: we have a session here at Strata this week talking about our new innovations in Big R, which is our R algorithms, which are the only R algorithms that you can run natively on Hadoop, where your statistical programmers can suddenly start to analyze data and drive that to decision-making, as an example. So we believe that by providing the analytics on top of the infrastructure, you can change how clients are getting value out of that. So how do we do it quickly? We've got IBM software, so we've got our Hadoop infrastructure up on the cloud. So anybody can go provision something and get started in hours, which is not something that was the case even a couple of years ago. And so speed is important, but the tools and how you get the insight are equally important.

How about speed to value from a customer and deployment standpoint? Is it the apps, or is it innovating on existing? Beth, what do you see?

Well, I think it's both, actually. And so you talked earlier about systems of engagement versus systems of record. And I think at the end of the day, for clients it's really about systems of insight, which is some combination of that, right?
We tend to think the systems of engagement are the newer things and the newer applications, and we tend to think the systems of record are the older ones, but I think it's a combination of it. And we see it show up in different ways. So I'll take an example of telco. We have a solution, The Now Factory, and this is about applying analytics in real time about the network and the dynamics, so that, for example, the operator has a better view of what's happening for their customers, their end users, and they can tell that an application has gone down and that customers have now switched all of a sudden to using a competitive application on their mobile devices. You know, that's different, and is that new applications or old, or is it the combination? And I think at the end of the day it really comes to a combination.

I love that, systems of insight. I'm just gonna write that down here inside the CrowdChat. So I gotta talk about the holy grail for big data analytics and big data: one, from your perspective, from IBM's perspective, and two, where you guys are partnering. Obviously here is a show with rich targets for acqui-hires, acquisitions, partnerships. I mean, it's really fertile ground, certainly Silicon Valley, in the growth of big data, cloud, mobile, and social, with these infrastructure pieces coming together. What is the holy grail from your perspective? Obviously systems of insight teases that out. Cognitive is a message we've heard. So what is the holy grail? And then what are you guys looking for in partnerships within the community of startups and/or other alliances?

Sure. Do you wanna start with the holy grail?

Yeah, so, you know, I think at the end of the day it is about using technology for business value and business outcome. I really think that's what's at the spirit of it. And so if I tell you why we have, for example, increased our attention and investment around this topic, it's because of that.
It's because of what Rob said earlier, when he described the state that clients are now in. So that's what I think is really important there. And I think it's only gonna be successful if it's done based on standards and something that is in support of heterogeneous environments. I mean, that's the world of technology that we live in and that's a critical element of it, which leads to why we're a part of the Open Data Platform initiative.

So on the piece of analytics, I saw a comment about R, for example. I was just mentioning it in the CrowdChat. Microsoft just bought Revolution Analytics, which is not R, which is a different community. Is there a land grab going on between the big guys? I mean, IBM's a big company. What do you guys see in that kind of area in terms of acquisition targets?

Yeah, I mean, I think the numbers would say there's not a land grab. I don't think the M&A numbers have changed at a macro level at all in the last couple of years. I mean, we're very opportunistic in our strategy, right? We look for things that augment what we do. I think it's related to your question around partnering, but when we do an acquisition, it's not only about what that company does, it's about how it fits within what IBM already does. Because we're going after a rising tide in terms of how we deliver what clients need. I think some companies make that mistake. They think that if they have a great product, that's relevant to us. Maybe, maybe not. But it's about how it fits in what we're doing. And that's how we look at all of our partnerships, really. And we partner with global systems integrators, even though we have one within IBM. We partner with ISVs, application developers. The big push this week, as I described before, is around data scientists. So we're rolling out data science education on Big Data University, because we think that data scientists will quickly find that the best place to do that is on an IBM platform, because it has the best tools.
And if they can provide better insight to their companies or to their clients, they're going to be better off.

So yesterday I was commenting, and certainly at the end of last week and earlier this week, about Twitter. You know, there are a lot of commentators figuring it out, and people are confused by Twitter versus Facebook. And I know IBM has a relationship with Twitter, so that's why it popped into my head. And I was saying, hey, Twitter's got great value. And so I was on the side of, you know, Twitter's a winner. I love Twitter. I love the company. Misunderstood, certainly, I think. In this market where there are waves coming in more and more, there's a lot of misunderstanding. And I want to get your perspective, if you can share with the folks out there: what is that next wave? Because it's confusing out there. You guys are insiders. IBM, I would say, like Twitter, is winning, doing very well. Certainly, we're up close to you guys. We're deeply reporting on IBM, so we can see the momentum and the positioning. It's all in line with where we see the outcomes ending up for customers. But there are still a lot of naysayers out there. Certainly, you guys have had your share, as has Twitter, as an example. So what is the big misunderstanding that you think is out there around the market we're in? And what's the next wave? I mean, there are always waves coming in. If you're not out in front of that next wave, you're usually driftwood, as the old expression goes. So what is that big misunderstanding in this kind of converged, kind of hyper-targeted world with analytics? This is all new stuff: huge opportunities, huge shifts, and an inflection point, as Bob Picciano said on theCUBE. It's kind of both going on at the same time, a shift and an inflection point. So what's misunderstood and what's that next big wave?

So let me start with the next big wave and then I'll back into the misunderstanding. So the next big wave to me is machine learning.
And how do you start to take the data assets that you have and, through machine learning and the application of those types of algorithms, start to generate better insights or outcomes? And the reason I think it's the next big wave is it may be one of the last competitive moats out there. If you think about it, if you have a corpus of data that's unique to you and you can practice machine learning on that, and have that either as data that you can sell or to feed into your core business, that's something that nobody else can replicate. So it becomes incredibly powerful. So one example I'll share with you, and I wanted to bring you my book, but it's actually not getting published until next week. So maybe next week. But Wiley's publishing a book I wrote, and one of the examples I give is a company by the name of CoStar, which I think very few people have heard of. CoStar is in the commercial real estate business. They weren't even around a decade ago. They have skyrocketed from zero to $500 million in revenue. And it's because they have data on four million commercial properties out there. Who else has that? Absolutely nobody has that kind of reach. And so they've got a unique data asset. They can apply things like machine learning and statistics to that. And therefore anybody who wants to do anything in commercial real estate has to start with them. So my point is, you're starting to get to the point where you have some businesses where data is the product. It's not an enabler, it's the actual product. And I think that's probably one of the big misunderstandings out there: that data is just something that serves our existing products, our existing services. We're moving to a world where data is the product, and that's the moat.

I wrote a post in 2008 called "Data Is the New Development Kit." And what you're basically saying is, that's the competitive advantage. A business user can make an innovative observation about data, not be a scientist, and change the game.
That's what you were saying earlier. Similar?

That's right, that's right.

Okay, so next big wave, misunderstanding. What's your take on what people are not getting? What is Wall Street, what is, potentially, well, the VCs are usually on the front end of some of the innovation, but what is the general public not getting? I mean, we are in a shift and an inflection point. What's the big shift and misunderstanding going on?

So I would tend to actually agree with Rob that I think folks aren't yet really appreciating, and I guess I would twist it a little bit and say the insight instead of just the data. But they're not realizing what that is and what it's gonna give us the opportunity for. I would retire early if I actually could predict everything that was gonna happen. But if you think about it, if you think about the mid to late 90s, what we would have all thought that the internet was gonna allow us to do compared to what it actually allowed us to do is probably like night and day. And I think the time we're in now, when you take data and you take mobility and you take cloud and you take these systems of engagement and the way people, individuals, actually want to do things, is similar, but almost like on steroids, to what we were dealing with in the mid 90s or so. And so the possibilities are frankly endless. And I think that's part of what people aren't necessarily realizing: that they have to think about that insight, that data that actually has some value to it, in very different ways.

Yeah, there are a lot of disruptive enablers out there and there's a lot to look at, but finding which ones will be the biggest, it's hard. I mean, you get paid a lot of money to do that, I guess, if you could figure it out and keep it a secret. Well, you didn't; your machine learning is now out there. You just shared with us a competitive advantage, so everyone knows. No, everyone kind of knew that, kind of on the inside.

But not everybody's using it, right?
I mean, I think another example: a company like Intuit has done a great job. They started off as a software company; they've become a data company. I think what I've observed in all these companies is you can build a business model that's effectively recession-proof, because data becomes the IP in the organization. And so I actually think, for those of us that live in this world, we think this is well understood. I don't think it's that well understood yet.

Inside of Mike.

Right, and when we first started doing big data research and working with thousands of clients around the world, there were six basic use cases. It started, of course, with the customer, the end customer and the customer 360 and that sort of thing. It went through a number of different things around optimization, et cetera. But the additional one is about those new business models. And that clearly, in the last 12 to 18 months, has become a lot more of what the topic is when I'm talking to clients. And I think we will see that expand even more as we go into the future.

Well, we have a lot of activity on the CrowdChat, crowdchat.net/hadoopnext. And I'll mention that we can probably extend time on that. If you guys want to keep it going, the conversation is awesome. And we're getting the hook here. So we'll move the conversation to crowdchat.net/hadoopnext. Great thought leaders. I could go on with this for an hour. You guys are awesome. Great to have you on theCUBE.

Thank you.

So much to talk about. A lot of ground. We'll certainly see you at InterConnect. Final question for you guys: what do you see for this week? Real quick, summarize. What do you expect to see unfold for Big Data Week here in Silicon Valley, Big Data SV?

So I think a lot of what we talked about. Machine learning is going to be a big topic. I think there'll be a lot of discussion around the Open Data Platform that Beth mentioned before.
It's a big move that we made along with another group supporting the Apache Software Foundation. I think that's a big thing for this week. But it should be exciting.

All right. Guys, thanks for coming on theCUBE. IBM here, inside theCUBE. We're live in Silicon Valley. We'll be right back with our next guest after a short break. I'm John Furrier. This is theCUBE. We'll be right back.

theCUBE crates are a business operations manager at LinkedIn and you're watching theCUBE. I'm Chris Selland, VP of Business Development for HP Big Data, and you're watching theCUBE. Hi, I'm Stacey Slaughter, Senior Vice President of Communications for the Giants. I'm in the garden at AT&T Park and you're watching theCUBE. I'm Thomas Minick, Business Intelligence Consultant within our works and you're watching theCUBE. Hi, my name is David Tishkar, Director of Partner Marketing at Cloudera, and you're watching theCUBE. Hi, I'm Jim Yu, Founder and CEO of BrightEdge, and you're watching theCUBE.

Citizen scientists are very important in the data collection efforts in studying climate change, because there are not enough resources at the National Park Service to do these data collections. I love data and I love national parks, and I'm here to help in any way I can through data science. Earthwatch is an organization that does citizen science, giving people intense one- or two-week-long experiences where they come and participate actively in a science project, and that can be a transformative experience. Engaging the public in doing the science gives them a deeper understanding of the problems, and they're able to collect data in the intertidal or watch birds. People who are not professional scientists are out there asking questions, helping to answer those questions, analyzing the data. I think it's such a treat to be a citizen scientist. It reminds you of your childhood, of getting in touch with these things that were all around you and are so easy to let fall by the wayside.
It's so easy to pick up your phone and forget. Going out and touching and interacting with science, it's just like being a kid again, being that invigorated to learn. We picked periwinkles off of the intertidal and learned about different species, and then we put them back gently.

We're very good at collecting the data and sending it off to these different databases, but we're bad at making those data discoverable. We are compiling the list of data sources so that we can put together a data lake where all data related to climate change will be stored, queried, and analyzed. It would be data about weather. It would be data about the abundance of different species, the disappearance rate of different species. It is gonna help them solve those problems at a massive scale, which they so far haven't been able to do by looking at smaller chunks of data.

We're on a highway for bird migrations. Most of those birds are arriving later, so they're migrating later in the fall, but the fruits are ripening earlier in the fall. And so these birds are arriving after many of the fruits have ripened and maybe aren't even around anymore, and so they don't have the food they need to fuel up to finish their migration on their way to Central or South America, and so you can end up losing a lot of those birds. That's the sad part of it, and what I try to do is also think of the optimistic side, the part that we can do something about, and that's where I think this partnership with EMC could help us really make a meaningful difference.

Points in your life like this just reignite that fire. Within a few hours I think people got it: how EMC technology and how Pivotal's technology can result in these people being able to do new science, to be able to get new discoveries, to be able to do their job, not just better but in a way they've never been able to do it before. A real data lake solution that EMC is already in the process of building.
It gives us a really great opportunity to dig in and give them access to tools and to visualizations that can inspire the future generations of scientists. Imagine entering a portal where you see interesting relationships between different climate variables and their impact on a certain plant or animal species, and see the impact visually. Every person in the room is thinking about how it grows, how it becomes a bigger effort, how can we incorporate more data, what's coming next. I want kids to look back in textbooks and think that we were the generation that had the opportunity and we took it. We did everything we could. We didn't say, oh well, somebody will deal with it later. When we first started talking about partnering with EMC on a project like this to help with citizen science, help with conservation, I was surprised that they were interested in it, and I have been really impressed by the level of staff and the quality of the people that are here, really talented people that bring a lot of experience and different perspectives. It's really exciting to see the engagement and how much EMC really does want to make a difference. Tell me and I may forget, teach me and I may remember, but involve me and I learn. To be able to make a small dent which reverses this process would be a huge satisfaction for me as a data scientist and, more importantly, for me as a human being on this planet.
Live from the Fairmont Hotel in San Jose, California, it's theCUBE at Big Data SV 2015. Okay, hello everyone, I'm John Furrier with SiliconANGLE and theCUBE, and I'm here live in Silicon Valley for Big Data SV 2015. This is our second time in Silicon Valley and our fourth big data event, where we are out getting all the data on what's happening in the industry and sharing that with you, in conjunction with Strata Conference and Hadoop World, which is going on right across the street again. We are bringing live interviews, coverage, and analysis here with theCUBE, our flagship program.
We go out to the events and extract the signal from the noise. I'm with Jeff Kelley, your co-host for the week here, big data analyst at Wikibon.org, and we're excited to share with you all the exciting news, analysis, and events, and there's a ton to talk about. So Jeff and I will be here all week bringing in three days of wall-to-wall coverage here in Silicon Valley and San Jose around big data. There's a lot of news: people going public, new startups coming out of the woodwork, big players. The theme really is follow the money, and we're excited to have a big event tonight. Jeff Kelley will be sharing some research, and we're having a party tonight at seven o'clock here at the Fairmont. If you're watching, swing by at five o'clock for our presentation and then the party at seven here at theCUBE. Again, our fourth event, second time, second year here in Silicon Valley, Big Data SV. We have a crowd chat; crowdchat.net slash bigdataweek is the URL. Come join the conversation. Jeff, exciting week, we have tons of news. Obviously Pivotal, Hortonworks, and the industry announced an open data platform. Cloudera has kind of hinted they're going to go public, releasing some numbers around their earnings, and just in general the philosophy for the startups is: I've got to find a home and I've got to find some customers. So I want to get your take. I mean, what do you see? You've been here on the ground with Pivotal. What's the story, what's happening?
Yeah, so I was at the Pivotal event yesterday when they announced a few things. The Open Data Platform is essentially a new industry consortium focused on hardening the Hadoop core to enable more enterprise adoption. There are some pretty big names that are part of that group, the ODP: you've got Pivotal, you've got Hortonworks, IBM, you've got GE, Verizon, some others. So that was a pretty big announcement, as you mentioned. Continuing on with Pivotal and their news, they've open sourced their entire big data suite of products.
That's the Greenplum Database; HAWQ, which is essentially their massively parallel analytic database that runs natively on Hadoop; and GemFire, for essentially more transactional big data. So they're going all in with open source between the ODP and that announcement around open sourcing those tools. They've tightened their alliance with Hortonworks. They're going to make all those big data tools, the Pivotal tools, able to run on Hortonworks Data Platform, their Hadoop distribution. So a lot is happening there. In what you could almost call the other camp, we've got Cloudera, who made some announcements as well, announcing that they've hit $100 million in revenue in their fiscal year 2015. So that's interesting: that's the first Hadoop distribution player to hit that mark. So that's a big milestone for them. I think what we're seeing here is the swim lanes are starting to really solidify. You've got the Open Data Platform and that group of companies on one side. And then you've kind of got Cloudera and Intel in their camp, which is actually not a bad thing for the market. You're starting to see some contraction, some consolidation, I should say, in terms of the different players. And it's good that there are a couple of different options that customers have when it comes to Hadoop and big data, but I think you're seeing this market mature, and that's a good sign. From a practitioner standpoint, what we're looking to talk about this week, of course, is where things stand in terms of adoption of Hadoop generally, but big data even more broadly, in the enterprise. So we'll be very interested to hear about that from some of the practitioners that are going to join us on theCUBE over the next couple of days. From what we're seeing in the market, talking to the Wikibon community, and the research we're doing, we're finding there are two ends of the spectrum in terms of adoption of Hadoop.
You're seeing the big Global 1000 companies, they're all going in with Hadoop for sure, there's no question about that. On the other end of the spectrum, you've got some of the smaller, what I would call born-data-driven startups, where they have big data kind of built into their DNA. And they're certainly going in with Hadoop and some of the other developing technologies, Spark, et cetera. So you've got those two factions, but then you've got this big middle, the rest of the enterprise landscape. And there's not a lot of action happening in that space right now. We'll be talking to practitioners, we'll be talking to the vendors, we'll be talking to the VCs about how we're gonna move that forward, so we start to see some adoption beyond those really big enterprises and, of course, those really nimble, exciting data startups. And then in terms of the state of adoption, or the success of those that have adopted Hadoop, we'll talk about that. The fact is, even some of those early adopters are struggling. It's still a challenge; there are still a lot of different pieces that you need to put together to make big data work. Even some of the big banks and big retailers, the big name brands that are out there and are touted by the Hadoop companies as, hey, they're using Hadoop, even those companies are struggling a little bit. So we'll talk about that and what it's going to take to kind of move that forward.
Let's take a step back and just look at the big picture. So obviously we've been covering big data since really the inception of the industry. We saw Hadoop come on the scene and explode. Obviously that became a huge trend. We're seeing a standardization now happening, with Pivotal and Hortonworks and the industry rallying around a standard that's in direct conflict with what Cloudera is saying. They obviously didn't join the alliance. Then you have the general enterprise market, and then also the capital market. So I want to get your perspective.
We had the Big Data New York City event, which really focused on Wall Street, the capital markets in New York. Here in Silicon Valley, we're looking at the money on the private side. So you now see Cloudera is not yet public. Hortonworks is public. And then you have a slew of startups out there trying to find a position, if you will. The swim lanes, as you say, are for the big guys, and the little guys are trying to find spots. So I mean, where are we? Obviously there are phases of evolution. We've seen phase one. We've been talking about that on theCUBE, you know, the early adopters. Are we in phase two? Are we crossing the chasm? You know, we were speculating that big data might not be an industry but might be more cloud driven. Where's the...
The side... So factions kind of solidify. I would put, you know, in one bucket the ODP. That's Pivotal, Hortonworks, IBM, GE, and some of that group. On the other side, you're seeing Intel and Cloudera. I might even throw Amazon into that bucket as well. So I think you're starting to see that, which is, I think, just a natural thing in the market. You start to see a couple of dominant players emerge, in this case not individual players but factions. Now the interesting question, of course: Strata Hadoop World is happening this week, and if you go down to the show and onto the show floor, there are dozens of startups out there in the space, many of them focused on fairly niche tooling in the larger big data stack. And the question is, what's gonna happen to those players? You know, one of the challenges of this market if you're a startup is that big data requires, and we've learned this by talking to a lot of practitioners in the enterprise, more of a platform approach.
There are a lot of moving parts, and for this to go mainstream, most enterprises don't have the internal resources, bandwidth, expertise, what have you, to bring together all these disparate tools and cobble them together into their own platform. They are looking for more of a one-stop shop for big data. We saw this in the data warehouse space to some extent, with the appliance model: bringing the hardware and software together, kind of a data warehouse in a box that you drop into your data center. You know, that really has become the standard approach for data warehousing. And we're starting to see that in the big data space. Now the interesting caveat there, of course, is open source, which you didn't have in the data warehouse space. And the good news is that innovation, I believe, is gonna continue even as we start to see some consolidation and more of a platform play around big data. But back to those little startups. I think the challenge for them is going to be: how do you build a business around a particular tool that needs to fit into a larger platform, the big data stack if you will, that itself needs to fit into a larger data management stack, which needs to fit into a larger infrastructure, with some of the innovation that's happening on that side in terms of cloud and virtualization? So it's gonna be hard for those players to build, I think, long-term sustainable businesses. The good news is I think some of those players out on that floor today will have some good exits, because there is gonna be consolidation in this market and some of the tooling that they're creating is very valuable. So for some of those there's gonna be a good exit. For others, you know, there's not gonna be such a good exit, but that's the nature of a new market with venture capital flowing in, and that's just the way the market works.
We can't not talk about the big three: Cloudera, Hortonworks, and MapR, obviously the innovators early in phase one. Honestly, are they gonna get the leverage in this phase two? What do the early adopters in the enterprise look like? And also the capital markets: obviously those big three are heavily funded. What's your take on those three guys?
Well, if you look at those three players, like I said, I think you're seeing a couple of different factions play out, or develop, I should say. They are extremely well capitalized. Between the three of them, I think they've raised over $1.6 billion over the last several years, and of course Hortonworks has gone public and raised another 100 million plus in their IPO. So they're very well capitalized. The question is, where's all that money gonna go, and why is there a need for that much capital in this market? Because isn't software supposed to be not so capital intensive in terms of building out a business? The challenges in the Hadoop business are a few things. One, from a distribution perspective, it takes time to build a business based on open source software. You've got the community to contend with, and then you've also got to deal with your channel. You've got to break into the more traditional way of enterprises buying software, so that takes time. And of course you've got to compete from a mind share perspective with all the big players who have lots of money and are very invested in this business as well, whether it's Microsoft or Oracle or whoever. So that's where a lot of this money's going. What's gonna play out in terms of who's gonna be around in five, 10 years? It'll be interesting to see. I mean, I think this market can sustain one, maybe two players at a large scale, but you mentioned three vendors, so, you know, we'll...
There's a lot of other little guys. I mean, there's a lot of consolidation going on. That's the premise of your research. You're presenting that tonight at five
o'clock at the event. What's your take on consolidation? Carnage, acqui-hires, IPOs, what's going on?
I think you're gonna see continued consolidation. We've already started to see it kind of on the periphery. We've seen companies being acquired, certainly in the open source business intelligence space, with Pentaho and Jaspersoft, and also Revolution Analytics. I think you're gonna continue to see that happen in the Hadoop space in particular because, like I said, a lot of these different tools, while very valuable, really only become extremely valuable when they're part of a larger platform. So you're definitely gonna see some consolidation there, some acquisitions, and I think that's natural. It's good for the market, and the enterprise is telling us this. At the Pivotal event yesterday they had a customer panel, and one of the customers, I think, said it pretty well. It was Mitsubishi Financial, and they said: we need to take a platform approach because we don't have the time; we're not in the business of piecing together all these little tools and technologies into a platform; we do want kind of a one-stop shop. At the same time, they don't want lock-in, and the good news is open source to some extent alleviates that risk of vendor lock-in, and that's what the ODP is all about. Of course, there's some debate going on with the ODP. You've got Cloudera making some comments about how it's kind of antithetical to the open source way, and I think there is a legitimate question that the ODP has to answer, which is: why do you need such an organization? What is the ODP going to do that the Apache community can't do?
So I gotta ask you. Yesterday we had a crowd chat. It's still open right now, crowdchat.net slash bigdataweek, and obviously Dave Vellante laid down some interesting comments. He couldn't make it this week; it's snowed in in Boston. So Dave, if you're watching, we miss you, wish you were here to add some of the commentary. Because, you know, what's clear is that big data seems to be announced, but I was speculating, and Dave was kind of teasing it out, that it's kind of transitioning to a new wave coming in. You know, things happen in waves: wave one, wave two. Is the big data wave dying out, or being diluted if you will, and moving to a whole other level, whether that's virtualization? You're seeing a lot of conversations on our crowd chat about converged infrastructure. You still have storage. You've got VMware doing some things; we covered that at their recent announcement. So are the dynamics at infrastructure scale affecting the big data business? And if Hadoop can become standardized, and you still don't see a rally around that, is this next wave going to be something different? Is it going to look different, or the same? What do you see? Obviously big data is not just data warehousing; that's one aspect of it. You know, I see there are other things that could propel this growth. And so I'm speculating there's a bubble that's going to burst. Is there another wave? Because bubbles don't burst if there's another wave coming, and that's what Dave was basically saying.
I think there's definitely another wave coming, and it's characterized by three things. Number one, you're starting to see this consolidation around Hadoop as really the core foundation of a big data platform. Things like the ODP are going to accelerate enterprise adoption; I think there's no question that's part of the next phase. Like I said, the good news is, you know, the innovation isn't going to stop, and that's because of the open source community. So there's going to be continued innovation from, if no one else, the practitioners out
there who are developing these tools themselves, the data-born startups. Companies like Facebook and Google, and even Netflix, are now open sourcing some of their Hadoop-related tools. So you're going to continue to see innovation, and that's, I think, one of the key factors that open source brings to this market. And then the last and really important part of this phase, where it gets really interesting, is new applications, specifically applications around the Internet of Things and the industrial internet. That's where it's going to get exciting, where we're moving beyond kind of the container question: oh, how do we store and do some processing of all this data, how do we save a little bit of money on our data warehousing? We're moving from that to: how do we do things in a totally different way? How do we develop new lines of business and drive new revenue? How do we find efficiencies? If you think about some of the things GE is doing around predictive maintenance, just a little bit of a shift in terms of your efficiency there can mean big money. So I think that's where you'll see a lot of the action, and that's what's going to be very exciting in the next phase of big data.
Well, it's going to be an exciting week. I'm looking forward to looking at a couple of key areas on my radar, which is the overlay between different markets. I see cloud is exploding. You've got the converged infrastructure market, I'll call that infrastructure, and you've got big data, I'll call that apps. And there are others, where you've got DevOps, you've got Internet of Things, big data warehousing, and also infrastructure with virtualization and such overlaying. So I think it's going to be concentric circles around those markets, and the interplay between those forces is going to be very interesting to watch. I see big data kind of diluting into its own little market, where Internet of Things traverses DevOps and infrastructure. So I think that's clearly a big wave, but I'm really interested to
see what I call infrastructure software. "Software eating the world," by Marc Andreessen, is certainly a relevant soundbite, but the reality is: are we just in a new market called infrastructure software? And I think that is where these concentric circles come in. So I'm really interested to dig into these forces. We'll be covering it for three days. This is theCUBE, live in Silicon Valley in conjunction with Strata Conference and Hadoop World. This is Big Data SV, our exclusive event here from SiliconANGLE and Wikibon. We'll be right back with our next guest after this short break. Stay tuned for three days of wall-to-wall coverage. We'll be right back.
Hi, I'm George Matthew, president and CEO of Alteryx, and you're watching theCUBE. I'm Dimitri Zamin, I'm the CTO and co-founder of StackStorm, and you're watching theCUBE. Vice President of Product Strategy at QLogic, you're watching theCUBE. Hi, my name is Ben Jones, Senior Product Manager of Tableau Public, Tableau Software, and you're watching theCUBE. Hi, this is Francois from Tableau and you're watching theCUBE.