Okay, we're back here. This is siliconangle.com's exclusive coverage of O'Reilly Media's Strata Conference. We're in the heart of Silicon Valley in Santa Clara, California for the Strata Conference. This is SiliconANGLE and Wikibon's co-produced event with O'Reilly Media. This is our new segment where we drill down on the top news here at the Strata Conference on day two. I'm joined by co-host Dave Vellante of wikibon.org. Go to wikibon.org for all the free research on anything to do with big data and enterprise technology, all the emerging trends. We offer free research there. And of course, go to siliconangle.com for all the latest and greatest on this show, RSA, and other events happening. All the most important stories are happening on siliconangle.com. Of course, if you're watching live, you can watch the reruns on youtube.com/siliconangle. Dave, this is our hour, our news hour, our power news, our prime time. Top story today is the Hadoop wars are on. We heard Edd Dumbill, the program chair, say, "I'm sick and tired of all this bickering between distributions. There's a big swath of value creation." But the top story again today is the Hadoop wars are on for distributions. Your comments and analysis of that?
Yeah, I mean, I can understand why Edd's saying that, but the fact is that if somebody's going to announce a distribution, the world is going to say, "Well, why? Do we need another one?" And I have to say, when I look at WANdisco, I look at Intel, I look at Greenplum, there are definite reasons as to why they're doing it. WANdisco is dealing with the active-active problem and mission-critical data. Intel, in my view, is really going hard after the security issue, which the world needs. Greenplum is, you know, doing EMC's thing. They're trying to get their slice of the pie. So I think that ultimately it's not going to matter when we look back on this, but there are going to be winners that emerge and there are going to be losers.
And I think that...
Do you think Greenplum overplayed their hand? I think Greenplum overplayed their hand. I mean, we had Hortonworks on with Microsoft saying, hey, we're going at SQL Server. We have, you know, Excel and other tools out there as well for business intelligence. That's fine, it's a marketplace. Greenplum is basically going after the data warehousing market. I mean, there's not much more than that.
I think if you use a poker analogy, they've got to have a little bluff on because, you know, they've got to really puff up their chest and make the world think that they're number one at this, right? And that's what EMC does. So if that's what you mean by overplayed their hand, yeah, I think it's a smart move.
We don't know whether they're holding pocket aces.
All right, exactly. So I think that's the strategy. That's a good one, frankly, from a marketing and positioning standpoint. But I do think that the key question that comes back, John, is: is there going to be a Red Hat of Hadoop that emerges? And when we asked Pat Gelsinger that question, he said, no, there won't be. Now, what does that mean? Does that mean there's not a big opportunity? No, in my view, that doesn't mean there's not a big opportunity. What that means is that we, EMC, IBM, et cetera, are not going to let the Hadoop world run away the way Red Hat did. We're actually going to start participating: hey, Red Hat has a $10 billion market value and we want a piece of that action.
Other top stories here at the Strata Conference: SQL meets Hadoop. We had that segment on this morning. We had Edd Dumbill, the chair, on talking about data science being an art and more design focused, all the way down to the scientific and high performance computing and everything in between. We also introduced two new concepts here that we've gathered from the show.
One is data as code, which really talks about the developer framework around big data. And the question on the table was, how can I use my unstructured data? How do I deal with data quality, data governance, data stewardship? And what's come out of that conversation here at Strata is that the developer community needs to be enabled to act on that code to get the insights, and that's what we call data as code. And the other concept that came out of Strata today that's newsworthy is this notion of shadow data. Shadow IT has been an emerging innovation for IT, although kind of in the shadows, with Amazon, and now you're seeing the concept of shadow data. They've really hit the mark here, and we're hearing rumblings that this trend is picking up steam, and obviously we're reporting on that exclusively here on theCUBE. And that's really, really cool. It was validated by IBM's tweet chat. We had IBM on this morning on a tweet chat validating that, from their standpoint, data as code and shadow data are real. We've cross-validated it here at Strata.
Yeah, you know, John, I put this in the category of boring but important. We participated in that tweet chat this morning, and not a lot of people talk about this, the management of big data, the governance of big data. And so the first question really is, okay, do the concepts that we use in the enterprise apply to big data? And I think there's no question that they apply. Things like data quality, and obviously security, who's got access to data, being able to audit that, having some kind of taxonomical framework for your data so that you can defensibly delete it or not, or you can hold it. All those concepts apply, but they don't apply in the same way in which we've applied them technically and organizationally in the past. Why? Because we've got so much data, and it's so much faster. The texture of the data is so much more diverse.
So we really need new thinking and probably new technologies to solve this data management problem. Now the other thing I'd say, John, is I don't think it was solved in the traditional enterprise. So I think it's going to be less solved in the big data world. So there's a huge opportunity there.
Well, that's why I get worried about some of these siloed approaches. But again, the world is coming back to a theme that we drilled down on at Oracle OpenWorld a few years ago, which is purpose-built solutions versus general-purpose or multi-vendor supported ones. So obviously even here, you've got "I'm going to build a purpose-built solution." We heard Greenplum's announcement, and we heard Edd talk about Facebook, where building the software drives the data center design. Really interesting, right? So I don't think they're mutually exclusive, and that's what I find very, very interesting.
So what was your take on the tweet chat? I mean, you were very active in there, kind of holding court for a while. What were your takeaways?
Well, I mean, first of all, I love doing tweet chats because it's an emerging environment where you've got crowdsourced data opportunities, but it's still early. I think we're building some tools, but I'm really, really impressed with IBM because IBM has a lot of DNA in big data. As I said earlier, another trend here at Strata that we're going to highlight is that data processing is a term that's been kicked around and amplified in a relevant way. That's an old term, Dave. We can go back and remember data processing as an IT term in the glasshouse days of the mainframe. It kind of got diluted with client-server and the PC because there's a lot of distributed data, but now the notion of data processing is really, really a big deal.
So IBM has their finger on the pulse of data processing. They assembled some really smart people on that tweet chat, and Wikibon was on there contributing a lot of value as well, and they understand the issues around governance. So they're dealing with all these issues. Now, we're here at Strata on the emerging side, dealing with the developer side, but it's clear they see this trend of developer-friendly, multi-mashup data sets as a real big thing, not a siloed approach, and I was very impressed with the fact that IBM gets it.
So I think from IBM's standpoint, the business model is really about taking technologies that are in a huge, massive, ridiculously large portfolio and making them work through services. I mean, we know from studying this business that services is about half of the revenue opportunity from the TAM perspective, and we all know who the whale in services is: IBM. Now, a lot of people look at that and say, Scott McNealy used to say services is where companies go to die.
I think what's interesting about this marketplace is that it's so explosive that there's still no standard general-purpose platform in big data. There's no data fabric, and there's a lot of land grabbing going on, but more importantly, a lot of those paradigms from the past really are relevant. I mean, I talked about how the semantic web is going to be a 20-year project, but data quality is a database challenge; it's just going to be viewed through a different lens. And my comment earlier was, dirty data is a bad user experience.
So these are really... we talked to the gentleman from Berkeley last night about the hard problems, and I gave him the example of the speed of light. He says, oh, we can solve for that algorithmically, and of course Google has done that with Spanner. But then we heard Edd say the real hard problems are organizational: data organization, data management, those are the really tough problems.
The point I want to make about services is, as Edd Dumbill was saying, it's all about the business value. If you can go into an organization and demonstrate clear business value, and you can show them a path to achieve that business value, they'll pay through the nose for that, and that's really what services is all about.
Yeah, but Dave, here's the problem in this market right now. To kind of up-level this in terms of this marketplace: a lot of the fragmentation around putting out new distributions is a stall tactic, right? In my mind, people are trying to catch up in a growing market. It's like putting a big rock in the river: the water's going to flow around that rock. And so the innovation is going to come from places no one's ever seen before. And what Edd brings up is that the IT department's and the corporation's own governance issues are the bottleneck. That's why we're seeing shadow data emerge, because the data issues in the company are very complex. So companies are not going to be aggressively deploying new data paradigms, what with privacy and security. So I think Intel's got a good bet that this is going to take years on the security side and on the performance side. So I see corporations lagging on the adoption of new data approaches because of the governance issues, because of the lawsuits and compliance, and what that's going to force is a shadow data market. So I expect the innovation is going to come in from new approaches.
Yeah, well, so we're sort of jumping between topics, the Hadoop distributions and the data management. They're related, and the shadow IT piece is actually interesting because it's a nightmare from a data governance standpoint.
Yeah, can you imagine? "Hey, I want to put all our data in the cloud and I want to do this really kick-ass mobile app." And, no.
Well, that's why the Intel distro was interesting to me. When I first heard, actually a couple months ago, that Intel was readying a distro, I had no details and I was like, really? I wonder what that's all about. I couldn't sort through the angle. And of course I started thinking about it, and the security piece to me is the most interesting there, especially when you start thinking about the Internet of Things. You need a whole new security model, and I think, as we've talked about many times, security is in a way a do-over, and the way Intel is doing it over is the right way.
Well, we're here, and we're going to come back with our next segment soon, but just other things in the news: Google continues to pimp up their single sign-on with Google Plus. There's a new startup backed by Peter Thiel, Thinkful, a personalized education platform. Again, not talked about here; we're trying to find some folks to drill down on this, but education is going to be disrupted by big data. We're seeing it with Khan Academy, and you'll see it in a variety of other solutions. Learning will be impacted just like everything else. And antitrust proceedings are in the works against the nation's cable companies, possibly signaling the end of pay TV. Again, Netflix was highlighted here, and LinkedIn was talking about big data. Big data is going to be a lever of innovation from TV to applications to governance, and Dave, it's exciting. So it's a great time to be in big data, and whoever said big data is dead, that was VentureBeat, as was pointed out, is really, really off base. So we're here on the ground covering it. Obviously big data is our wheelhouse. Wikibon, you have the most research out there. Go to wikibon.org for more research. We'll be right back with our next segment here on our prime time news hour, Strata Conference day two.
We looked at all the programs out there and identified a gap in tech news coverage. Those shows are just the tip of the iceberg, and we're here for the deep dive. The market begged for a program to fill that void. We're not just touting off headlines. We also want to analyze the big picture and ask the questions that no one else is asking. We work with analysts who know the industry from the inside out.
So what do you think was the source of this missing... So you mentioned briefly, there are... If that's the case, then why does the world need another software... We're creating a fundamental change in news coverage, laying the foundation and setting the standard, and this is just the beginning.
Good morning, I'm Kristen Folletti, and welcome to NewsDesk on SiliconANGLE TV for Wednesday, February 27th, 2013. Kim Dotcom and Apple have two very different approaches when it comes to email. Here to discuss the latest trends in cloud services is SiliconANGLE contributing editor John Casaretto. Good morning, John.
Good morning.
Thanks for joining us. We've discussed Dotcom and his Mega service on the show before, and with its launch in January, Kim Dotcom's Mega has been open to the public for a while now. The cloud storage site offers more than a place to host your files; it also gives each user a personal email account. Dotcom has recently announced plans to incorporate end-to-end encryption for the email service, specifically saying, "We're going to extend this to secure email which is fully encrypted so that you won't have to worry that a government or internet service provider will be looking at your email." John, why do you think Dotcom wants to offer this encrypted email service to users?
Well, I think that there's a number of people that worry about a government or a company or an internet service provider that's going to be looking at your mail. So for those that are concerned about privacy, Dotcom sees this as an opportunity to provide a service that people want. You know, we know about the recent Google situation where they provided countless emails over to the government without so much as even a warrant. And this goes on all the time. That's just the tip of the iceberg. There's many others that likely do the same; it just hasn't come to light yet.
Since Mega's launch in January, it now has more than three million registered users, who stored a total of 125 million files in the first month of operation. According to Kim Dotcom, it took competitor Dropbox two years to achieve that. Why do you think Mega has had such a successful launch? Do you think it's because of Kim Dotcom's massive free storage offering of 50 gigs, or is there more that users are finding appealing about his philosophy on cloud storage? What's your opinion?
Well, I think it's been a number of things. I think that there was a lot of curiosity when the service was first announced with much pomp and circumstance. You know, they gained quite a bit of notoriety, more than ever, since all of this stuff happened, since the shutdown of the original Mega. So a lot of people are interested in what's going on and what this service is. However, I think what's really happening is a lot of people are concerned about the privacy of their files. But you're right that the storage offering is certainly something that's attracting people, and the cloud storage philosophies they have help. I think that it's a whole mix of things that are bringing people over to Mega, and these announcements are constant, and it seems like they're quickly evolving and trying to find a user base, and they're finding it.
Speaking in London on Tuesday, Frances Moore, the chief executive of the IFPI, which represents record labels internationally, said that the move against Dotcom's former file sharing service Megaupload has had a positive effect, saying action by governments and courts has had a major impact and cloud locker services have seen a major reduction in traffic since the action against Kim Dotcom. John, what's your reaction to Moore's statement? Do you believe the action taken against Kim Dotcom has overall had a positive or a negative impact on the cloud community?
Well, clearly Moore has access to information that we don't, so we have to take that at face value and just kind of look at what she's saying and talk about that. But I think that there was probably an attrition of some casual users for sure, and there's been a lot of spreading into other services as well. I would have to estimate that the community is probably back on the upswing at this point. Whatever statistics she's looking at, from whatever point of reference or period of time that she's referring to, we just don't really know what she's really talking about there, so we have to take that with a grain of salt.
In a contrasting move from Mega's secure email system, Apple is decreasing the privacy in their iCloud service in order to prohibit its use for activities they deem inappropriate. Their new measures include a system that audits emails for specific language and will interrupt the email's transmission if the email includes language that Apple has declared unsavory. John, do you think this policy will have an effect on iCloud users?
Well, I think absolutely. Any time privacy is lost, that's a tough thing for people to accept, especially when it comes to light in a very public way, and some may find that unsettling. The question is, why is Apple interrupting these email transmissions as they catch these things going out? Was it an attempt to prevent spam, or are they actually censoring materials? It's hard to say.
We've talked time and time again about how important it is that users read the terms of service. Technically, what Apple is doing is completely within their bounds, but at what point is it an invasion of privacy to have a system which scans your emails and automatically omits content that they find inappropriate, even though the messages could be completely legitimate?
Well, I think that's a great question for the community to ask. You know, what is acceptable?
You know, what's the limit here? Do you want your messages filtered for content? I mean, if it's spam, that's one thing; the mechanisms are quite often very much the same. And I think that when you think about your provider of email, how do they address that as far as their official policy? That's the thing that we're looking at, this whole question of what the terms of service are. Go through, read those things, and just figure out, is there any type of filtering that you're not comfortable with?
Do you think this is an honest attempt by Apple to remove spam from their system, or, in trying to project an image, are they forcing users to stay clean or opt out of their service? What's your take?
Well, you know, that's the thing. Personally, I think that this is probably an attempt at blocking out spam, this particular incident that has brought this to light, but it raises a good question that Apple simply hasn't answered yet. Do they aim to have some kind of censorship or message control, or is this truly an anti-spam tactic that just happened to get caught in this particular incident?
Thinking back to Instagram's terms of service issues that we've discussed on the show, consumers obviously want ownership over their content. We know that. Do you think companies like Apple will see consumer pushback due to this, especially with services like Mega offering the polar opposite, a completely private and encrypted email service?
You know, that's a great question, and it opens up the debate of whether Mega could be used for spam and other types of email abuse. That's the other side of the coin there. Freedom versus abuse. And Mega has traditionally run in a way that they're not accountable, that they've sought out that pocket of legality and culpability, and they count on that gray area to operate.
In the end, I think that we'll see that Apple comes out of this saying, hey, we do some filtering, or it was just the spam incident. And as for the pushback, given Apple's popularity and the number of ways people have to take email with them on the go and on their MacBooks and this and that, I think that they'll look past this, because most people, quite frankly, don't behave that way or use the certain phrases that would be caught, and most people would probably not even really notice.
Well, John, it's definitely been a very interesting conversation this morning. Thanks so much for taking the time with us.
Thank you.
And your Social Angle daily news roundup is next here on SiliconANGLE TV.
We looked at all the programs out there and identified a gap in tech news coverage. Those shows are just the tip of the iceberg, and we're here for the deep dive. The market begged for a program to fill that void. We're not just touting off headlines. We also want to analyze the big picture and ask the questions that no one else is asking. We work with analysts who know the industry from the inside out. So what do you think was the source of this missing... So you mentioned briefly, there are... If that's the case, then why does the world... We're creating a fundamental change in news coverage, laying the foundation and setting the standard, and this is just the beginning.
Welcome back to the Strata Conference. This is SiliconANGLE's exclusive coverage of O'Reilly Media's Strata Conference. This is our one-hour breaking news analysis, a concept that we introduced early in the year, where we take the breaking news and we analyze it. I'm John Furrier, the founder of SiliconANGLE.com, joined by Wikibon analysts Dave Vellante and Jeff Kelly to break down the news here on day two and in aggregate for the show.
Jeff Kelly, I want to ask you about the Hadoop distribution market in particular. Right now that's obviously the hot top story, and then we're going to get to business value, but let's talk about what's going on with the Hadoop distribution market. We sat down with WANdisco and a variety of other companies, EMC and others. What's your take on the thing? Dave, you want to jump in on that?
Yeah, so Jeff, you were talking to WANdisco, you were talking to Hortonworks. Now, WANdisco announced a Hadoop distribution focused on active-active high availability, really trying to solve the problem of mission criticality within Hadoop. Hortonworks is at the other end of the spectrum, totally open source, really trying to push that. So what's your take on these two companies, and then you can introduce the segment.
Sure. Well, there are different approaches to making Hadoop enterprise ready. That's one of the keys to this market. The other, of course, is adding some of that SQL-like functionality into Hadoop to make it more accessible to business users, and of course there's things like security and making contributions back to the open source community. So these are some of the topics we talked about when we went out onto the floor. WANdisco in particular was focused on the Non-Stop NameNode. That's really their innovation, taking the active-active data replication technology that they've been applying across the WAN with their customers for years and applying that now to Hadoop with their acquisition of AltoStor, and they've got their own distribution to do that. We've got a demonstration, actually kind of dramatic, running Hadoop live. They killed one of the NameNodes, and you can see the activity dramatically drop off, and another NameNode just picks right back up.
So watch this, watch Jeff Kelly doing the booth crawl on the show floor here at Strata. We'll be right back.
Hi everybody, this is Jeff Kelly from the Wikibon Project.
I'm here at the Santa Clara Convention Center, where Strata Conference 2013 is taking place this week. We're about to head out and talk to HP, Hortonworks, WANdisco and DataDirect Networks about what's going on in the Hadoop market. Specifically, we're going to touch on the Hadoop distribution wars, which are really heating up, and of course the question on everyone's mind: is Hadoop enterprise ready?
My name is Konstantin Boudnik. I joined WANdisco about six weeks ago from Karmasphere, and before that I was doing Hadoop development at Yahoo and Cloudera. At WANdisco I'm in charge of the Hadoop distribution, the WANdisco Hadoop distribution, which is not simply yet another Hadoop distribution but an engine that allows us to deliver very interesting, bleeding-edge, I would say, technologies to the Hadoop market. These technologies are, namely, the Non-Stop NameNode, which my colleague Konstantin will talk about a little bit later, and a technology that allows you to use Hadoop clusters for private clouds. So basically, the applications that run on the S3 file system from Amazon could be seamlessly transferred to the WANdisco distribution using our proprietary S3-HDFS bridge. Among the other advancements we put into the Hadoop distribution is a much better user experience for cluster users. And the main thing is that we are pretty much the first commercial company that provides Hadoop 2 support and a Hadoop 2-based distribution of the full Hadoop stack. So we are fully committed to open source, and we're using another Apache project called Bigtop, of which I am actually one of the co-authors, to build the distribution. And as a shameless plug, we've built the full distribution from ground zero to the working commercial product in just 28 days using open source technologies, Bigtop particularly.
Thank you, Jeff. So what we have here is the industry's first multiple active NameNode solution for HDFS. We have three NameNodes serving the entire data space. So what we're going to do is start TeraSort.
The clients are going to connect to all three of the NameNodes. Then we're going to kill one of the NameNodes. We'll see TeraSort continue, fine, no interruptions. We'll see the other two NameNodes pick up the load, and we'll see the dead NameNode flatline, basically. That's the demo. Okay, what we have here are three graphics applications showing the activity in each of the NameNodes. They're showing RPC bytes in and bytes out. I've prepped it so we already have TeraGen data in HDFS. I'm going to run TeraSort now, and in the next few minutes we'll see activity on the NameNodes pick up. There we go. The orange lines indicate bytes sent and the green lines are bytes received by the NameNodes. Let's give it a few minutes to get really active. Then we'll go and do the unthinkable, which is to kill one of the NameNodes. That's a catastrophic failure in most HDFS installations. In our solution, we simply switch to the other two NameNodes and life goes on. TeraSort is uninterrupted.
So obviously you guys have been putting a lot of hard work into this. As we look forward to the rest of the year, what's on your roadmap in terms of additional development you're looking at for the product?
We have plans for a world file system. This is HDFS that runs across multiple data centers. The result is a single namespace that's spread across multiple data centers, and you can run your jobs in whichever data center is near your data. The result is that if you have a data center go down, or multiple data centers go down, you can still have access to your data and you can still run your jobs. That's coming up next. We also have plans to use our active-active replication technology for the YARN ResourceManager and for the HBase Master.
Excellent, all right, great. Well guys, obviously a lot more coming from WANdisco this year, so keep your eyes peeled for what they're doing. Thanks for joining us. Signing off.
I'm here at the Strata Conference with Jim Walker at the Hortonworks booth. Jim is a product manager for Hortonworks. Welcome.
Good morning, Jeff, how are you today?
I'm doing great. So obviously a lot of news happening here at the show, with Hortonworks among the companies releasing some interesting things. One thing recently was the Stinger project. Why don't you tell us about that?
Yeah, so Stinger is really an initiative, I guess, if you will, within the Hive community. Apache Hive has really been the de facto standard for SQL interaction with Hadoop for years, right? It originally came out of Facebook, I guess, in about the 2008 timeframe, and was placed in the open community so that the community could build it out, and that's really kind of the genesis of SQL interaction. There's been a lot of conversation about SQL the past, oh god, couple months, right, Jeff? So within the community itself, we really led an initiative, a multi-pronged attack, to really speed up Hive. Hive is great for batch processing. Everybody thinks of Hadoop as batch processing, all these things. So people are using Hive to do things like dashboarding, or the kinds of use cases that don't need instant, human-interactive time frames, yeah? So our approach is to really embrace Hive and fix Hive so that it can move into the more interactive use cases, and that's really what the Stinger initiative is all about. It's about optimizing the engine within Hive. It's about optimizing the way the data is stored. We have something called the optimized RC file, which is the file that lays in there. I know Owen O'Malley on our team is working with a lot of guys at Facebook and, again, the wider community to actually bring this to the market, and bring it to the community, really.
So you mentioned a few partners, Microsoft among them.
So obviously partnering is a core part of your strategy, and it goes right along with the open source nature of HDP. So why don't you talk to us about the philosophy behind that open nature and why that's so important to what you guys are doing.
Yeah, so open source is hugely important, Jeff, and I've been in this open source space for quite some time. I think what's different here at Hortonworks is that we think about it as more of an open community. And I think the term open community better describes what Hadoop is really all about. I mean, I'm an ex-developer, right? And so I'm really proud that I'm in a space where the developer is king. There's kind of a little bit of a match in terms of who has the committers to Hadoop, right? Because that core bit, the bits are being developed by a bunch of guys, and they all work together and they're all friendly, and that's all fantastic. And we're all building companies on top of this stuff. And so I'm really proud of that, but when I take a step back and talk about the ecosystem and how this all comes together, it comes down to our fundamental difference in the way that we go to business and the way that we think about what we're doing. Our job is to make sure that we understand the enterprise and we gather the requirements that are necessary for widespread adoption, either by talking with customers, by working with the projects that have already been using Hortonworks Data Platform, or from the experience of our team, you know, the operations experience at Yahoo, the data science guys that have been working on this for years, understanding those requirements and then taking those requirements and putting them into the open community. We really believe the fastest path to innovation is that open community.
And if you put the right requirements into that funnel, what comes out the other end is just really stable, reliable, and quite honestly, vetted across a wider community. So you think about like, you know, like the Stinger stuff we were talking about, right? It's, you know, it's Facebook, it's us, it's Microsoft, it's a bunch of different people involved, not just Hortonworks, right? We really feel that, you know, I mean, the stuff that we can do in that engine is really, really fast. And then, you know, quite honestly, the third bit of this is we have a responsibility. If we're gonna talk about the enterprise, we're gonna be enterprise Hadoop, we need to apply enterprise rigor to our distribution. So, you know, stable, reliable, rock solid distribution is absolutely what we're all about. It's about making sure we have the most stable releases across all the projects, right? That make up a distribution. You know, version one, of course, we're working on version two. The fact, you know, that we aren't gonna bring out version two until we know that it's stable and reliable. And we're gonna know because we're testing it at Yahoo. You know, Yahoo is an investment partner, but they're also a development partner. And, you know, that's huge for us because when we come out with a new version of our distribution every three months, by the way, a quarterly cadence, you know, we're testing on 400, 600 nodes at Yahoo. And that just really allows us to do some things with the software that, you know, it's not really easy to do. And that's a, it's a huge, huge benefit to us and really the overall community, I think. So, and they're helping on, you know, making sure that Hadoop is stable and reliable and in widespread use. So that's, I mean, that's really what we're all about. All right, fantastic. Well, those are some great conversations. Thanks so much to Hortonworks, HP, DDN, and WANdisco for their insights on Hadoop and the evolving landscape. 
Dave and John, now back to you. Okay, we're back here live at the Strata Conference, this is SiliconANGLE's News Hour. We have blanket coverage here at the Strata Conference with O'Reilly Media and we have an analyst, Dave Vellante, chief researcher, and big data research analyst, Jeff Kelly. Jeff, you're out on the floor digging into all the data. What can you share? What did you learn on the floor out there today? Well, you know, in addition to WANdisco, I also spoke with Hortonworks as well and they're doing some really interesting things around bringing that SQL-like access to Hadoop. Of course, that's a big theme this week. Really bringing that kind of SQL-like capability to Hadoop really opens up the platform to a lot more developers as well as, I should say, a lot more business users as well as more traditional business intelligence-type tools. So Hortonworks is working on really making Hive enterprise-ready to take on that job with something called the Stinger Initiative, which we just heard a little bit about. So they're doing some very interesting things and of course they're really focused on that open source strategy when it comes to partnering and on the technology itself. So really, some really interesting things. What's the general sentiment on the show floor in general? Actually, DataStax has got a really cool booth in there. Cassandra's obviously a part of the Big Data Survey. What other things are you seeing out there on the floor? What's the activity like? Well, again, I think it's tracking very closely to the things we've been talking about. You know, this is, it's really interesting to be out there when you've got, you know, it's a collegial atmosphere but at the same time, there is some tension growing as you can tell. This is really turning into a shooting war when it comes to Hadoop and the whole SQL aspect of it. 
You know, you've got Intel next to Greenplum, who've just been competing in announcements, next to Cloudera and Hortonworks and WANdisco with their distribution. So it's really, you know, everyone's kind of smiling on the surface but you can tell there's some real competition happening now. There's a lot of money to be made and clearly this is a real market. There's no question about that. What's some feedback? We have 20 seconds. Feedback you're getting on your market sizing, the second research report, free at wikibon.org slash big data. Obviously, another groundbreaking report. What's the general feedback? Overall, you know, happy to say the feedback's been positive. I think the market is really looking for this kind of information which is really why Dave and I decided to undertake this last year. But you know, overall it's been positive feedback. You know, there's some question around, I think the biggest, the most controversial part of it is, of course, how do you define big data? What are the parameters to include when you're including revenue from larger companies where only a small portion of their revenue is big data? So that's kind of been the most controversial part. The feedback I get is thanks for all that great research for free, that's what people tell me. You guys do some great work. Okay, we'll be right back with our next guest for our deep dive, Steve Keniston, Storage Alchemist, coming on to do a deep dive on the storage implications and storage innovation because big data would be nothing without the storage, fast storage, cold storage, hot storage, whatever you want to call it. Without storage, you can't do what the people are doing here. So a lot of new technology to look at and that's going to be a key part of the next segment. We'll be right back after this short break. We looked at all the programs out there and identified a gap in tech news coverage. Those shows are just the tip of the iceberg and we're here for the deep dive. 
The market begged for our program to fill that void. We're not just touting off headlines. We also want to analyze the big picture and ask the questions that no one else is asking. We work with analysts who know the industry from the inside out. So what do you think was the source of this missing? So you mentioned briefly, if that's the case, then why does the world need another software? We're creating a fundamental change in news coverage, laying the foundation and setting the standard. And this is just the beginning. Hey, welcome back. This is exclusive coverage of O'Reilly Media's Strata Conference with SiliconANGLE and Wikibon. I'm John Furrier. I'm joined by my co-host. I'm Dave Vellante at wikibon.org and this is a special deep dive segment. We're here with the Storage Alchemist, who is Steve Keniston. Steve's actually with IBM, but he's here. He's a longtime friend of theCUBE. Really used to be an independent analyst and brings that perspective. He's been at the show, crawling around, talking to practitioners, talking to technologists, former CTO, and Steve, thanks for coming on. Thanks a lot, Dave. I appreciate it. Yeah, so I think this is your first Strata, right? This is my first Strata. So what do you think? I mean, what have you seen? Maybe give us the take from the show. We get the deep dives on theCUBE, but we don't spend as much time in the side sessions as you have. Yeah, Dave, so I think it's really interesting. I came here with, again, it being my first Strata, not having a lot of expectations and trying to understand where the world is going with big data. My approach has really been from the aspect of what does the hardware side look like? What is the, you see EMC, you see NetApp, you see a lot of the vendors talking about the big iron that it's going to take to drive big data and what it's going to take. 
And I really wanted to understand that and understand what the technologies were, what the underlying technologies and data storage technologies were going to be for that. I haven't heard a lot about that at the show this year. What I've been hearing a lot of is what's the software? How is the data going to really be valuable to your organization? How do you extract the value from it? Where does the data go? How does it live? Where does it die? What's the governance of that data? A lot of that, but not a lot around where is it stored and how do I manage that? The data management of big data I think is going to be very crucial to how IT shops actually then turn that information set over to the folks that need to consume that and actually extract the value. It's going to be difficult. Well you mentioned software, I mean you're seeing everything as software led, right? I mean we've been talking about that a lot and it sounds like that's exactly what you're seeing at this event. Exactly, it's been all about the software components that actually help you suck the data from the different open source communities, pull it into some data repository and then go and apply the analysis on top of that. And it's very interesting. This whole software led X, whether it be software led infrastructure, software led storage, you're hearing more and more about the capabilities and I always talk about the storage services, the value components that sit on top of the spinning, IBM doesn't necessarily always like it, but I call it the spinning rust, right? But at the end of the day it's the software that adds the value and we're hearing a lot more about the software, these software applications or software capabilities, for example, inside of Hadoop or MapR, having things like snapshot capabilities, replication capabilities. Now if that's the case, do I want to spend for expensive storage hardware? Well in some cases you do, right? 
From a performance standpoint or from a security standpoint or from a speed of getting things done standpoint. But if I know it's going to grow exponentially, so hard, so fast, and I really want to control my costs, where do I end up spending my money? I want to spend it on the valuable stuff and that typically ends up being the software. So I mean, you've worked at some big iron companies, right? IBM obviously sells a lot of big iron, EMC sells a lot of big iron. You've also worked at some software companies, you were at Veritas and others. So what do you think? You've seen for the last couple of decades, the industry has marked up Seagate disk drives, right? But the value has been in the software, but that software's been locked inside of the array, for example. So what do you see happening going forward? How do companies continue to thrive and continue to get the types of margins that they've been used to, or does that go away? I think you hear a lot more about the commoditization of everything and the commoditization of data, the commoditization of software, but more importantly, you're hearing more and more about the commoditization of the hardware, right? The commodity hardware is becoming where I start to, especially in the big data world, where I start to deploy a lot of my applications and extract a lot of the value from the data. It's not the spinning disks that actually sit on the floor. So I think you have to play this balance. You play this balance between getting the margins you can from the value that's in the software and offsetting it from the value or the margins that you're not going to make necessarily in the hardware. So is it the case that the traditional storage guys are not here in force because they're not selling their hardware and they haven't figured this model out yet? Is that a different part of the organization that has to figure that stuff out? I mean, for instance, obviously Greenplum's here in a big way. 
Is that what companies have to do, to sort of transform themselves to take advantage of that software-led infrastructure or get crushed? No, I do think there needs to be a little bit of a transformation, a big transformation. So you think about the fact that data protection, and we've talked a lot about data protection in the past, you talk about data protection capabilities slowly moving into the software, but at the end of the day, I need some back-end repository in order to be able to store that. Well, the question is maybe for data protection, it's kind of an offshoot, but when I store that information, where is it going? David Floyer, Wikibon analyst, wrote a great piece called Flape. This combination of flash storage and tape. So tape is very inexpensive. It's an inexpensive media for storing vast amounts of information, and then you have flash, which is really important for extracting the value out of your big data repository in a very, very fast manner. So if I can then move what might be then stale information to that secondary repository with some smart software, I still need to invest in that secondary repository. And then that brings to light the question, right, so now I have flash-based capabilities sitting up front on an all-flash array, for example, where I can really start to leverage and extract the value from the software. So before you say, you know, do the vendors that are selling the big iron really need to start to worry about getting taken out themselves? I don't necessarily think that's the case. We've all seen technology kind of evolve. You know, and the next evolution is going to be, for example, these all-flash arrays. 
It's now becoming, and again, David Floyer wrote a nice piece about this, where these all-flash arrays can be so much less expensive when trying to drive, you know, big data or data that sits on top of a database, you know, it can be a lot less expensive to run it in an all-flash array versus perhaps maybe running in a spinning disk array, right? So now you're going to see these transformations start to take place combined with software and then how do I continue to leverage the software, the hardware in the back end, for example, like tape? And we've heard a lot about in-memory databases this week, John, and flash is obviously playing a role for metadata. Well, I think, Dave, what's here is interesting. What the Storage Alchemist is kind of teasing out here, Steve, is that software-led is a thesis that we've put forth. Wikibon in particular wrote the first research paper, again, another seminal moment in groundbreaking research from Wikibon, that it's not software-defined. It's not, because it's not defined. Software-defined networking, software-defined infrastructure is where everyone's going. It's certainly not defined, so I think software-led's a better term. But here, appliances are kind of passing, although DDN is more of a higher end one on HPC, which totally makes sense, but in general, the storage equation is interesting. Software's a big, big, big part of it, but no one's talking about some of the things like compression, and we've had these conversations with Steve before. And the storage aspect is really, really impactful here because developers just want it to work. They don't want to deal with, under the covers, BS around management and dealing with data transformations and different data sets working differently because of some proprietary lock-in. So Steve, I want to ask you on that. Comment on that from a perspective of, storage should be just a place where data goes, but the latency and throughput are huge issues. 
You've got virtualization out there, you've got some compression. How do you look at the latency and the throughput issue? Some are claiming great performance, it's insane, but they're not counting latency. The throughput numbers are great, but latency's horrible, right, so. Yeah, I think what you're seeing is, what we're hearing a lot about, and I think you brought up a very good point, John, right? At the end of the day, what is the customer looking for, right, and at the end of the day, they don't want to have to deal with all the integration components. I think about 10 years ago, I was an analyst with the Enterprise Strategy Group, and that's when the IBM SVC platform was originally launched, okay, so 10 years ago. The first thing everybody came out and started talking about was, hey, this is a really great set of capabilities, it would be awesome if we had a disk array, or a set of disk shelves that were underneath it, so that it was just easy to manage it all in one kind of block of capabilities. Now what's happening, you're starting to see that smart software, for example, be abstracted further into a platform like an SVC, and now it's heterogeneous, so now I can do all the things that I really want to do, one simple set of management capabilities, sitting on top of all of my disk, and then I can make the more difficult decision, so now I say, okay, now what am I looking for? As a consumer, I'm now looking for performance. Well, if I can virtualize everything underneath that, if I can virtualize all of my storage and have my flash also play a part in that entire virtualization tree, and then migrate data down to my lower cost storage, I can control my costs, it's simple to manage, which is what consumers want, and I'm getting the performance that I require, and then at the end of the day, John, you're exactly right, when it comes to this big data thing, speed is going to end up being your competitive advantage, right? 
So if I can get better performance, and I'm going to need that better performance, to get that answer sooner, to make that business decision sooner, to get that data so that I can drive my business, that's what I'm looking for. What have been some of the most interesting conversations that you've heard in the marketplace out here and the hallways? I mean, obviously, O'Reilly Media is known for having really amazing thought leadership conferences, obviously, the program's phenomenal, a lot of alpha geeks, but now you have the bottoms up community growing fast, top down, and all the action's kind of happening at the intersection of that, but it's the hallway conversations that people want to know about, okay, what are you hearing? So I think, actually, I want to take it from a different perspective, I want to talk a little bit about what I'm not hearing about, right? So what I'm actually not hearing about is the fact that if we take a step back and we look at how cloud really evolved, right? Cloud was this concept, and then a lot of the big iron folks, they got involved in cloud, and then they started selling cloud to very large vendors, right? So big companies were deploying cloud, and then now you're starting to see three, four, five years later, it's trickling down to the small to medium enterprise, and at the end of the day, when you kind of break apart all the fanciness and you figure out what does cloud really mean, you hear it means like I need agility in my business, I need flexibility, I need scalability. Well, I think big data's the opposite, and what I am hearing is a lot of, big data's huge, it's big, it's massive, I need to extract information, I need to figure all this out. But what I'm not hearing about is the fact, so let's take a quick example, I was just in Vietnam about three months ago, and they asked me, what are you here to speak about? 
And I said, I'm here to speak about big data, and they said, well, we don't really have big data here in Vietnam, we're a small country and we're not collecting a lot of information. But the reality is people just don't understand that with all these open APIs, I actually have the ability to suck in important pieces of content, munge that with some of the business analysis tools that I have already, maybe my sales database, and extract some real value from that, and it doesn't have to be hundreds of terabytes or petabytes of information, it can be a few terabytes of information. So I would really like to see, and I'm not hearing this, but I actually really believe, and I did hear a little bit of this walking around, is that I think small companies who take advantage of big data, you know, the concept of big data and utilizing big data and utilizing that information to make better business decisions, can really compete, small banks could now really compete with folks like Bank of America or Citibank, and actually, you know, I don't want to say do damage, but actually grow and be very successful, because they understand a lot about their businesses, they understand a lot about their customers, that sort of thing. You said to me last night, one of the things you heard at the conference, you said, some marketeers at large organizations, certain large organizations, are afraid of big data. On the other hand, you have this meme of how the CMO is going to outspend the CIO in big data. So what did you mean by that, and where did that come from? So I heard a really interesting conversation, or a really interesting presentation from someone from Accenture, right? So Accenture, you know, you were just talking in your last segment a little bit about the fact that services, services led, right? That's pretty big. 
Well Accenture is really all about services, but they had some statistics about the fact that you can go out and gather all of this information, and a lot of this information is being driven by IT. So you might have a line of business manager come in, and they might start talking a little bit about, you know, okay, we've helped you, we've collected this information, then they hand that data over. But, you know, okay, I've got this information, but I wasn't necessarily the one, you know, the marketeer wasn't the one asking the question of the data set. So now I've got some answers, but I don't really know what to do with those answers. So I quickly have to retreat to what I know best, and just go do my marketing project. So I do think there's a lot of services that need to be involved to help folks ask better questions of that data set, in order to be able to then turn that into real value and make decisions using that information. So you also said that you feel like startups are in a better position to leverage the data. So what did you mean by that, and what are you seeing? I think startups have a lot of flexibility. I think they have a lot of innovation, and I think they actually know better what questions they want to ask of data sets, and they know what data sets they want to go out and collect, to then ask that question, to then move their business to the next level. Whereas I think big corporations kind of get sedentary and kind of stuck in their ways. They don't necessarily always know what to ask. Okay Steve, final comment, I want you to break here and end our news hour, power hour here. Final comment, make it brief, bumper sticker, storage innovation here at Strata. What's happening and what needs to be worked on? Give us your quick sound bite. 
I really want to see what we can do in a $150,000 range, sub $150,000 range to implement a solution, and that's from a hardware standpoint, to implement a solution for small to medium sized businesses to be able to be super competitive in this highly competitive world where it's just not all about the cash. It's really about being flexible and about being knowledgeable. Okay, this is SiliconANGLE's exclusive coverage. That's our power hour prime time here, 12 to one at every CUBE event. You'll see us, we're going to break down the news analysis for one hour, and that's going to be the full package. And then we're back to our in-depth blanket coverage day to day. This is day two of three days of wall to wall coverage, nine to five every day, live here at Strata Conference, co-produced with O'Reilly Media. This is theCUBE, SiliconANGLE's flagship program. We go out to the event, extract the signal from the noise. We'll be right back with our next SWAT-like segments after the short break.