From San Jose, in the heart of Silicon Valley, it's theCUBE, covering Big Data SV 2016. Now your hosts, John Furrier and Peter Burris.
Hey, welcome back. We are here live in Silicon Valley for theCUBE's special presentation of Big Data SV, in conjunction with Strata Hadoop. It's Big Data Week here in Silicon Valley. We did the companion New York City event earlier in the year, and now in Silicon Valley we talk to all the smartest people around big data. Joel Horowitz is here, Strategy and Business Development Executive at IBM, CUBE alum. Welcome back.
Thanks, guys.
Things are rocking and rolling. You did a lot of work as a developer, and now you've got an increased role at IBM, a little bit broader scope, because the world is going cloud. You guys have Bluemix, a lot of things going on with analytics, Watson. So IBM is taking a different approach than, say, some of the other vendors, because you've got The Weather Company. Who buys The Weather Company, right? I mean, a lot of good stuff, but it's a different approach. What's the update? Because data's still at the heart of it, big time, for you guys.
Yeah, I would say that Doug Cutting said it very well this morning. I don't know if you caught his keynote across the way, but we've been working with Hadoop for the past 10 years, and what he described was that it's really an enabling technology, and in the past 10 years things have changed quite a bit for Hadoop. We have better hardware that's leaning more now into memory as opposed to disk. We have a huge ecosystem that has grown, not only around Hadoop, I would say, but introducing a number of other capabilities. So what all of that is actually enabling, and what he said today that resonated with me, is digital transformation.
And I know that's maybe an antiquated or older theme that we've heard before in the internet age, but this time I would say there's new merit here, because what's fueling that transformation is the data, and the value you can reap through data by digitizing your business. So that's exciting to me. The other talk I heard this morning that I thought was great, really quick, was Ian Andrews from Pivotal. I think Pivotal was actually one of the earlier groups that focused on the application development side. And as we all know, and I think as we'll all appreciate, building applications on Hadoop was never easy. Now, I would say, with the introduction of Apache Spark, as well as a number of other folks that are coming together, there's a new application development stack that's emerging.
So on the digital transformation, be specific, because they've been talking about ecosystem, and open source innovation was part of his talk as well. It seems to be the same message over and over again. Of course it's open source, we love that. But the notion of digital transformation really has to be rendered in the apps. Are we still waiting for the tsunami of apps to come, or are the apps already here? That's the big question we had yesterday. So I wanna get your thoughts on that, because you go back to Hadoop World 2011, 2012, and Mike Olson was up there with Ping Li from Accel saying, we're gonna put a $100 million fund together for funding apps. So where are they? Or is it just native and everything?
Well, I would argue that no individual vendor or investor can bring together all of the bits and pieces to help a business become truly digital. So our strategy over the past few years has really been to build an ecosystem of partners, like we're doing with Apple, like we're doing with Box, like we did at first with The Weather Company until we acquired them.
So really partnering across to say, okay, what are the key workflows and capabilities and, frankly, actions that our clients are taking that up until now are either paper-based or extremely manual or analog, right? The opposite of digital. What are all of those things that people are doing today that you could move to digital? And I would argue that my company, IBM, is really in a great position to bridge a lot of those gaps. So to put a finer point on it, anywhere there are paper-based processes, right? I mean, that's ripe for disruption.
So one of the things I want to highlight is that we covered Interconnect, your big cloud show, with theCUBE. It was pretty apparent that the message was digitize everything. Maybe not the formal message, but that was my takeaway, which is IBM is saying, look, everything's got to be digital. The full digital spectrum, end-to-end, is the digital transformation. And I think that is what you're saying. So can you talk a little bit about that key message, because I think at Interconnect you guys really highlighted this. I mean, certainly cognitive was front and center with the marketing, with Watson and all, but the underlying message was Bluemix to enable essentially a digital-everything culture. Talk more about that.
Yeah, I mean, the cloud is really another huge enabler, right? So we talk about the ubiquity of data and being able to store vast amounts of data very inexpensively with Hadoop now. I mean, that's the value prop. And Amazon, of course, was one of the first to introduce us to cloud computing, at least at scale and at a reasonable cost. But I would posit that where we're headed, and I know Google had their event just last week, and what I'm seeing emerge, is really not just serving hard drives and processing as a service, but actually leaning more into a platform as a service, and that's where Bluemix comes in.
And so we're the only offering out there on the market where you can actually build a mobile app using iOS with cognitive, with Hadoop, with Spark, with all of these emerging technologies in one place.
And the Swift announcement you guys did with Apple is pretty important. I want to get your take on some specific points around that, and Peter, I'd like you to weigh in too, because as an analyst you look at all the different vendors, and the impact on the customer is certainly multi-vendor. But in Cutting's speech there was a quote, and I want to get your thoughts on it: long term, the big data ecosystem around Hadoop may survive longer than the Hadoop project itself. So that essentially highlights what we've been seeing, which is it's not just about Hadoop, and some will argue that it's still not easy enough. We were saying that yesterday, it's got to get easier. But what's interesting around the ecosystem is that it's not just Hadoop, it's people who have come around Hadoop who are doing other things. So I want to get your guys' thoughts on that. Joel, we'll start with you. That ecosystem is certainly sustainable, it's active, it's all smart people, and they're doing a variety of things, but it's not just about Hadoop, you see Spark and other things. Your thoughts on that?
So we recognized that last year when we announced Spark, which is one of the reasons why we had a community-first approach. We opened up the Spark Technology Center in San Francisco, and it's not composed simply of Hadoop and Spark experts. It also includes domain-specific experts, as well as designers, as well as business users. So how I would interpret that comment, that it's not gonna be focused primarily on Hadoop and it's going to expand even beyond Hadoop, is that it's talking about the different people, frankly, that are part of this community. Geez, there's like 5,000 people over there, right? They're not all data scientists.
I couldn't tell you how many times I hear speakers say, and you data scientists, and I'm like, I'm not a data scientist, right? So we need to get to a point where we're building applications that can actually translate data science to a common person or to an industry-specific person, right? So that's how I interpret this ecosystem evolving. And in fact, just yesterday, we released our community partner program. So if you go to community.spark.tc, we have around a dozen or so new ecosystem members, like Confluent, the makers of Kafka, H2O with machine learning, Dato with their machine learning APIs. So this ecosystem is growing at a tremendous speed, and we're just trying to facilitate the conversation.
Yeah, what I'd say about that, and I think you're absolutely right, Joel, how I would extend that is the idea that every tool has a certain pedagogy associated with it, and what the ecosystem is focused on is solving the problems.
Yep.
And there's this interesting cycle between identifying the problem and coming up with the technology that can solve it, then discovering that the technology can be applied to new problems, but not quite, and extending the technology, and this cycle is going to go on. Hadoop is, in many respects, the first pebble that was thrown into this vast lake of digital business problems, and the ripples are going through, and eventually they'll disperse, but people will say, okay, with that experience we can now do something that involves streaming, and then we can do something that involves something else. So the ecosystem gets catalyzed by the problems that can be solved, and increasingly that's what we're going to be focusing on: how can we take these technologies and apply them to new problems, what are the limits of the technology, how do we create new stuff in the process and get that back out into the ecosystem. So he's absolutely right.
We'll remember Hadoop forever, and it will probably be here in some form forever, but in 10 years what we're going to be talking about is a new class of technology, because we will be trying to solve vastly different, more complex, and perhaps more interesting types of problems.
So diversity is a key thing, Joel. You guys have a diverse approach, so talk about your thoughts on that, because IBM used to be Big Blue, blue everything, but now you have openness, you have some differentiation, you started with Watson and a variety of other things. So talk about IBM's open yet differentiated strategy, because IBM has been doing open source, again, Hadoop for 10 years, but it goes way back. The stuff that you guys are doing is pretty phenomenal, but talk about the diversity, and why it's important to have diversity in the ecosystem.
Yeah, I mean, I would say the short answer is that without diversity you can really only solve a handful of challenges, right? So you really need to bring different points of view into every kind of discussion, and it's no different now when you talk about data and the digital business. And even now that we talk about the cognitive business, there's a lot of interpretation that needs to happen, and there are many ways to interpret things. I mean, certainly we are committed to open source, no question, but I would actually position it a different way, where we talk about an open community or an open framework. So look at what we're doing with, say, Watson and how we've opened up a number of our APIs. We didn't just open them up and throw them over the wall to the community, which I see a lot of people doing lately. We actually make them usable to application developers. So if you go into Bluemix, you can choose from over 30 different cognitive APIs and use them in your application tomorrow.
In fact, internally, we're running what's called a cognitive build contest right now, where hundreds and thousands of IBMers who have no experience with machine learning are building machine learning apps. They're able to go into Bluemix, grab those APIs, pull them into an application, and basically prototype stuff inside of IBM, which is actually creating some really unique opportunities for us. So it's really a cool era that we're in right now.
So, Joel, I'm fascinated by what you're saying, because we were talking during the introduction about how the big boys sometimes don't get the credit for innovation that they deserve. There is an enormous amount of invention that takes place, and we saw a lot of great, very inventive companies yesterday. But the idea of creating APIs, not just throwing them over the wall, but getting the community to engage with those APIs, and taking on the task of servicing and supporting them so that everybody changes their behaviors or adopts the new behaviors that are likely to lead to dramatically new levels of business value, that's always been a strength of a company like IBM. But how do you reconcile that need to serve the community through open source with, at the same time, taking on these challenging, very expensive and complex tasks that require a fair amount of money?
Yeah, I mean, look, open source is here to stay. I think the best innovation is happening when many experts from multiple organizations come together and collaborate. Whether that be over open source technology or an open project, I see that as basically the future. And where we provide value and where we create value is by bringing, I think, a unique viewpoint to every problem that we encounter.
So what's really exciting to me is that, by and large, by us opening up IBM, frankly, to work with more of the community, we're going to get exposure to a lot more of the challenges that many businesses are facing. And I would say IBM is one of the few that can really solve them, because we're able to attack problems from so many different angles.
So how do you sustain the fact that you're gonna become a hub with the community, through which an enormous amount of knowledge sharing is going to happen? There's gonna be a lot of knowledge flowing through IBM. How do you then take that, grab that, and turn it into new capabilities while protecting your customers' intellectual property? Because we're not talking about putting in place accounting packages that everybody's gonna do relatively commonly. We're talking about altering your operations, engaging your markets very differently. How do you sustain that tension?
Yeah, that's a really great question, and there's no obvious answer. I think it's just maintaining that relationship and being very transparent, frankly, with our clients as well as with the community, and being very clear about what we ask them to share or not share. And actually, what you're seeing in the industry today, if you look at Uber and Airbnb and a lot of the leading companies, is that they are open about sharing their architectures. The real IP, I think, is not about the technology stack. I think that's a common misconception. The real IP is actually applying that stack to solve real business problems. And for those, you can put your ear to the ground and hear what some folks are doing, but by and large that's a cultural kind of thing within a lot of our clients' companies, right?
So talk about the show here. We've got a couple more minutes and I want to get your thoughts on what you're working on, what IBM is doing at the show.
What's the focus for IBM here at Big Data Week, Big Data SV and Strata Hadoop? What are you guys looking at? What are you looking at for deals? What's on your agenda? What's surprising? Just your thoughts, and IBM in general.
Yeah, I mean, I would say the most disruptive megatrend or theme, if you want to call it that, that we see is the cloud. I think that's actually what could disrupt Hadoop the most. So if you look at, say, our Spark as a service on Bluemix, it doesn't use Hadoop, right? It runs on our own resource manager that sits on top of the Swift object store. And we've made a ton of enhancements to make it extremely fast and extremely proficient, but there's no Hadoop in sight there, right? And the same can go for other vendors. So that to me is an interesting scenario, because you start looking at it and, okay, cloud just creates different economics, different dynamics. And actually Adam Kocoloski is gonna give a really cool talk tomorrow, one of his keynotes, where he's gonna talk explicitly about what it takes to work within open source and what it takes to work within the cloud environment. These are non-trivial challenges. I think Hadoop has grown and has done really well, but predominantly on-premise, right? I don't hear much from, say, Cloudera, MapR, Hortonworks, or frankly many Hadoop vendors talking about the cloud. And so to me, I think that's where we go hockey stick in the next couple of years: moving to the cloud and helping our clients create not just cloud environments, but hybrid cloud environments.
When you say cloud, are you really saying simplification?
You know, I would love to say that, if it were so simple.
But the idea of taking some of the technology administration challenges out and letting somebody else manage them, so the developers and users can get to the value.
That is true.
I would say there's a lot of complexity in managing a Hadoop cluster, and in fact that's a good segue into the ODPi, where we actually announced that we released a test harness and the first certification. If you look at Hadoop, it's made up of over 30 different projects. So managing that on-premise is painful, but you're right, as you move to the cloud, a lot of that will be automated behind the scenes, right? And it will be managed for you.
I mean, that's just one of those things that everyone's chipping away at every day, making it easier. We're hearing that loud and clear, a lot of cracks in the foundation, but that's just more time-based. Gotta get it done over time.
Yeah, and sorry, just to add to that, kind of as an aside, there's a myth that I wanted to dispel, where people actually think you need Hadoop to use Spark. In a lot of the presentations I see, it's like, okay, you gotta build your cluster and get all your data in there and build the lake first, and then you can get to Spark. That's not at all true. You can actually spin up a Spark cluster before you ever touch Hadoop, right? And I would actually highly recommend it, because in my opinion that's a far faster time to value than necessarily investing heavily in a cluster.
Joel, thanks for spending the time coming by first thing in the morning here on theCUBE, bringing the energy, day two coverage. Thanks for coming on, great insight, and congratulations on your new role as you guys expand and continue to do well with big data, and certainly the cloud, where we're seeing a lot of great traction. Of course, Watson is the headliner of all the conversations. You know, cognitive conversation is a platform. All these kinds of things are booming, and it's all data driven. This is theCUBE. We're back with more coverage after this short break.