in San Francisco, it's theCUBE, covering the Apache Spark Maker Community Event, brought to you by IBM. Now, here are your hosts, John Walls and George Gilbert.
And welcome back to the Galvanize campus here in San Francisco as we continue our coverage on theCUBE of the Apache Spark Maker Community Event. Along with George Gilbert, I'm John Walls, and we're joined here at Galvanize by a couple of guests, one from IBM, Joel Horwitz, who's the Director of Strategy and Business Development for IBM Analytics. And Joel, thanks for being with us.
Thanks guys, thanks for having us.
And joining him to his left is Sri Ambati, who's the CEO of H2O. And Sri, thank you for the time.
Thank you for having me.
We appreciate that. Joel, let's jump in. We haven't talked much yet today, I don't think, George, about machine learning, an integral piece, obviously, in the Spark ecosystem. But from the IBM point of view, machine learning: not only where is it today, but what do you see as the growth potential, and what's the necessary commitment you're gonna have to have to really make that take off?
Yeah, I mean, well, we're lucky because we have an expert like Sri here today who can tell us a lot more than I could about machine learning, so that's good. From the IBM side, I mean, clearly it's having an impact across all of our business units. Clearly the ability for enterprise applications to learn and to interpret their environments is bringing huge value to our clients.
And then Sri, why don't you, if you could, just from a 30,000-foot level right now, because this is what you do, obviously, at H2O: what are you seeing in terms of what's happening in your market?
We are busy debunking the experts. We mentioned the word experts. The Moneyball for business is on, right? And the Moneyball is basically: there is no expert, data is the expert.
And in many ways, data and analysis, data science, has transformed lots of our businesses, and they're finding value not just in their vertical, but horizontally. They're building around the data and the community they have built. And they're trying to defend that community with beautiful data products. And that's kind of where the revolution is headed. And the whole data transformation is bringing AI to a wider audience by making AI part of every software element out there.
It's certainly coincidental that you bring up Moneyball. Here we are in the Bay Area, and Billy Beane in Oakland, just across one of the bridges here, was the originator of that. We talk about this shift into horizontal rather than vertical. So I mean, what's happening then in terms of those sectors, in terms of those markets? I mean, who's more involved with data now that wasn't maybe five, 10 years ago?
So the value chain, I mean, the value has shifted from physical assets to digital assets to kind of virtual assets. There's a substantial delinking: Airbnb is more valuable than all the hotels put together. Uber is more valuable than GM and all the car companies and even Tesla. And why? And the answer is that they're able to truly bring data to the community and build real trust-based ecosystems around the core product they have. It's no longer about product value, it's about ecosystem value. How can you bring external interactions into your company so you can take advantage of the data you have and build a wider ecosystem? And I think that's kind of the true thesis of how data is able to touch not just your vertical, but all the adjacent verticals that you have in the core business. So if you're an insurance company, you can be informing car companies. You can be informing health companies. You converge all these different siloed verticals into a very "how can I make life better" mindset. And I think that's kind of what we're seeing.
We're seeing transformation. I mean, software was the original horizontal, but even software companies, much like IBM and others, are building verticals where they can go really up the chain and solve problems, not just sell technologies.
I just want to add to that, then, if I may. I just signed up for new insurance yesterday. And as I'm going through that application process, all online, not even talking to anyone, there's a little check box: if you own a smartphone, you install their app and they'll automatically discount, because they can track your driving and they can use things like machine learning to basically tell and predict, frankly, if you're going to be a good driver or a bad driver. So that was, like, you know, some of these things... I think a lot of folks have looked way out, like five or 10 years, and said, gosh, you know, that AI thing is coming. It's scary, you know, let's set up a whole organization to protect us from this scary thing. And I think many people don't realize this is here now. This is happening today. You may not even realize it. And in fact, I think at least a big part of this is, frankly, personalizing experiences for people. I think that's actually where I've experienced it personally: it's actually making me have a more intimate relationship with these, you know, with these verticals.
Yeah, I think, like, Waze is something that comes to mind when you talk about that. That was kind of my aha moment, when it started and I realized that it's asking: are you going home now? Are you driving to work now? Are you doing this now? Are you going to see your mother, whatever it is?
Yeah.
And now it's obviously looking at my behavior and my patterns and whatever.
And so we're seeing that, Sri, not just in transportation, but in retail behavior, or maybe even in the intelligence community, whatever it is, or, I don't know, in healthcare, being able to predict outcomes more successfully depending upon the kind of food I consume or what condition I'm in. Are all these things happening right now?
They are. I mean, in many ways, what's happening is the bar... So, open source. You think about Apache Spark; we're in open source. There's a conference that's coming together. Apache Spark, Apache Hadoop, the whole of open source, Linux originally, Google, have really lowered the bar for building applications and building code. Code is a commodity. So what people used to think of as their core value was their code or their data. I think what people are really trying to do is defend their brand and defend, truly, their community. So if you can defend your community with the data and the technologies you have and serve them very well, they will grow a profound love for your brand and love for you as a company. And that's what's really making people look to, like, 2020, as opposed to thinking about 2016 and today and the here and now. AI is here to actually make us be more human. So you can actually let the rule-based machines give way to more pattern recognition-based machines that can truly take all the boring stuff we do day to day completely aside, and you focus on the stuff that actually matters, which is getting an emotional reaction from your customers.
Sri, you're heading in a direction that's really interesting. The first interview we did was with Derek Schoettle, who's in charge of analytic data services, or I guess more broadly...
Yeah, he's the GM of our analytic portfolio.
And he laid out a vision where he said the world's moving away from product silos; we're orchestrating a bunch of services.
And then we're also bringing together a whole lot of data and analytic feeds and cataloging them the way 10 years ago we might have cataloged a software marketplace. But I wanted to ask you, based on what you were saying...
In one line, it's microservices.
Yes. One line, yeah. So for the consumer of this, and I don't mean the person, I mean the company, where is the value now? Is it in a combination of the proprietary data, external data that adds context, and then the machine learning algorithms, all mixed in one proprietary ball? Where is the value for the customer, the same way that Derek described it for the vendor?
So the long end of this is: how do we improve lives? How do we save lives in healthcare? How do we truly have an impact on time? Time is the only non-renewable resource, right? So data is where... so today Google has a 359-degree view of the world, as they say, right? So they have almost the full view of the world. So companies today, and we are speaking to a few customers as we speak, like Progressive or Capital One or PricewaterhouseCoopers, have customer data that they are not allowed to share, right? Now, if you can build data alliances where you're not sharing data, but sharing models and improving each other's models to serve the same customer better, to find vendors who pay on time, for example, or for anti-money laundering, where you can come together to fight something that's bigger, or cancer, which is even bigger: how do you come together as a society to put the best brains in one place to solve something substantial, which will kill one of us? One in four of us will get cancer. So we try to truly elevate the problem to how it touches lives. And that includes algorithms, data, the best practices in each of these spheres. You want the best doctor to be kind of captured as an AI program. You want the best radiologist, the best, like, physicist.
You want to capture the best of practices as a true pattern-recognized model that you can reuse individually in each vertical and horizontally across verticals.
So let me chime in here, because there's been a few things that we've heard, right? So one is what we mentioned earlier, this idea of an open platform, right? That Derek mentioned earlier today. And what I would say is that our partnership is really based on Apache Spark, Apache Hadoop, H2O; they're all open source projects. Not only are they open source, there's no lock-in. And then the second piece is this idea of an abstraction layer from data in the form of a machine learning algorithm, which is transferable, which is, I think, to Sri's point, reusable in other industries. And so that's how you remove that kind of connection back to the data. But look, there are certain companies that made news this last week that are also here in the Bay Area. And a lot of clients left because, frankly, there's a lot of businesses out there right now that are serving, like, these proprietary stacks and charging a lot of money for them, right? And so our vision of the future, and that's a big reason why we contributed SystemML, which was a proprietary piece that Watson's built on, is that this needs to be an open framework for anyone, whether it's H2O or Google TensorFlow or any of these libraries that exist, for people to come together, and then the whole thing will lift up, as opposed to having these siloed use cases.
More importantly, and riffing off of that, when you see a big company, let's say a Walgreens or a CVS or even a Progressive or a Liberty Mutual or a Capital One or a JP Morgan or an AmEx, they are spending billions of dollars every year on software. And they actually, frankly, are a very massive software operation in the back office. But historically they were consumers of software.
When you're a consumer of software, it is in the interest of the vendor that you're not well educated. When you are actually a maker of software, it is in the interest of the community of software providers to educate you the most. So open source is about a maker culture. So you're making software, and we are infecting that culture not just to us, but to other software companies, powerhouses like IBM, which, like, started software a hundred years ago, right? Sort of, in many ways. And so you have IBM, Microsoft, Google truly embracing open source for the first time ever. And that's actually a very fundamental change. Amazon as well; they've open sourced their deep learning recently. So what you're seeing now is the need for building ecosystems of software makers. And that's coming into the verticals, into traditional companies that did not think they were software companies. So I think that's the bigger change that's happening. Four years ago they would ask us, how will you make money if you're open source? I think four years from now they'll ask you, how will you make money from software in general, whether it's open or closed? Because the actual code has truly become a commodity. So the money isn't actually in the value of the stack. And that's kind of where it goes back to your question.
Okay, so let me add, or let me try and clarify one thing. So let's assume the regular, the traditional enterprise app that's coded, you know, it has its infrastructure, it has its, you know, business process rules. And now we wanna share these models. But with the models, so much of the work in making them work is in tying them to the data and the specific, yes, the context, and that's different across companies. So how do you make that relatively friction-free?
It's a good question. I think the API economy is here, right, sort of.
And so truly the data products of today are going to have more APIs, to become almost like a pipe in the grand scheme of many operations in a pipeline. But if you think about deep learning, it has, for the first time ever, pickled how people look at things. So the vision deep learning that's used to identify cats actually does edge detection, so you can actually detect edges anywhere. So you can actually use it to predict what is the name, what is the name of this carpet.
So that cat model is useful, after all.
Yes, indeed. And so all of ImageNet has led to kind of this embracing of deep learning and explosion in deep learning. You can actually predict what's the shingle of the roof using the same kind of model, and then use that to predict how long the roof is gonna be there. And so the insurance companies can suddenly start saying, replace this roof, instead of waiting for the catastrophe to happen.
But will the insurance company have their own deep learning model?
So there's a bigger question.
Will they subscribe?
There's a bigger question.
Okay.
So the challenge isn't so much, arguably, how do you apply the model to different data sets? I think that's being sorted out. I actually think the bigger challenge is, frankly, a lot of the data that exists is dark, meaning it's paper based, frankly. And it's not even digital today. And I think digital is often used synonymously with marketing. That's not what we talk about when we say digital. It's actually this: I mean, you have a notepad there, right? I mean, I can't do anything with that until you digitize it in some way.
Take a picture and we can run a model on it. I don't know if you're seeing whiteboards every day, right?
You gotta point out that we're old-fashioned guys.
No, I'm just saying, so that's where we'll get to. And so a lot of the business that's conducted today is still very much paper based and it's not digital. So there's not a lot you can do with that.
Or completely... Audio, video, yeah. But I think the crux of your question is there's lots of reusability. Much like reusable patterns: there are design patterns in classifying, design patterns in fraud, design patterns in highly unbalanced datasets that can be reused. Templates that are used to win a Kaggle contest to do A can be transferable to a problem similar to A, right? A prime. So, the next problem: zip codes, for example. If you understood zip codes very well, you can reuse zip codes everywhere. If you understand weather patterns, you can apply that everywhere else. So there's some core components that you can productize, and then there are components that you compose.
Okay. At the risk of descending back into the nitty gritty, take weather, because that is a service now. Weather prediction is a service. Thank you. That's obviously not something where we're gonna pass a model around. That is an API, I guess.
Well, I mean, Sri is exactly right. There's design patterns, right? And I think what we're announcing today, and you'll hear later today, is our Data Science Experience. And working with H2O's community, I mean, a lot of the learnings aren't happening in the ivory towers really anymore. They're happening in the field, and people are sharing at meetups. They're sharing just kind of through word of mouth. And I think a lot of the work that we have to do is, frankly, sharing these design patterns between not just, like, hardcore machine learning engineers and data scientists; even everyday people should be able to at least cognitively understand what a clustering algorithm means, what classification means. I think that's where I get excited, because, like, we're here at Galvanize. And their mission is really to teach people not necessarily how to do research and development, but, frankly, how do I apply data science? And that's, I think, where they've actually differentiated themselves quite a bit.
I think it's gonna be delivered as applications, microservices, kind of data products. So several of these APIs compose and eventually become applications on the dashboard that you're looking at. If you're a CXO today, you don't have the time to look at all the vital signs of your company. How do I condense that to a score? How do I condense that to the score that matters? Like, we often now have plots that show dollar value against the severity of the score, because you're not necessarily worried about the most dangerous one that's not so important; sometimes it's the most dangerous one that's also important. So I think it's kind of looking at relevance. So going back to your context setting, the context comes from who the end user is, from the user experience. And if the user is a consumer, he has a different context. He's flying frequently, he wants to know which is the best path to optimize, right? And you wanna go to a travel location that you don't want to come back from, right? How do you find that? That's kind of what the goal for your airline company should be, right?
You know when you get there, okay.
But, I mean, Sri's on a very good point. So to your point, I mean, people consume things differently. Some people listen better. Some people are more visual. Some people like to read things. Some people are tactile. So I think we shouldn't limit ourselves in terms of how people will consume data products.
Some people play games and some people are immersive. The next bend in this corner is immersive intelligence. VR. We are looking at some of those improvements as well.
Before I let you take off, Joel mentioned you're wearing Galvanize. You're sitting next to a guy wearing a long-sleeved T-shirt and with a company that has put all the chips on the table on open source. I mean, what does it say to you as a partner about kind of IBM and their culture, their mindset in general?
I feel like I'm on exhibition here.
It looks good, it looks good.
But just in general, it's a different feel, is it not?
It's powerful, right? If you think about when Joel wrote a blog on KDnuggets a few years ago about the popularity of R versus, like, SAS and other closed source ecosystems. And it's really interesting to see that R and Python, as open source movements, have really brought data science to life to some degree. And now when bigger companies are embracing them, they're bringing a lot more real software experience into it, and channels of distribution. And we as software companies, traditional startup, community-based open source companies, are looking at bigger companies now voting for open source. There are a lot more movements that can be started, a lot more bridges that can be built, a lot more ecosystems that can actually truly thrive. So how do you build ecosystems in open source? I used to call it BizDev in the land of open source. You actually take Spark, put H2O in it and call it Sparkling Water, right? So that's exactly the product we have, right? And that was actually the first of a movement; Silicon Valley was really, like, charmed by that whole movement, where historically you'd have two open source projects, Cassandra and HBase, and I represented Apache Cassandra at the time, and they would really, like, go to little limits to show off each other's strengths. And actually the sum of the parts is actually less than the whole. And if you can build an ecosystem where you try to raise a forest, not a tree, right? And a forest that is really like a real tropical forest, very rich: some people are good at the birds, some of the trees are good at just giving lots of grass and food and sunlight. So all of that builds a very thriving ecosystem. I think what we are seeing now is a multipolar software world, which is great. Multipolar, like, you have Apple, you have Google, you have Amazon, you have IBM, you have all these really cool thriving ecosystems.
But also the next level, what I call the integrated platforms. Software platforms were always there. What we're seeing now is giant platforms that are integrated. Uber or Airbnb, they are examples of these, or Predix, GE; they are integrated vertical platforms that can now connect to much more community, a much larger base. So you're going from just pure "I'm a product company" to "I'm a platform company looking for an ecosystem" to an integrated platform company which can really bring value. These are the multiple levels of growth that we're seeing, and that's amazing to have. People like IBM, Microsoft have also embraced open source; we're seeing Google opening up. And historically, you've never seen this happen before. So it's kind of wonderful.
I think we're no longer living in a winner-take-all society, frankly. I think that there's plenty... it's not who has the best mousetrap will win anymore, it's who builds the best alliances, it's who builds the best solutions that solve the client's most burning problems. That's who will win. And so I think Sri is exemplary of that, of putting clients first, as is IBM. And I think ultimately that's gonna be the differentiator, not the technology, not the stack, because that's becoming a commodity, frankly.
It is really about trust. When we're building verticals for CFOs, for example: historically, which company had the best trust of CFOs as a technology company? IBM. Of course, we're working with PwC on that front to get to the other side, on the business side. But truly, as a technology company, you need to be building trust continuously. Whether it's trust with the man in the meetup who's arguing with you on why A should be done, not B, or trust on this panel, or trust across different parts of your customer base, different parts of your partner base. And that's crucial.
I mean, that's crucial for machine learning. Think about it.
We're going away from rule-based, where you can finally check, okay, we're making this decision, you have all these rules, to a very gray area where you may not know why an algorithm is telling you something. You don't see all the different things, all the millions of features that it took into account. You're just saying, okay, this is what it's telling me, and I have to interpret it now. And so, to Sri's point, if there's no trust there, then people are going to look at it and they're going to go, I'm not going to bet my business on this. And so anyway, so that's...
There's a lot of tooling that needs to be invented for building that trust for AI. So there's lots of work to be done. And that's where having big software vendors come in and say, hey, we're going to build a database for AI, AI-DP, with all the same kind of trace logs, with all the same kind of "why did this model go into production?" The model governance becomes real. How do you manage models? It's as difficult as managing humans. It's very hard, right, sort of. Governance is no longer who touched the data and did what to it. It's a matter of how did we arrive at this decision? Right, okay. It's a consensus. And actually, more importantly, different hospitals and different banks are trying to come together, without sharing data, around shared models, and fight difficult-to-trace diseases, difficult-to-trace habits, behaviors, user-based analytics. And it goes back to where Joel started. Personalization, almost hyper-personalization. There are five billion, six billion people on this planet. Can I personalize every service he needs to the point where he's as happy as God?
Or she, where she needs.
That's right.
We were talking about a collaborative spirit that is certainly alive and well and is driving this community forward. And we really appreciate the time from both of you. And I know it was a very busy day that you had, to spend with us and talk about that. Joel, Sri, thank you.
Thanks guys. Thank you for having us. More on theCUBE right after this.