Live from Boston, Massachusetts. It's theCUBE, covering Spark Summit East 2017, brought to you by Databricks. Now, hear your hosts, Dave Vellante and George Gilbert.

Welcome back to Boston, everybody, where the town is still euphoric. Mike Gualtieri is here. He's a principal analyst at Forrester Research. Attended the parade yesterday. How great was that, Mike? Yes! Yes! It was awesome. Nothing like we've ever seen before. All right, the first question is, what was the bigger shocker, the bigger upset, the greater win? Was it the Red Sox over the Yankees, or was it the Super Bowl this weekend? That's the question. I think it's the Super Bowl. But that was a lot of fun. So how was the parade yesterday? It was magnificent. I mean, it was freezing. No one cared. I mean, but it was, yeah, it was great. Great to see that team in person. I wish we could talk. We can, but we'll get into it.

So we're here at Spark Summit, and, you know, the show's getting bigger. You see more sponsors, still heavily a technical audience. But what's your take these days? We were talking off camera about the whole big data theme. It used to be the hottest thing in the world. Now nobody wants to have big data in their title. What's Forrester's take on it? I mean, I think big data, it's just become mainstream. So we're just back to data, you know, because all data is potentially big. So I don't think it's the thing anymore. I mean, you know, what do you do with big data? You analyze it, right? And part of what this whole Spark Summit is about, look at all the sessions: data science, machine learning, streaming analytics. So it's all about sort of using that data now. So big data is still important, but the value of big data comes from all this advanced analytics. And we talked earlier. I mean, a lot of the value of, you know, Hadoop was cutting costs. You mentioned commodity components and a reduction in the denominator and, you know, breaking the need for some kind of big storage container.
Okay, so that, we got there. Now, shifting to new sources of value, what are you spending your time on these days in terms of research? Artificial intelligence, machine learning, you know, so those are really forms of advanced analytics. So that's been very hot. We did a survey last year, an AI survey. And we asked a large group of people, we said, oh, you know, what are you doing with AI? 58% said they're researching it. 19% said they're training a model, right? So that's interesting. 58% are researching it and far fewer are actually, you know, doing something with it. Now, the reality is, if you phrased that a little bit differently and you said, oh, what are you doing with machine learning? Many more would say, yes, we're doing machine learning. So it raises the question, what do enterprises think of AI? And what do they think it is? So, you know, a lot of my inquiries are spent helping enterprises understand what AI is, what they should focus on. And the other part of it is, what are the technologies used for AI, and deep learning is the hottest.

So you wrote a piece late last year, what's possible today in AI? What's possible today in AI? Well, you know, before understanding what's possible, it's important to understand what's not possible, right? So we sort of characterize it as there's pure AI and there's pragmatic AI. So it's real simple. Pure AI is the sci-fi stuff. We've all seen it, Ex Machina, Star Wars, whatever, right? That's not what we're talking about. That's not what enterprises can do today. We're talking about pragmatic AI. And pragmatic AI is about building predictive models. It's about conversational APIs to interact in a natural way with humans. It's about image analysis, which is something very hot because of deep learning. So AI is really about the building blocks that companies have been using, but then using them in combination to create even more intelligent solutions.
And they have more options on the market, both from open source and from cloud services, from Google, Microsoft, IBM, and now Amazon. Were you guys at the re:Invent conference? I wasn't personally, but theCUBE certainly was. Yeah, they announced the Amazon AI, which is a set of three services that developers can use without knowing anything about AI or being a data scientist. But I mean, I think the way to think about AI is that it is data science. It requires the expertise of a data scientist to do AI.

Following up on that comment, which was really interesting: vendors are trying to democratize access to machine learning and AI. And I say that with two terms because usually the machine learning is the stuff that's sort of widely accessible and AI is a little further out. But there's a spectrum. At one end, you can just access an API, which is like a pre-trained model. Pre-trained model, yep. It's developer accessible. You don't need to be a data scientist. And then at the other end, you need to pick your algorithms. You need to pick your features. You need to find the right data. So how do you see that horizon moving over time? Yeah, no, so these machine learning services, as you say, they're pre-trained models, totally accessible by anyone. Anyone who can call an API or a RESTful service can access these. But their scope is limited, right? So if, for example, you take the imaging API that you can get from Google or now Amazon, you can drop an image in there and it will say, oh, there's a wine bottle on a picnic table on the beach, right? It can identify that. So that's pretty cool. There might be a lot of use cases for that. But think of an enterprise use case. No, you can't do it. And let me give you this example. Say you're an insurance company and you have a picture of a steel roof that's caved in.
If you give that to one of these APIs, it might say steel roof, it may say damage, but what it's not going to do is estimate the damage. It's not going to be able to create a bill of materials on how to repair it, because Google hasn't trained it at that level, okay? So enterprises are going to have to do this themselves, or an ISV is going to have to do it, because think about it. You've got 10 years' worth of all these pictures taken of damage, and with all of those pictures, you've got tons of write-ups from an adjuster. Whoa, I mean, if you could shove that into a deep learning algorithm, you could potentially have consumers, or somewhat untrained people, take pictures and have this thing say, here's what the estimated damage is. This is the situation. And I've read about insurance use cases like that where the customer could, after they sort of have a crack-up, take pictures all around the car, and then the insurance company could provide an estimate, and all of where the nearest repair shops are. But right now it's like the early days of e-commerce, where you could send an order in and then it would fax it and they'd type it in. So I think, yes, insurance companies are just taking those pictures in. The question is, can we automate it? Well, let me actually iterate on that question, which is, so who can build a more end-to-end solution? Assuming there's a lot of heavy lifting that's got to go on for each enterprise trying to build a use case like that, is it internal development, and only at big companies that have a few of these data science gurus? Would it be like an IBM Global Services or an Accenture? Or would it be like a vertical ISV where it's semi-custom, semi-packaged? I think it's both, but I also think it's two or three people walking around this conference, understanding Spark, maybe understanding how to use TensorFlow in conjunction with Spark, that will start to come up with these ideas as well. So I think we'll see all of those solutions.
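[Editor's note: as a rough illustration of the generic, pre-trained image API being described here, this sketch builds a label-detection request in the shape Google's Cloud Vision `images:annotate` endpoint expects. The image bytes are a placeholder, and no network call is made; a real client would POST this payload with an API key.]

```python
import base64
import json

def build_label_request(image_bytes: bytes, max_results: int = 10) -> dict:
    """Build a Cloud Vision images:annotate request body that asks
    only for generic label detection (e.g. "roof", "damage")."""
    return {
        "requests": [{
            "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
            "features": [{"type": "LABEL_DETECTION", "maxResults": max_results}],
        }]
    }

# Placeholder standing in for a real photo of a caved-in steel roof.
payload = build_label_request(b"<jpeg bytes of damaged roof photo>")
print(json.dumps(payload, indent=2))
```

The point being made in the interview is visible in the request itself: a generic service like this can only return labels such as "roof" or "damage". Producing a damage estimate or a bill of materials would require a custom model trained on an insurer's own photos and adjuster write-ups.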
Certainly, like IBM with their cognitive computing. Oh, and by the way, so we think that cognitive computing equals pragmatic AI, right? Because it has similar characteristics. So we're already seeing the big ISVs and the big application developers, SAP, Oracle, creating AI-infused applications or modules, but yeah, we're going to see small ISVs do it. There's one in Austin, Texas, called InteractiveTel. It's like 10 people. What they do is, they sell to large car dealerships like Ernie Boch, right? And they record every conversation, phone conversation, with customers. They use the Google pre-trained model to convert the speech to text, and then they use their own machine learning to analyze that text to find out if there's a customer service problem or if there's a selling opportunity, and then they alert managers or other people in the organization. So small company, very narrowly focused on something like car buying.

So I wonder if we could come back to something you said about pragmatic AI. We'd love to have someone like you on theCUBE because we like to talk about the horses on the track. So if Watson is pragmatic AI, well, I think you saw the 60 Minutes show, I don't know, whatever, it was three or four months ago, and IBM Watson got all the love. Barely mentioned Amazon and Google and Facebook, and Microsoft didn't get any mention. And there seems to be sentiment that, okay, all the real action is in Silicon Valley, but you've got IBM doing pragmatic AI. Do those two worlds come together in your view? How does that whole market shape up? I don't think they come together in the way I think you're suggesting. I think what Google, Microsoft, Facebook, what they're doing is they're churning out fundamental technology. Like one of the most popular deep learning frameworks, TensorFlow, is a Google thing that they open sourced. And as I pointed out, those image APIs that an Amazon has, that's not going to work for insurance.
That's not going to work for radiology. So I don't think they're gonna- Facebook's going to apply it differently. Yeah, I think what they're trying to do is apply it to the millions of consumers that use their platforms. And then I think they throw off some of the technology for the rest of the world to use. And then the rest of the world has to apply it. Yeah, but I don't think they're in the business of building insurance solutions or building logistical solutions. Right, but you said something that was really, really potentially intriguing, which was you could take the horizontal Google speech-to-text API. And then- And recombine it. Put your own model on top of that. And that's, I mean, the techies call that something like ensemble modeling, but essentially you're taking almost like an OS-level service and you're putting a more vertical application on top of it, to relate it to our old ways of looking at software. And that's interesting. Yeah, because what we're talking about right now, like this conversation, is now about applications, right? We're talking about applications which need lots of different services recombined, whereas mostly the data science conversation has been narrowly about building one customer lifetime value model or one churn model. Now the conversation, when we talk about AI, it's coming around to combining many different services and many different models. And the platform for building applications. Yeah. And that platform, the richest platform, the platform that is most attractive, has the most building blocks to work with. Or the broadest ones? The best ones, I would say, right now. The reason why I say it that way is because this technology is still moving very rapidly, right? So for image analysis, deep learning. Nothing's better than deep learning for image analysis. But if you're doing business process models or churn models, well, deep learning hasn't played out there yet, right?
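[Editor's note: the layering pattern just described, a horizontal pre-trained service with your own vertical model on top, like the InteractiveTel example earlier, can be sketched as follows. Both stages here are deliberately stubbed: the transcription function stands in for a real speech-to-text API call, and the keyword rules stand in for a classifier trained on labeled call transcripts.]

```python
def transcribe(audio: bytes) -> str:
    """Stub standing in for a pre-trained speech-to-text API
    (e.g. a cloud service's recognize call)."""
    return "the customer said the repair took three weeks and he wants a refund"

def classify_call(transcript: str) -> str:
    """Toy rule-based stand-in for the vertical model layered on top:
    flag a service problem or a selling opportunity in a dealership call."""
    text = transcript.lower()
    if any(w in text for w in ("refund", "complaint", "manager")):
        return "service-problem"      # alert a manager
    if any(w in text for w in ("trade-in", "test drive", "financing")):
        return "selling-opportunity"  # route to the sales team
    return "no-action"

# Horizontal service first, vertical model second.
alert = classify_call(transcribe(b"<recorded call audio>"))
print(alert)  # service-problem
```

The design point is the seam between the two functions: the generic service is interchangeable, while all the business value lives in the second, domain-specific stage.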
So right now, I think there's some fragmentation. There's so much innovation. Ultimately, it may come together. What we're seeing is many of these companies are saying, okay, look, we're going to bring in the open source. It's pretty difficult to create a deep learning library. And so a lot of the vendors in the machine learning space, instead of creating their own, they're just bringing in MXNet or TensorFlow. I might be thinking of something from a different angle, which is not what underlying implementation they're using, whether it's deep learning or whether it's just random forests or whatever the terminology is, the traditional statistical stuff. The idea, though, is you want a platform. Like way, way back, Windows, with the Win32 API, had essentially more widgets for helping you build graphical applications than any other platform. I see where you're going. And I guess I'm thinking, it doesn't matter what the underlying implementation is, but how many widgets can you string together? I'm totally with you there. And so I think what you're saying is, look, a platform that has the most capabilities but abstracts the implementations and can be somewhat pluggable, right? To keep up with the innovation. Yeah, and there's a lot of new companies out there too that are tackling this. One of them is called Bons.ai. Small startup. They're trying to abstract deep learning, because deep learning right now, like TensorFlow and MXNet, that's a little bit of a challenge to learn. So they're abstracting it. But so are a lot of the big vendors, so is SAS, IBM, et cetera. So Mike, we're out of time, but I want to talk about your talk tomorrow. So AI meets Spark. Give us a little preview. AI meets Spark. Basically, the prerequisite to AI is a very sophisticated and fast data pipeline, right? Because just because we're talking about AI doesn't mean we don't need data to build these models. So I think Spark gives you the best of both worlds, right?
It's designed for these sorts of complex data pipelines that you need to prep data. But now with MLlib for more traditional machine learning, and now with their announcement of TensorFrames, which is going to be an interface to TensorFlow, now you've got deep learning too. And you've got it in a cluster architecture so it can scale. So pretty cool. All right, Mike. Thanks very much for coming on theCUBE. You know, way to go, Pats. Yeah. Awesome. Real pleasure having you back. Thanks. All right, keep right there, everybody. We're back with our next guest right after this short break. This is theCUBE.