From Midtown Manhattan, it's theCUBE. Covering Big Data New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. Okay, welcome back everyone. Live in New York City, this is theCUBE's coverage of our eighth year doing Hadoop World, which evolved into Strata Hadoop and is now called Strata Data. It's had many incarnations, but O'Reilly Media is running their event in conjunction with Cloudera; it's mainly an O'Reilly Media show. We do our own show called Big Data NYC here with our community at theCUBE, bringing you the best interviews, best people, entrepreneurs, thought leaders, experts, to get the data and try to project the future and help users find the value in data. My next guest is Rob Thomas, who's the general manager of IBM Analytics, CUBE alumni. He's been on multiple times, successfully executing in the San Francisco Bay Area. Great to see you again. Yeah, John, great to see you, thanks for coming. You know, IBM has really been interesting through its own transformation, and a lot of people will throw IBM in that category, but you guys have been transforming, okay? And the scoreboard has yet to show, in my mind, what's truly happening, because if you still look at this industry, we're only eight years into what Hadoop has evolved into now as a large data set, but the analytics game just seems to be getting started, with the cloud now coming over the top. You're starting to see a lot of cloud conversations in the air, certainly a lot of AI washing, you know, oh, AI this, but it's machine learning and deep learning at the heart of it as innovation. But a lot more work on the analytics side is coming. You guys are at the center of that. What's the update? What's your view of this analytics market? Most enterprises struggle with complexity. That's the number one problem when it comes to analytics. It's not imagination. It's not willpower. In many cases, it's not even investment. It's just complexity.
We are trying to make data really simple to use. And the way I would describe it is we're moving from a world of products to platforms. Today, if you want to go solve a data governance problem, you're typically integrating 10 or 15 different products, and the burden of that is on the client. So we're trying to make analytics a platform game. And my view is an enterprise has to have three platforms if they're serious about analytics. They need a data management platform for managing all types of data, public and private cloud. They need unified governance, so governance of all types of data, and they need a data science platform, machine learning. If a client has those three platforms, they will be successful with data. And what I see now is really mixed. We've got 10 products that do that, five products that do this, but it has to be integrated in a platform. You, as IBM, or the customer has these tools? Yeah, when I go see clients, that's what I see: they have disparate tools. And so we are unifying what we deliver from a product perspective into this platform concept. So you guys announced the integrated analytics system. I've got to see my notes here. I want to get into that in a second, but it's interesting you bring up the word platform, because platforms have always been kind of reserved for the big supplier. But you're talking about customers having a platform, not a supplier delivering a platform per se, because this is where the integration thing becomes interesting. We were joking yesterday on theCUBE here, just kind of ad hoc, conceptually, that the world's turned into a tool shed. And everyone has a tool shed or knows someone that has a tool shed; there are tools in the back and they're rusty. And so this brings up the tool conversation. There are too many tools out there that try to be platforms. And if you have too many tools, you're not really doing the platform game, right?
And complexity also turns into: you bought a hammer, and it turned into a lawn mower, right? So a lot of these companies have been groping and trying to iterate what their tool was into something else that it wasn't built for. So as the industry evolves, that's natural Darwinism, if you will; they will fall by the wayside. So talk about that dynamic, because you still need tooling. Yes. But the tool will be a function of the work, as Peter Burris would say. So talk about how a customer really gets that platform out there without sacrificing the tooling that they may have bought or want to get rid of. Well, so think about an enterprise today. What the data architecture looks like is I've got this box that has this software on it, to use your terms, has these types of tools on it, and it's isolated. And if you want a different set of tooling, okay, move that data to this other box where we have the other tooling. So it's very isolated in terms of how technology platforms have evolved today. When I talk about an integrated platform, we are big contributors to Kubernetes. We're making that foundational in terms of what we're doing on private cloud and public cloud, and if you move to that model, suddenly what was a bunch of disparate tools are now microservices against a common architecture. And so it totally changes the nature of the data platform in an enterprise. It's a much more fluid data layer. The term I use sometimes is you have data as a service now available to all your employees. That's totally different than: I want to do this project, so step one, make room in the data center; step two, bring in a server. It's a much more flexible approach. So that's what I mean when I say platform. So operationalizing it is a lot easier than just going down the linear path of provisioning. Right.
All right, so let's bring up the complexity issue, because integrated and unified are two different concepts that kind of mean the same thing sometimes, depending on how you look at it. When you look at the data integration problem, you get all this complexity around governance, a lot of moving parts and data. How does a customer actually execute without compromising the integrity of the policies that they need to have in place? So in other words, what are the baby steps that customers can take with the work you guys are doing with them? How do they get into the game? How do they take steps towards the outcome? They might not have the big money to push it all at once. They might want to take a risk management approach. I think there's a clear recipe for doing this right. And we have experience of doing it well and doing it not so well. So over time we've gotten, I'd say, a pretty good perspective on that. My view is very simple. Data governance has to start with the catalog. And the analogy I use is, you have to do for data what libraries do for books. And think about a library. The first thing you do with books is a card catalog, where you basically itemize everything. You know exactly where it sits. If you've got multiple copies of the same book, you can distinguish between which one is which. As books get older, they go to archive, to microfilm or something like that. That's what you have to do with your data. On the front end. On the front end. And it starts with the catalog. And the reason I say that is, I see some organizations that start with, hey, let's go start ETL, create a new warehouse, create a new Hadoop environment. That might be the right thing to do. But without having a basis of what you have, which is the catalog, that's where I think clients need to start. Well, I would just add one more level of complexity just to kind of reinforce it. First, I agree with you. But here's another example that would reinforce this step.
Let's just say you write some machine learning and some algorithms. And a new policy from the government comes down. Hey, are you dealing with Bitcoin differently, or whatever? Some GDPR kind of thing happens, where someone gets hacked and a new law comes out. How do you inject that policy? You've got to rewrite the code. So I'm thinking that if you do this right, you don't have to do a lot of rewriting of applications, because the library, or the catalog, will handle it. Is that right? Am I getting that right? That's right, because then you have a baseline. That's what I would describe it as. It's codified in the form of a data model or in the form of an ontology for how you're looking at unstructured data. You have a baseline. So then as changes come, you can easily adjust to those changes. Where I see clients struggle is if you don't have that baseline, then you're constantly trying to change things on the fly. And that makes it really hard to get to this. Well, really hard, and expensive. They have to rewrite apps, rewrite algorithms and machine learning things that were built, probably by people who maybe left the company, who knows, right? I mean, so the consequences are pretty great. They can be pretty big. Yes. Okay, so let's get back to something that you said yesterday. You were on theCUBE yesterday with Hortonworks CEO Rob Bearden, and we were commenting about AI, about AI washing. You said, quote, you can't have AI without IA. A play on letters there, the sequence of letters. Which was really an interesting comment. We kind of referenced it pretty much all day yesterday. Information architecture is the IA. And AI is the artificial intelligence. Basically saying, if you don't have some sort of architecture, AI really can't work. Which really means models have to be understood with a learning machine kind of approach. Expand more on that. Because that was, I think, a fundamental thing that we're seeing at the show this week in New York: the models for the models.
Who trains the machine learning? The machine's got to learn somewhere too. So there's learnings for the learning machines. This is a really complex data problem. And if you don't set up the architecture, it may not work. Explain. So there are two big problems enterprises have today. One is trying to operationalize data science and machine learning at scale. The other one is getting to cloud. But let's focus on the first one for a minute. The reason clients struggle to operationalize this at scale is because they start a data science project and they build a model for one discrete data set. The problem is that only applies to that data set. You can't pick it up and move it somewhere else. So this idea of data architecture, just to kind of follow through, whether it's the catalog or how you're managing your data across multiple clouds, becomes fundamental, because ultimately you want to be able to provide machine learning across all your data. Because machine learning is about predictions, and it's hard to do really good predictions on a subset. But that prereqs the need for an information architecture that comprehends the fact that you're going to build models and you want to train those models: as new data comes in, you want to keep the training process going. And that's the biggest challenge I see clients struggling with. So they'll have success with their first ML project. But then the next one becomes progressively harder, because now they're trying to use more data and they haven't prepared their architecture for that. Great point. Now switching to data science, you've spoken many times with us on theCUBE about data science. We know you're passionate about it. You guys do a lot of work on that. We've observed, and Jim Kobielus and I were talking yesterday, that there's too much work still on the data science guys' plate.
They're still doing a lot of what I call sysadmin-like work, not the right word, but like administrative building and wrangling; they're not doing enough data science. And there are enough proof points now to show that data science actually impacts business. I mean, whether it's the military having data intelligence that executes something, to selling something at the right time, or even for the worker, player, consumer, all the proof is out there. So why aren't we going faster? Why aren't the data scientists more effective? What is it going to take for the data scientists to have a seamless environment that works for them? They're still doing a lot of wrangling. They're still getting down in the weeds. Is that just the role they have, or how does it get easier for them? That's a big challenge. That's not the role. So they're a victim of their architecture to some extent, and that's why they end up spending 80% of their time on data prep, data cleansing, that type of thing. Look, I think we solved that. That's why, when we introduced the integrated analytics system this week, the whole idea was get rid of all the data prep that you need, because you land the data in one place. Machine learning and data science are built into that. So everything that a data scientist struggles with today goes away. We can federate to data on cloud, on any cloud. We can federate to data that's sitting inside Hortonworks. So it looks like one system, but machine learning is built into it from the start. So we've eliminated the need for all of that data movement, for all that data wrangling, because we organize the data, we build the catalog, and we've made it really simple. And so if you go back to the point I made, one issue is clients can't apply machine learning at scale. The other one is they're struggling to get to the cloud. I think we've nailed those problems, because now with a click of a button, you can scale this to the cloud. All right, so how does a customer get their hands on this?
Sounds like it's a great tool. You're saying it's leading edge. We'll take a look at it. Certainly I'll do a review on it with the team. But how do I get a hold of this? Would I download it? Do you guys supply it to me? Is it open source? How do your customers and potential customers engage with this product? However they want to, but I'll give you some examples. So we have an analytics system built on Spark. You can bring the whole box into your data center and right away you're ready for data science. That's one way. Somebody like you, you're going to want to go get the containerized version. You go download it on the web and you'll be up and running instantly with a highly performant warehouse integrated with machine learning and data science, built on Spark, using Jupyter. Any developer can go use that and get value out of it. You can also say I want to run it on my desktop. Yes, there's a cloud version out there. That's the free version. There's also a version on public cloud. So if you don't want to download it, you want to run it outside your firewall, you can go run it on IBM Cloud, on the public cloud. Just your cloud? Amazon? No, not today. Just IBM Cloud, okay, got it. So there's a variety of ways that you can go use this, and I think what you'll find... But you have a freemium model. People can get started with the download into their data centers, and that's free too. Yeah, absolutely. Okay, so all the base stuff's free. We have a desktop version too. I think you can download it. What's the URL when people look at this? DataScience.ibm.com. That's the best place to start a data science journey. Okay, multi-cloud. Common Cloud is what people call it. You guys have Common SQL Engine. What is this product? How is it related to the whole multi-cloud trend? Customers are looking for multiple clouds. Yeah, so Common SQL is the idea of integrating data wherever it is, whatever form it's in. ANSI SQL compliant.
So what you would expect for a SQL query and the type of response you get back, you get that back with Common SQL no matter where the data is. Now when you start thinking multi-cloud, you introduce a whole other bunch of factors. Network, latency, all those types of things. So what we talked about yesterday with the announcement of the Hortonworks DataPlane, which is kind of extending the YARN environment across multiple clouds, that's something we can plug into. And so I think, let's be honest, the multi-cloud world is still pretty early. Really early. Our focus is delivering... I don't think it really exists, actually. There are multiple clouds, but no one's actually moving workloads across all the clouds. I haven't found any of them. Yeah, I think it's hard for latency reasons today. We're trying to deliver an outstanding experience. But people are saying, I mean, this is headroom. But people are saying, I'd love to have a preferred future of multi-cloud. Even though they're kind of getting their own shops in order and retrenching and replatforming. But that's not a bad ask. I mean, I'm a user. I want to move if I don't like IBM's cloud or I get a better service. I can move around here; if Amazon's too expensive, I want to move to IBM. If you've got product differentiation, I might want to be in your cloud. So again, this is the customer's mindset, right? If you have something really compelling on your cloud, do I have to go all in on IBM Cloud to run my data? I shouldn't have to, right? I agree. I don't think any enterprise will go all in on one cloud. I think it's delusional for people to think that. So you're going to have this world. So the reason that, when we built IBM Cloud Private, we did it on Kubernetes was we said, that can be a substrate, if you will, that provides a level of standards across multiple cloud-type environments. And it's got some traction too. So it's a good bet there. Absolutely.
Rob, final word: just talk about the personas who you now engage with from an IBM standpoint. I know you have a lot of great developer stuff going on. You've done some great work. You've got a free product out there. But you've still got to make money and provide value to IBM. Who are you selling to? What's the main thing? You have multiple stakeholders. Could you just clarify the stakeholders that you're serving in the marketplace? Yeah. I mean, the emerging stakeholder that we speak with more and more than we used to is chief marketing officers, who have real budgets for data and data science and are trying to change how they're performing their job. That's a major stakeholder. CTOs, CIOs, any C-level. Chief data officer? You know, chief data officers, honestly, it's a mixed bag. In some organizations, they're incredibly empowered and they're driving the strategy. In others, they're figureheads. And so you've got to know how the organization does it. They'll pop up to the CFO or something. Yeah, exactly. So you got to either. Well, they're not really driving it. They're not change agents. They're not like, we're mandated to go do something. They're maybe governance police or something. Yeah. And in some cases, that's true. In other cases, they drive the data architecture, the data strategy, and that's somebody that we can engage with right away and help out. Any events you've got coming up, things happening in the marketplace that people might want to participate in? I know you guys do a lot of stuff out in the open. Events where people can connect with IBM, things going on. So we're doing a big event here in New York on November 1st and 2nd, where we're rolling out a lot of our new data products and cloud products. So that's one coming up pretty soon.
The biggest thing we've changed this year is, there's such a craving from clients for education that we've started doing what we call an analytics university, where we actually go to clients and we'll spend a day or two days and go really deep on open languages, open source. That's become kind of a new focus for us. A lot of reskilling going on too with the transformation. Yes, absolutely. All right, Rob Thomas here, general manager, IBM Analytics, inside theCUBE, CUBE alumni, breaking it down, giving us perspective. He's got two books out there. The Data Revolution was the first one. Big Data Revolution. Big Data Revolution, and the new one is Every Company Is a Tech Company. Love that title, which is true. Check it out on Amazon, Rob Thomas. Big Data Revolution, first book, and then the second book is Every Company Is a Tech Company. It's theCUBE, live from New York. More coverage after this short break.