Live from San Jose, California, it's theCUBE, covering Big Data Silicon Valley 2017.
Hello and welcome to theCUBE's special coverage of Big Data SV, Big Data in Silicon Valley, in conjunction with Strata Hadoop. I'm John Furrier with George Gilbert of Wikibon, and Peter Burris as well. We'll be doing interviews all day today and tomorrow here in Silicon Valley, in San Jose. Our next guest, Amit Walia, is the Executive Vice President and Chief Product Officer of Informatica, kicking off day one of our coverage. Great to see you, and thanks for joining us for our kickoff.
Good to be here with you, John.
So, Big Data: this is something like our eighth year covering what was once Hadoop World, now Strata Hadoop and Big Data SV. We also do Big Data NYC with theCUBE. It's been an interesting transformation over the past eight years. This year has been really, really hot; you're starting to see Big Data get a clear line of sight on where it's going. So I want to get your thoughts, Amit, on the marketplace from your standpoint. Obviously Informatica has a big place in the enterprise. What are the real trends in how enterprises are taking on analytics, specifically with the cloud? You've got AI looming, and everyone's all buzzed up on AI, so people have to get their arms around that. And you see IoT: Intel announced a $15 billion acquisition for autonomous vehicles, which is essentially about data. What are your views?
Well, I think it's a great question. Ten years have passed since Hadoop started, right? What we see is that what enterprises are trying to encapsulate today is what they call digital transformation. And what does that mean? Think about digital transformation for enterprises as three unique things. First, they're transforming their business models to serve their customers better.
Second, they're transforming their operational models for their own internal execution, if they're a manufacturing or execution-oriented company. The third is making sure their offerings are tailored to their customers. In that context, if you think about it, it's all a data-driven world, because it's data that helps customers be more insightful, more actionable, and a lot more prepared for the future. That covers the things you mentioned; that's where Hadoop came into play with big data. But today, organizations are centered on three things around big data. First, there's a lot of data, right? How do I bring actionable insights out of it? Second, ML and AI are going to play a meaningful role, because, as you say, IoT is the big game changer, with big data becoming huge data, if I may put it that way. So machine learning, AI, and self-service analytics are part of that. And the third is big data, or Hadoop, going to the cloud. That's going to happen very fast.
So enterprises are now transforming too, and this digital transformation, as you point out, is absolutely real; it's happening. You start to see a lot more focus on companies' business models, where analytics isn't just an IT function. That's been talked about for a while, but now it's really more relevant because you're starting to see impactful applications. With cloud and the new IoT stuff, you start to see that apps matter, and so the data becomes super important. How is that changing enterprises' readiness in terms of how they're consuming cloud and data? What's your view, since you guys are deep in this? What's the enterprise's orientation these days?
A slight nuance to that as I answer: I think what organizations have realized is that two things have happened today that never happened in the last 20 years.
First, a massive fragmentation of the persistence layer (Hadoop itself fragmented the whole database layer), and second, a massive fragmentation of the app layer. There are 3,000 enterprise SaaS apps today. Just think about it: you're not restricted to one app. So what customers and enterprises are realizing is that the data layer is where you need to organize yourself. You need to own the data layer. You can't just live in the app layer or the database layer; you've got to understand the data, because it could be anywhere and everywhere. The best example I give in the world of cloud is that you don't own anything; you rent it. So what do you own? You own the darn data. In that context, enterprise readiness, as you put it, becomes very important. Understanding and owning your data is the critical secret sauce. And that's where companies are getting disrupted: the new players are leveraging data that, by the way, the legacy companies already had but couldn't figure out how to use.
That is important, and I want to double-click on it, because you mentioned the data layer. What's the playbook? That's the number one question I get in CUBE interviews and off camera: okay, I want to have a data strategy. By itself that's an empty statement. So what is the playbook? Is it architecture, because data is a strategic advantage? What are enterprises doing? What's the architecture? Honestly, they care about service level agreements and potentially about multi-cloud as a key thing, but what is the playbook for this data layer?
That's a very good question. Enterprise readiness has a couple of dimensions. One, as you said, is that it will be hybrid, and hybrid doesn't just mean ground to cloud, or multi-cloud. You're going to be in multiple SaaS apps, multiple platform apps, and multiple databases in the cloud. So it's a hybrid world out there.
Second, organizations need to figure out a data platform of their own, because ultimately what they care about is: do I have a full view of my customer? Do I have a full view of the products I'm selling and how they're serving my customers? That can only happen if you have what I call a metadata-driven data platform. The third one, boy, oh boy, is what you talked about: self-service analytics. You need answers today, so analytics has to be more self-service for the business user, not necessarily the IT user. And then you leverage AI to make all of this a lot more powerful. Otherwise you'll spend hours and hours doing statistical analysis and still won't get there, given the scale and size of the data models. And SLAs will play a big role in the world of cloud.
Just to follow up on that: it sounds like you've got self-service analytics to help essentially explore and visualize. You've got data governance, cataloging, and lineage to make sure the data is high quality and navigable, and then you want to operationalize it once you've built the models. But there's a tension between what made the data lake great, which was just dumping it all in there so there's one central place, and all the governance on top of that, which says, well, we've got to organize it anyway. How do you resolve that tension?
That is a very good question, and it's what enterprises kind of woke up to. Here's a good example: everybody wanted to build a data lake. If you remember, two years ago 80% of the data lakes fell apart. The reason was exactly what you just said: people turned the data lake into a data swamp, if I may. Just dump a lot of data into my Hadoop cluster and life will be great. But what large enterprises realized is that they had become their own system integrators. They had to bring in the data, catalog it, prepare it, and surface it.
So the belief customers have now is: I need one place where I can easily bring in all the data, with a metadata-driven catalog so I can use AI and ML to surface that data, and a preparation layer that makes it very easy for my analysts to go around and play with data, and then I can visualize anything. But it all has to be integrated out of the box. When each layer, each component, has to be integrated by the customer, it falls apart very quickly when you want to, to your question, operationalize it at an enterprise level. Large enterprises care about two things: is it operationalizable, and is it scalable? That's where it tends to fall apart. That's our belief, and that's where governance happens behind the scenes. You're not doing anything extra: the security and governance of your data are driven through the catalog. You don't even feel it; it's just there.
I never liked the data lake term. Dave Vellante knows I've always been kind of against it, even from day one, because data is more fluid; I call it a data ocean. But to your point, I want to dig into that, because I think the data lake is just one dimension, right? We talked about this at Informatica World last year, and this year I think your event is coming up on May 15th. You guys introduced metadata intelligence. The old model was: centralize it, do some data governance and data management, fence it off, make some queries, get some reports, and I'm simplifying. But it was a side function. What you're getting at now is making that data valuable. Whether it's in a lake or a store, you never know when the data is going to be relevant, so you have to have it addressable. Could you talk about where this metadata intelligence is going? You mentioned machine learning and AI, and this seems to be what everyone's talking about: in real time, how do I make the data really valuable when I need it? And what's the secret sauce that you guys have specifically to make that happen?
To contextualize that question, think about it: what you don't want to do is keep everything manual. Our belief is that the intelligence around data has to be at the metadata level, right? Across the enterprise. That's why, when we invested in the catalog, I called it the Google of data for the enterprise: today there's no single place in an enterprise where you can go search for all your data. And consider the fast, rapidly changing sources of data. Think about IoT, as you mentioned, John, or think about customer data for you and me that may come from a new source tomorrow. Do you want the analysts to figure out where the data is coming from, or do you want machine learning and AI to contextualize it and tell you, "You know what? I just discovered a great new source for where John is going to go shop; make it a part of your analytics to give him an offer"? That's where the organizing principle for data sits: the catalog and all the metadata. That's where ML and AI will converge to give analysts self-discovery of data sets, recommendations like in an Amazon environment, and Facebook- or LinkedIn-style recommendations that find other people or other related data. That is where everything is going, and that's where we're putting all our AI effort.
So you're saying you want to abstract away the complexity of where the data sits, so that the analyst or the app can interface with it?
That's exactly right, because those are the areas that are changing so rapidly. Let that be. You can pick wherever data sits based on what you want. You can pick whichever app you want to use, wherever you or the business want to go. You can pick whichever analytical tool you like. But across all of those tools, you want to be able to figure out what data is there, and that shouldn't change all the time.
I should ask you while you're here: what's going to be the theme this year at Informatica World?
How do you take it to the next level? Can you give us a teaser of what we might expect this year? Because this seems to be the hottest trend.
First, at Informatica World this year we'll be unveiling our whole new strategy, branding, and messaging, and there's a big push on that. But the two things we'll focus on a lot are, one, the intelligent data platform, which is basically what I'm talking about: the organizing principle of every enterprise for the next decade. And two, within that, the meaningful role AI will play in helping people spring forward, discover things, self-serve, and make sense of the mountains of data that are going to sit around us, data we otherwise won't even know what to do with.
All right, so what do you have in the product? I want to drill into the dynamic you just mentioned: new data sources. IoT is going to make this far more complex. You never know what data is going to come off the cars, the wearables, the smart cities. You have all these new killer use cases that are going to be transformational. How do you handle that, and what's the secret sauce? Because that seems to be the big challenge. I'm used to dealing with structured data with a schema; now I've got unstructured data, and new data is coming in very fast. I don't even know when or where it's going to come in, so I have to be ready for it. What is the Informatica solution there?
In terms of picking up data from any source, that's never been a challenge for us, because one of Informatica's bread-and-butter capabilities is that we connect to and pipe in data from any potential source on the planet. That's what we do.
And you automate that?
We automate that process. So for any potential new source of data, whether it's IoT, unstructured data, or semi-structured logs, we connect to it.
Where I think the key is, and where we're heavily invested, is what happens once you've brought all that in. By the way, you can use Kafka queues for that, you can run Spark Streaming; you can do all of that. The question is, how do you make sense of it? I can get all the data, dump it on a Kafka queue, and then take it to do some processing on Spark. But the intelligence is where all of Informatica's secret sauce is: the metadata and the transformations. That's what we've invested in. As for connecting anything to everything, we do that for a living; we've done it for a quarter of a century, and we keep doing it.
I love having a chat with you. You're a product guy, and we love product guys because they can give us a little teaser on the roadmap. But I've got to ask you a question about all this automation. The big buzz out in the world is that machine learning and AI are replacing jobs. So where is the shift going to be? You can almost connect the dots and say, okay, you're going to put some people out of work: some developers, maybe some roles at the systems management layer, or wherever. Where are those jobs shifting to? If you abstract away and automate, who loses their job? Who gets shifted? And what are the new opportunities? You could argue that automation should create a new developer class, so one job gets replaced and another possibly gets created. Your thoughts on this personnel transformation?
Yeah, what we see is that value creation will change, so the jobs will move to the new areas where value is created. A great example is developers today, right? They did a terrific job of making sure the Hadoop ecosystem got legitimized. But in my opinion, when enterprise scalability comes into play, enterprises don't want lots of different things integrated and plumbed together by hand.
They want things to work out of the box, which is why packaged software works for them. What they want is for that development community to work on what I call the value-added areas of the stack. Think about the connected car; we're working with lots of customers in the connected car industry, right? They don't want developers working on the plumbing. They want us to give them that out of the box, because SLAs, operational scale, and enterprise scalability matter. But in the top layer, the analytics that make sense of the data, that's where they want innovation. So I don't think jobs will go away per se, but I do think they'll migrate to a different part of the stack than where they sit today. We live in Silicon Valley; that's a natural evolution we see, and I think it'll happen. In the larger industry, again, look at driverless cars. I don't think they've driven away jobs; what they've done is create a new class of workers. So I don't think that will be a big change.
That's the fallacy there. The ATM argument was that ATMs were going to replace tellers, yet more branches opened up, which created net new jobs. I want to get to a quick question, and I know George has one too, on cost of ownership. It's one of the things that's been criticized in some of these emerging areas, like Hadoop and OpenStack, to pick two random examples. It's great, it looks good, it's all peace and love, and an industry is being created to legitimize it, but cost of ownership has been a critical hurdle. It's been expensive: finding talent was hard, and deploying it was hard. We've heard that on theCUBE many times. How does the cost-of-ownership equation change as developers and businesses go after these more value-creating activities in the stack?
Look, I always say there's no free lunch. Nothing is free. Customers realize that, to your point, if as an enterprise you want to completely scale out and build an end-to-end operational infrastructure, open source ends up being pretty expensive, for all the reasons you mentioned, right? You have to throw a lot of developers at it, and it's not necessarily scalable. So what we're seeing right now is that enterprises have figured out that it works for them, but when they want to scale it out, they go back to what I call a software provider: one who has the scale and the supportability, who can react to changes, and who gives them the comfort that it'll work. To me, that's where they find it cheaper. Building it and experimenting with it is cheaper with open source, but scaling it out is cheaper with a software provider. We see a lot of customers who start by experimenting: developers download something, and it works great. But when I really want to take it across a Nordstrom or a JP Morgan or a Morgan Stanley, I need security, I need scalability, I need somebody to call, and at that point all those factors become very important.
And that's where the out-of-the-box experience comes in, where you've got the automation. Does that ease things for customers?
Exactly, and talent is a big issue, right? We live in Silicon Valley, and even here, by the way, hiring talent is hard. Just think about it: if you go to Kansas City, a Scala developer is a rare breed. When I go around the globe and talk to customers, they just don't have the talent that we here somehow take for granted. So it's hard for them to put their energy behind it.
Let me ask more about the metadata layer. There's an analogy that has come up from the IIoT world, where they're building digital twins. It's not just GE; IBM's talking about it, and we've seen more and more vendors do it. The digital twin is a digital representation of some physical object, but you could think of it as metadata for a physical object, and it gets richer over time. So my question is this: metadata in the old data warehouse world was about having one representation of the customer. But now there's a customer representation for a prospect, one for an account, one for in-warranty, and one for field service. How does that change what you offer?
That's a very, very good question, because that's where metadata becomes so much more important: its manifestation is changing. I'll give you a great example: Transamerica. Transamerica is a customer of ours, and they're leveraging big data at scale. To your question, they have existing customers who have insurance through them, but they're also doing white-space analysis to find potential new opportunities, two distinct things, and within that they're looking at relationships. I know you, John, and you have Transamerica; could you be an influencer with me? Or within your family? I'm a friend, but what about a family member you've declared out there on social media? They're doing all of that in the context of a data lake. How are they doing it? Think about the complexity of that job. Pumping data into a lake won't solve it for them, but it's a necessary first step. The second step is where all that metadata, through ML and AI, starts giving them a relationship graph that says: you know what? John by himself represents this white-space opportunity for you, but John is related to me in one way; he and I are connected on Facebook. John is related to you a bit differently.
He has a stronger bond with you, and within his family he has different strong bonds. So that's John's relationship graph: leverage him if he's been a good customer of yours. All of that now lives at the metadata level, not as a single monolithic metadata record but as a relationship graph, alongside a graph of what he has bought from you. So you can see that discovery becomes a very important element. Do you want to do that in different places? No, you want to do it in one place, whether you're in a cloud environment or on-prem. That's what I mean when I say metadata becomes your organizing principle; that's where it becomes real.
Just a quick follow-up on that. It doesn't seem obvious that every one of your end customers, not the consumer but the buyer of the software, would have enough data to start building that graph.
I don't think that's the issue. To me, what happened is that the term big data got massively abused. A lot of Hadoop customers aren't necessarily big data customers. I know a lot of banking customers, in enterprise banking, whose data volumes would surprise you, but they're using Hadoop. What they want is intelligence. That's why I keep stressing the metadata part: they're more interested in a deeper understanding of their data. A great example: a customer of mine, a big bank, had a high-net-worth client whose daughter was listed in the will. When the daughter went off to school, she walked into the bank's branch in that city, with no idea about any of this, wanting to open an account, with three friends in line with her. The manager came out, because the teller had been alerted that this was somebody to take special care of. Boom, she goes into a private office while her friends are standing in line. Think of the customer service perception that just created with a millennial, right? That's important.
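The relationship graph in the Transamerica example can be pictured as a weighted graph whose edge weights stand for bond strength. The Python sketch below is a minimal illustration under stated assumptions, not Informatica's or Transamerica's implementation: the `Tie` and `RelationshipGraph` names, the 0.0 to 1.0 strength scale, and the rule "the existing customer with the strongest direct tie is the best influencer" are all hypothetical. Per the discussion above, real edges would be inferred from metadata with ML rather than entered by hand.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Tie:
    """One edge in a person's relationship graph (hypothetical schema)."""
    other: str
    strength: float  # assumed scale: 0.0 (weak) to 1.0 (strong)
    kind: str        # e.g. "family", "friend", "social"


class RelationshipGraph:
    """Undirected, weighted relationship graph kept alongside customer metadata."""

    def __init__(self) -> None:
        self._ties: dict[str, list[Tie]] = defaultdict(list)

    def connect(self, a: str, b: str, strength: float, kind: str) -> None:
        """Record the same relationship from both sides."""
        self._ties[a].append(Tie(b, strength, kind))
        self._ties[b].append(Tie(a, strength, kind))

    def strongest_ties(self, person: str, k: int = 3) -> list[Tie]:
        """Return up to k of a person's ties, strongest first."""
        ties = self._ties.get(person, [])  # .get avoids creating empty entries
        return sorted(ties, key=lambda t: t.strength, reverse=True)[:k]

    def best_influencer(self, prospect: str, customers: set[str]) -> str | None:
        """Existing customer with the strongest direct tie to a prospect, or None."""
        linked = [t for t in self._ties.get(prospect, []) if t.other in customers]
        if not linked:
            return None
        return max(linked, key=lambda t: t.strength).other


# Illustrative data only: John is an existing policyholder; the others are prospects.
graph = RelationshipGraph()
graph.connect("John", "Colleague", 0.3, "social")
graph.connect("John", "Friend", 0.7, "friend")
graph.connect("John", "Sister", 0.9, "family")
customers = {"John"}

print([t.other for t in graph.strongest_ties("John")])  # ['Sister', 'Friend', 'Colleague']
print(graph.best_influencer("Sister", customers))       # John
```

A plain adjacency list keeps the sketch dependency-free; the same two queries (strongest ties, best influencer for a prospect) could equally be expressed against a dedicated graph store.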
This brings up an interesting point. We love the whole graph thing, but it also brings back the neural network trend, a concept that's been around for a long, long time but is now front and center. I remember talking to Diane Greene, who runs Google Cloud. She said that 15 years ago, neural network specialists couldn't get jobs; now you can't hire enough of them. So that brings up the ML conversation. I want to turn that into a question about the data lake, because you've announced a new cloud data lake, and from what you're saying, it sounds like you're going beyond the data lake. Talk about what that is. People get the data lake: you throw stuff into a lake and hope it doesn't become a swamp. How are you going beyond the basic concept of a data lake with your new cloud data lake?
Yeah. If you remember, it actually started here in San Jose last year: we chatted, and we had announced the data lake, because we realized that customers, to your point, John, were struggling with how to even build a data lake. They were all over the place, and they were failing. We announced our first data lake there, and then at Strata New York we brought the metadata and ML part to the data lake. And now, obviously, we're taking it to the cloud. What we see in the world of the data lake is that customers ask for three things. First, they want a pre-built, integrated solution: data comes in, but I want the intelligence of metadata, and I want data preparation baked in. I don't want three different tools that I have to stitch together; I want it out of the box. We also saw that as our customers become successful, they want to scale up and scale down, and the cloud is a great place for that. You can basically put a data lake out there. And by the way, a lot of the new data sources are in the cloud.
So it's easy for them to scale in and out in the cloud, experiment there, and so on. Also, with Amazon, we supported Amazon Kinesis; all these new sources and new technologies in the world of cloud allow experimentation in the data lake. That let our customers get ahead of the curve very quickly. In some ways, the cloud allowed customers to do things a lot faster, better, and cheaper, and that's what we put in their hands. Now that they're comfortable that they can run a secured, governed data lake that still feels self-service, they want to put it in the cloud and be a lot faster and cheaper about it.
And run more analytics on it.
More analytics. And because our ML, our AI, the metadata part, connects cloud, ground, everything, they have an organizing principle: whatever they put wherever, they can still get intelligence out of it.
We've got to break, but I want to get one final comment from you to end the segment. It's been fun watching you guys work over the past couple of years, and I want your perspective, because product decisions always have a timetable. It's not like you made this up last night because it's trendy; you've made some good product choices, and it seems like the wind is at your back right now at Informatica. What specific bets did you make a couple of years ago that are now bearing fruit? Take a minute to end the segment and share some of those product bets, because it's not always obvious to make them years in advance. It seems to be a tailwind for you. Do you agree, and can you share some of those bets?
I think you said it rightly: product bets are hard, right? You've got to see three or four years ahead. The one big bet we made is that we saw, as I said, the decoupling of the data layer.
We realized that the app layer was getting fragmented, the cloud platforms were getting fragmented, and databases were getting fragmented. That whole old monolithic architecture was getting fundamentally blown up, and customers would be in a multi-, multi-, multi-spread-out hybrid world, where data is the organizing principle. So three years ago, we bet on the intelligent data platform, and we said the intelligent data platform would be intelligent because of the metadata-driven layer. At that point, AI was nowhere in sight. We put ML in that picture, and obviously AI has moved forward since. So the first bet was on the data platform; the second was that the data platform would be driven by AI- and ML-based metadata intelligence. And the third was that we bet big on cloud. We had already bet big on big data, by the way; you knew we were already there. We knew that big data would move to the cloud far more rapidly than older technology did, so we saw that coming. We saw the Redshift and HDInsight wave coming. We worked very closely with the AWS and Azure teams, and now with Google as well. So we saw three things, and that's what we bet on. You can see it in the rich offerings we have, the rich partnerships we have, and the rich set of customers who are live on them.
And the market's right on your doorstep. AI and ML are hot, and you're seeing all this stuff converge with IoT.
Those were some forward-looking bets, I think, and they've paid off for us. But there's so much more to do, and so much more upside for all of us right now.
A lot more work to do. Thank you for coming on and sharing your insight again. You're in good pole position in the market, and again, it's right on your doorstep. So congratulations. This is theCUBE. I'm John Furrier with George Gilbert, with more coverage in Silicon Valley of Big Data SV and Strata Hadoop after this short break.