Live from Las Vegas, Nevada, extracting the signal from the noise. It's theCUBE covering Informatica World 2015. Brought to you by Informatica World.

Okay, welcome back everyone. We are here live at Informatica World 2015. This is theCUBE, SiliconANGLE's flagship program, where we go out to the events and extract the signal from the noise. I'm John Furrier, the founder of SiliconANGLE. Joined by my co-host, George Gilbert, our new Wikibon big data analyst, just new on the scene, taking over the helm from Jeff Kelly. And our next guest is Anil Chakravarthy, the EVP and Chief Product Officer of Informatica. Great title, we love the product guys, because now we can get down and dirty in the product. So welcome back to theCUBE.

My pleasure, thank you for having me.

We had a great chat at AWS re:Invent, the past event. We were out there, great chat. Cloud is changing the game. Data is obviously rising up: CIOs, CDOs, CXOs are all mindful of the fact that data is not just an operational cost, it's an opportunity to drive revenue and growth. What do you guys have on the product side that aligns with that vision? And who are some of the customers you're working with right now?

Yeah, we call the vision the intelligent data platform, and we see the world evolving exactly like you laid out. Think of 10 years ago: data and applications were very closely integrated, very closely tied together. Now the data architecture and the application architecture are going in separate directions, obviously aligned and integrated, but people realize that data lives on for way longer than even the applications do. They also realize that their data is both on premise and in the cloud, and they need to be able to address data wherever it is. The intelligent data platform is all about that. It's a new way of managing your data.
I was talking to a customer last night, along with one of our colleagues, and I said to him, has a customer ever said to you, I need less compute power? With the cloud, where you mentioned Amazon and these new resources, whether it's on-prem or in the cloud, compute is becoming more and more available, which enables more data analysis, whether that's for security purposes or for other value. So as compute and resources become infinite, so to speak, data has to be frictionless.

That's correct.

How do you guys enable that? Because that is a big problem. Customers have these islands of data or ponds of data, whatever you want to call it. They want to put them into an ocean or a lake or whatever metaphor. This is a huge challenge. What do you guys see for that piece of the product?

We look at it from the customer's perspective and ask, what are they trying to do with this data? If you think of enterprise customers' data-driven business initiatives, we see five primary buckets that capture the bulk of what our customers do with their data. Analytics is a big one. Application consolidation, getting a 360-degree view of their customers, cloud modernization, and data governance. Those are the five big things, and we've structured a lot of our technologies and solutions around those five areas.

And how do you do it? Because in the old days you had the one-trick-pony product, not you guys, but a vendor would say, here's my general-purpose solution, and then sell it out. With data, use cases differ by client and by industry. There are a lot of dimensions, even among the data itself and the customers. So how do you build a platform? Do you architect it differently? Are there certain technologies you rely on? Can you share some color on that?
Yeah, the good thing about that one-trick pony, as you called it, was that it was actually built with an architecture that was portable. And that's the core component of what we call the intelligent data platform. That engine, which we call Vibe, is portable across multiple platforms and helps you do data processing on premise, in the cloud, on Hadoop, et cetera. That's the foundation.

On top of that is what we call the data infrastructure, where certain core components are integrated together: data integration, data quality, data mastering, data security. These are all integrated in the data infrastructure layer.

And then, as you saw at the show, we're really investing a lot in what we call data intelligence, because as the volume of data grows exponentially, unless you know more about your data, where is it located, is it sensitive or not, what is the quality of the data, what is the trustworthiness of the data, if you don't have that metadata, you cannot be successful with your data. That's the other layer we're investing in. So the way we think of bringing all of this together is this intelligent data platform, which has these three key layers.

Talk about the metadata, because that's really a big thing I want to unpack. Last week at EMC World, we had theCUBE there, and I asked the head of XtremIO, the big rock-star product at EMC, what are the three conversations you're involved in the most? He goes, metadata, metadata, and metadata. That's a big part of how the brains are being formed in this fabric, having metadata available, always around the data. How do you guys see that evolving? Is there a certain technology? I mean, metadata describes data, and so if I'm integrating disparate data sets, I need access to all the metadata.

That's right. Now, it's also easy to get confused about metadata, so let me just explain what we mean by it.
What we mean by metadata is, as you said, data about data, but it's really about what types of data, what quantities of data, and where it's resident. So in effect, it's looking inside the data store. If you look at it from the viewpoint of an infrastructure company like EMC, for them metadata would mean more about file sizes and other things at the infrastructure level. For us, it's at the data level. So if you have a big database table with a million records, for example, we would know what columns you're referring to and what those columns actually contain. That's the metadata we are building on. And exactly as you said, once you have visibility into that, you have to be very careful about how you use it. You don't need to expose everybody to that metadata. You need to know what to use it for and how to use it. And that's what we're building with the Live Data Map.

There's a trade-off involved where you have all this intelligence. It's almost like a sort of Grand Central Station, where you know who's coming and going and everything about the schedules.

Correct.

But when you have this frictionless flow of data, where you crack the data loose of all the applications as you're constantly expanding the data systems and the applications, there's a trade-off: if you put the intelligence at the edges, you have a potentially more agile way of adding new sources and targets, whereas if you put it in the center, it's maybe more waterfall, everyone has to agree on changes. Is that a fair characterization?

Yeah, I think the best way to understand how Live Data Map works is that it is a federated architecture. In other words, let's say you have a product like the Intelligent Data Lake, Project Sonoma, using metadata. It does not depend upon a central metadata source. It has metadata that it is gathering itself, but the key thing is it's in the same format. It is shareable with metadata from the other sources.
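A back-of-the-envelope sketch of what such a shareable, column-level metadata record might look like. This is purely illustrative; the field names and format here are assumptions, not Informatica's actual Live Data Map schema:

```python
from dataclasses import dataclass, field, asdict

@dataclass
class ColumnMetadata:
    """Column-level metadata: what the column holds, not the data itself."""
    name: str
    data_type: str
    sensitive: bool = False      # e.g. contains personally identifiable data
    quality_score: float = 1.0   # 0.0 (untrusted) .. 1.0 (fully trusted)

@dataclass
class DatasetMetadata:
    """Shareable record describing one data store, wherever it lives."""
    source: str                  # "on-premise", "cloud", "hadoop", ...
    table: str
    row_count: int
    columns: list = field(default_factory=list)

    def to_shared_format(self) -> dict:
        # A common dict/JSON shape lets each product gather its own
        # metadata yet stay discoverable and searchable by the others.
        return asdict(self)

# A million-record customer table: we know the columns and what they contain,
# without ever moving the underlying data.
crm = DatasetMetadata(
    source="on-premise", table="customers", row_count=1_000_000,
    columns=[ColumnMetadata("ssn", "string", sensitive=True),
             ColumnMetadata("region", "string")])

record = crm.to_shared_format()
```

Because every product emits the same shape, a federated repository can index and search these records without owning a single central copy.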
So in other words, to address your question, it's not all dependent on a central single instance of metadata. The metadata is still embedded within those products, but the format is shareable, discoverable, searchable across all of them.

So when you find contextual data that you want to use in your analytics and it's repeatable, it's easy to connect and easy to share through that federated repository.

Exactly.

So I've got to ask about this data lake thing that's been kicked around. People know that I'm not a big fan of data lake. I like data ocean better because it's more current, more flows, a different kind of diversity, but data lake, I get it. It's a marketing term. People have been using it to show how you combine data, and it helps people get to the journey. But one of the things that's come up here that I want to get your take on is that most people have a data swamp. They jam everything in too fast into this big pool of data, and there's bad data quality. What they want to get to is an intelligent lake, or robust, whatever you want to call it, but they have swamps instead. So how do you guys get a customer from a swamp to an intelligent data lake?

Yeah, and let me just add, also in the context of evaluating some of the other data-wrangling tools, you know, the self-service, let-me-explore-this-swamp approach.

Yeah, so both alternatives.

That's exactly what Project Sonoma addresses. We believe that to have a data, whatever you want to call it, reservoir, something more refined than a data swamp, there are a couple of ways of doing it, and as a tool we want to support all of them. One way is to have trusted people put data into it, data that we've already pre-checked. That's what we call curated data. So for example, if they're looking at data about revenue, there's no argument about which data about revenue is accurate or not.
IT knows which data sources are accurate, which are systems of record, and they pre-populate the data. So that's one way of making sure the data in there is high quality. The second way is collaboration, where you say, look, just like when you go shopping on eBay or Amazon, for example, you look at ratings. There are people who have very high ratings because they've been trusted buyers or sellers. Same thing with trusted users of data: if it's a trusted user, any data they contribute is of higher quality. Those are some of the methods we're using to make sure that the data in there is actually of high quality.

So I've got to go to the next step on that. The CEO and chairman was up on stage talking about the age of engagement.

Correct.

So now let's take that to the engagement. What is the engagement piece of that puzzle? Is that the actual trusted gesture data? What does that mean, engagement data?

Let me use an example to clarify it. If you went to an ATM machine, say 10 years ago, what did the ATM record? It only recorded your transactions and the timestamp. You made a transfer, or you made a credit or debit, and that was it. Today you go to the same ATM, and what is it recording? If it's a touchscreen, it's recording your touch. If there's a camera, the camera is recording your actual presence and what you do, et cetera, for other potential uses. Those extra things that are being stored, that's what we mean by the age of engagement. It's not the actual transaction data, that's obviously still being recorded; it's all the other interaction you're having. For example, if you have a mobile app and you show up at an ATM, the company knows that, hey, he's a mobile banking customer who is at this ATM. Why is he at the ATM? Maybe there's something in the mobile app he could not do.
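The ATM distinction described here, the classic transaction record versus all the surrounding interaction, might be sketched as two kinds of events. The event names and fields below are invented for illustration, not any real banking schema:

```python
from datetime import datetime, timezone

def make_event(kind: str, channel: str, payload: dict) -> dict:
    """kind is 'transaction' (the classic record) or 'engagement'
    (touches, camera presence, mobile-app proximity, and so on)."""
    return {"kind": kind, "channel": channel,
            "at": datetime.now(timezone.utc).isoformat(), **payload}

events = [
    # Ten years ago, this was all the ATM stored:
    make_event("transaction", "atm", {"action": "withdrawal", "amount": 100}),
    # Today the same visit also yields engagement data:
    make_event("engagement", "atm", {"signal": "touchscreen_tap"}),
    make_event("engagement", "mobile_app", {"signal": "customer_near_atm"}),
]

engagement = [e for e in events if e["kind"] == "engagement"]
```

The transaction stream stays small and structured; it is the engagement stream that grows without bound and drives the new analytics.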
Those are the kinds of things that we call the age of engagement.

So talk about the next thing, then. First of all, I want to congratulate you guys, you're doing a great job with the product, and the messaging here is great. We were talking earlier about the energy at the show, huh? Tons of energy. I mean, the lunch room's packed, not even a seat to be had. It's a lot of energy.

No, plus a great hotel.

But your message of bringing the data up, call it a data lake or whatever, is a good message. So how do you take it to the next level? That assumes, okay, now data's accessible. How do you take it into the next generation in two dimensions: data at rest and data in motion? This goes back to my data ocean analogy, but I've got a lot of motion going on, a lot of fast data, real time, and then I've got data at rest.

Correct.

How do I make that accessible for all applications and for the entire network?

The main aspect of that is, first of all, accessibility. Accessibility is one: how do I get access to it and how do I move it? For example, data that's moving at a very high streaming rate is not ingested the same way as batch data, so we have multiple types of ingestion mechanisms to make all data accessible. The second key piece is, who has access to the data? If it's a lake and it's a free-for-all, there's going to be a governance nightmare. So how do you make sure that whatever security controls were in place at the original source of the data can continue to be used in the lake? Otherwise you're just opening up access to all data for everybody. So accessibility is a key aspect of it. The second piece, which you mentioned, covers data in motion, what we think of as essentially data in flux. In the demo that you saw of Project Sonoma, for example, you could see that analysts can create their own projects.
Any data that's in a project is essentially data in motion, because they have not yet finalized the data. They are tweaking it, they're blending it, they're using it.

They're working with it, yeah, they're playing with it, they're analyzing it.

When they're done, they get to publish it. When it gets published, it becomes data at rest that other people can use. If it's in an active project, it's data in motion. And that's how we handle the data.

It's super exciting. I wrote a blog post in 2008 where I said data's the new development kit, kind of riffing on the notion that in the old days of development you had a kit and you got some source code. What you're referring to now is data being acted on as a resource in the development process, whether it's a business user or a developer.

Exactly, and it's being shared, it's collaborative. A simple way of thinking about it: in the past, when you needed to get a document typed up, you went to a typist pool. That typist pool is gone now. It's all self-service. How do you do the same thing with analytics? Why do you go to a central department to do analytics for you? You won't. Soon, every user will be able to do their own, but with trusted data that they get from trusted sources.

Following on that, we've been talking to customers like Bloomberg, JPMorgan Chase, and Credit Suisse, and they're saying, along those lines, it's not just SaaS apps, the sort of business-process-efficiency apps, but analytic apps, where they take what used to be a data feed from a provider, and they might want to push the analytics onto Bloomberg. So Bloomberg is responsible for integrating all the data feeds, and the bank might be updating the analytics, but the integration is done just once, in the provider. Is that a direction you see?
That's actually, in fact, why we acquired a company called StrikeIron, which provides data as a service for exactly that reason. As an example, if you're an insurance company, you're trying to understand your white space. You've got your own internal data on which products you sold to which customers, and that might come from five or six different systems. Then you go outside to a third party and you get a lot of household data. What are all the households in the United States, and what is the...

I'm getting more.

And some of these could be real-time, some could be one-time batch feeds, and then you've got to put all of that together. A lot of those data feeds, we make possible. We work with data feeds that come automatically through, like a Bloomberg, for example, or we can also help customers initiate their own data feeds for the very specific data they need for their analytics.

Okay, got it. So what's next on the roadmap? Give us a taste of what's coming around the corner. I mean, you're a public company now, but still, high level, directionally, from this event, what are your objectives? You're going to go back and keep on working. Certainly when you go private, it's going to be nice to have some privacy around retooling, but what's next?

Four big areas we're concentrating on. One is the cloud, which for us basically means any kind of cloud deployment: hybrid cloud, on-prem, all in the cloud, et cetera. Cloud is a big area of focus. The second area of focus is the next generation of analytics, this model of completely self-service and collaborative analytics, but with trusted data. The third one is master data. Master data is one of our crown jewels, and in the world of big data, the data swamp, the data ocean that you talked about, for data to be useful, it has to be anchored to real master data. That's a key area we're focusing on. The fourth area is data security.
So all four areas are active investment areas for us.

What would you say about the security piece, with a minute or two left? Security's obviously big-data-driven now. Anywhere you go, you see that impact. What's the do-over in security right now? I mean, security's upside down, we kind of recognize that. What do companies do for security? What's the big do-over?

Well, they have to start looking at protecting what they really care about, which typically tends to be the data. What we've had so far are security tools that protect the data indirectly, through network security, et cetera. And clearly, for a lot of customers, that is not proving to be sufficient. So they have to do both. They have to keep those kinds of network security tools active, and at the same time they have to go in, start understanding their data models, and ask, where is my critical data? If I care about Social Security numbers, credit card numbers, account numbers, policy numbers, whatever you may have: where is it, first of all? Is it protected today, where it is? Who else has it? Is it being copied? Is it being shared? Is it being put into a data lake where everybody has access to it, maybe even contractors and outsiders? That's the mindset they need.

The do-over. I mean, this is going to fall on your shoulders. When they had a perimeter, the data guys could say the perimeter's got everything secure, but once there's a breach, there's no perimeter.

Exactly.

It's on you now.

Exactly. The people responsible for the data have to look at it from a data perspective.

So you guys have concepts like geo data, knowing where the data is, a Find My iPhone kind of concept. And data moves around so much, you must have tracking.

That's what data discovery is all about. Even the metadata that we use helps us discover the data.
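That metadata-driven discovery could be sketched as a simple scan of column names across catalogs. Real data discovery products also classify by content and patterns; the databases, tables, and hint list below are made up for illustration:

```python
# Hypothetical catalog: database -> table -> column names. Metadata only;
# we never have to read the underlying rows to answer "where is my PII?"
catalog = {
    "crm_db":    {"customers": ["name", "ssn", "region"]},
    "claims_db": {"policies":  ["policy_number", "payout"]},
    "logs_db":   {"requests":  ["url", "latency_ms"]},
}

# Column names that suggest personally identifiable or critical data.
SENSITIVE_HINTS = ("ssn", "social_security", "account_number",
                   "policy_number", "credit_card")

def find_sensitive(catalog: dict) -> list:
    """Return (database, table, column) triples that look sensitive."""
    hits = []
    for db, tables in catalog.items():
        for table, columns in tables.items():
            for col in columns:
                if any(hint in col.lower() for hint in SENSITIVE_HINTS):
                    hits.append((db, table, col))
    return hits

hits = find_sensitive(catalog)
```

Run across thousands of databases, a scan like this is what produces the "here are the thousand databases with personally identifiable information" answer described next.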
So it says, look, here are these thousand databases in which you have personally identifiable information.

So you build security in from day one into your platform?

Well, that's one key piece of it: keeping our products secure, but then using the metadata from our products to make security better for our customers.

This is where I think the signaling comes in, so let me get your thoughts on this. Because to me, I would think that the signaling of the data, using big data, like, hey, someone's accessing that data out here...

Correct.

...is a notification opportunity, maybe a small data point.

Correct.

But in the bigger picture, it could be a signal.

That's correct, exactly. And that's what security is all about: detecting anomalous behavior. If you can identify what's normal, then anything that's abnormal, you've got to go track down and see what's really going on. Easy to say, very hard to do, but that's where we're getting to.

That's a new type of application. That's the system of intelligence on top of it.

That's exactly right. It comes from the data intelligence.

Okay, Anil, that's awesome. Thanks for joining us and sharing your insights. I know you're super busy. Congratulations on the great show.

Thank you, my pleasure. Thanks for having me on the show. I love being on theCUBE, and hopefully we'll run into each other again soon.

We love having you on, you're a tech athlete. Anil here, EVP and Chief Product Officer. Always talk to the product guys to find out what's going on. That's my philosophy on theCUBE. We'll be right back after this short break.