Live from New York, it's theCUBE. Covering Big Data New York City 2016. Brought to you by headline sponsors, Cisco, IBM, NVIDIA, and our ecosystem sponsors. Now, here are your hosts, Dave Vellante and Peter Burris. Welcome back to New York City, everybody. This is theCUBE, the worldwide leader in live tech coverage. We're here in conjunction with Strata plus Hadoop World, which has evolved over the last seven years we've been here. This is Big Data NYC, our show within the show at Big Data Week. Ronan Schwartz is here, he's the Senior Vice President and General Manager of Data Integration and Cloud Integration at Informatica. Ronan, good to see you. Thanks for coming on. Good to see you again, and I'm really happy to be here in New York. For me, it's only the sixth time out of the seven, but it's bigger than ever. Yeah, well, it was a historic moment for theCUBE when I flew in from Dallas to meet John Furrier for what was our first Hadoop World. Back then, you know, was it the Hilton? And it wasn't that crowded, and people were like, what's Hadoop, you know, right? And that's evolved pretty quickly. It went from sort of what is it, can I get it to work, to wow, how can I bring it into my enterprise and make it trustworthy? And we're sort of slightly beyond that now, aren't we? And it's all about data, so. It's true, it's all about data, and now it's data for everybody, data in the cloud, data in a hybrid environment. And really, like you said, it's actually how to make data work in real use cases for small organizations, but also for the enterprise. Well, and our research for years has shown that one of the key tenets of big data initiatives is the existing infrastructure, the existing data warehouse infrastructure, the data transformation tooling that's being used, you can't just ignore that. 
Although many customers did early on, they sort of said, okay, we're going to do a little skunkworks project, and then organizations started to realize, well, wait a minute, we have all this other data, then we need data integration. So talk about the role that Informatica plays in that whole evolution. Thank you, and it's a great opportunity for me to kind of share Informatica's view of that world. Informatica was always focused on data, and if we look 30 years back into the data, you know, we used to call it data 1.0, then the data was really a part of an application, of a single application. It was very, very important, but it was important in the context of the application that actually generates that data. If we fast forward to 10 or 15 years ago, then that was the time of data 2.0, where an enterprise was starting to look into data across multiple applications, building the data warehouse, building an ESB to kind of move data across applications in real time, some events that are going across applications, but actually it was still, I think, very limited in the view and the viewers of the data. Data 3.0 is the world that we're living in, at least in the last three to five years. It is actually the world with big data as well as small data, it is the world with data in the cloud, on-premise, and in a hybrid environment, it is the world that actually has more data consumers in new roles that didn't exist before. It's not just the business intelligence group, it's actually everybody is a data employee, and last but not least important is actually data in real time and data in batch. Both of them equally important, you need both of them depending on the use case. The most important thing in data 3.0 is that in data 3.0, data is the center. It's not the application, it's not the data warehouse, it actually is the whole data that is actually in the center. So let me see if I can pick up on that. 
So data 1.0 was the data's under control of the application, so to get to the data I had to go through the application, which had its own set of bureaucratic and investment considerations. So the application was the primary citizen, the source of investment, the source of organizational control, and the determinant of whether or not you got access to it. In data 2.0, it really was the data management technologies, which once again had their own set of organizational and financial and investment profiles. And now it's moving to this data 3.0. I think what you're saying from a customer or CIO practical standpoint is you are now going to look at data as an asset of itself. You're going to build organizations that are focused on data as an asset, you're going to invest in data as an asset so that people can get access to the data based on their specific use cases. Have I got that right? Yes, absolutely. So data is actually becoming the center, and it's in the center not just for the data management group as it used to be in data 2.0. It is actually in the center at the CEO level. Definitely for the CIO, one of the most important things he needs to deal with is data. And it actually affects many things. It affects analytics, but it also affects security. It affects so many other aspects of the organization. And I think especially when we're here at Hadoop World, the point I want to make is that data is an asset inside Hadoop, but also outside of Hadoop. Data is kind of the key asset of the enterprise in general. And there's a lot of evidence of that. We had a great conversation with a senior executive of a Fortune 500 company last night about how he met with the board of directors of this company and they spent a significant amount of time talking about data as an asset, if only from a security standpoint. So there's a lot of evidence that this is happening. Now how is it starting to happen in the Hadoop big data ecosystem? 
I think what Hadoop really empowers companies to do is actually to change their business. It's to change their business into a digital business when they're actually able, for the first time, to process massive amounts of data and change really the way business is done. And that actually brought the data into the center. And actually in a second level of effect, it actually got the organization to collect data inside Hadoop, but also drive things like one of the things that Informatica is announcing here in Strata Hadoop, which is an Enterprise Information Catalog. Can I actually get full visibility to all my data inside my multiple Hadoop clusters, not just one, but multiple Hadoop clusters, but also in any other place in the enterprise, including the data 2.0 data warehouse, including application data that is not connected to Hadoop. Can I actually get the visibility across the board? Can I not only get that visibility through writing jobs in a specific coding language? Can I actually get that visibility through a business interface? Can I actually bring the right data into the lake without a massive code effort? These are the things that Informatica is actually announcing here that are coming to really support this revolution of putting the data in the center. And that's essentially an effort to sort of make the metadata more transparent and accessible. Is that right or? Absolutely, so if data is actually the key, then the map of... Data is the asset. Yes, data is the asset, then actually mapping your assets is a very important thing, right? Imagine we all have our financials, right? You actually want to know where your money is and how it is doing, who is handling it, the changes. If data is the key asset, you want to know that about your data. So we've been looking for a great catalog in the Big Data Universe for quite some time. We found a number who claim to be that, but just aren't there yet. At least that's been our experience. 
So let me test you, let me push you on this. A great catalog should allow me to discover. Yes. You can do that. Yes. A great catalog would improve my ability to govern data. That is absolutely right, yes. Because I want to know who the ownership is and what the permissions are and what the rules of engagement are. Yes. I have to understand also some of the lower level, the metadata stuff, so that I can understand how transformations take place, when they take place, et cetera. What about going back to the notion of data as an asset? Can I actually start building a data portfolio management system? Am I in a position where I can start tracking investments in data, returns on data, and figuring out how much additional investment I should be making? Absolutely, and I really want to encourage your viewers and the Informatica customer base and the Hadoop users to look into the Informatica information catalog. To do what you described, one of the key things that we bring to the table is intelligence. Because with massive amounts of data, data inside the multiple Hadoop clusters and outside of Hadoop, you actually want the intelligence of this catalog to tell you not just where the data is, but actually how it relates to each other and where you actually have the same data set and you don't even know. And just like you said, who else is doing the same thing that I'm trying to do? What assets are available for me that are more prepared, more authorized, more governed versus assets that are less governed and so on? Democratization of data is really, really important, but just like with democracy, you need to have certain tools and certain capabilities to allow everybody to influence the government, and this is what we're giving here. So you give access, but governed access. You can control who can see what at what level. Informatica has a very, very strong, not just governance security, but also data security layer. 
So you don't want to prevent people from seeing the data, but you might actually prefer for them not to see the social security number, right? Should I prevent them from access to the whole file? Probably not, but if I can mask just this one field, that is actually a great solution. So a true data 3.0 infrastructure must actually support everything that you described and if you can do it with intelligence, you can actually save a lot of time and empower much more people with data. So this idea of a portfolio was interesting because the 1.0 is a pejorative, right? But the good thing about 1.0 is the organization had visibility on the application portfolio because it tied to the business process. And what with 2.0 became insights for a few. And it was sort of, okay, it's over here and it's slow and we get that. And only very specific insights. You have to look at, you looked into the data in the very specific lenses that somebody else defined for you. I think David means that the insights into the value of the data through a few people had it. That's right. So power analysts, you had to go through that were the bottleneck and the promise of big data is, as you pointed out, it's much more ubiquitous across the organization. And so I hadn't really thought about thinking about data as a portfolio, I don't know why I hadn't thought about that. It's a pretty obvious thing to do, but the chief data officer presumably is responsible for that notion of a portfolio, where to invest, how to evolve that portfolio and support the monetization of the business, not necessarily the monetization of the data. And that's a mistake that a lot of people made early on in big data, said, okay, how are we going to make money off selling the data? And the thing that organizations realize is, wait a minute, we understand how we make money. There's lots of ways to make money. How can data support that? 
If I could, just because you can copy data easily does not mean that you should look at data's fungibility as something you can use externally. Because there's a lot of circumstances where you don't want to give up that data. Now there's a lot of arguments about this and whatnot, but having a clear understanding about how data in context generates value for your business is crucial to doing a good job of this. And we need strong tools that are capable of expressing those insights about how data should be governed directly into the governing process of data. And if you didn't trademark data portfolio already, I think it's a very, very strong trademark that you should do. And in that context, and just connecting it to the data management tools, if you're looking at data as a portfolio and you look at data as an asset, then how do you actually make this asset better? How do you make this asset bigger? That should be one of the top concerns or top focuses of the CIO. And Informatica is- If not the top focus. Yes, yes. I was trying to be kind of very, very generous. I actually think- Of the CIO? Yeah. Of the CEO as well. I kind of agree because it's a shift of business. Or the CDO, this gets into an interesting organizational discussion because we used to think that the CIO's job was really to manage that data as an asset or a liability. It was called the chief information officer for a reason. But in many respects, the CIO was never able to do that because he or she was having to just keep the infrastructure lights on. And so then this CDO role emerged. There's a lot of discussion about who sort of owns that. And the more, well, certainly in financial services, healthcare and government, it's the chief data officer, increasingly. Would you agree with that or? I think that the chief data officer as a role was created because of the growing importance of data and because the CIO, with its broad responsibility, could not actually give enough focus to the data part of it. 
But I think today's data is probably the most important asset that exists in the organization. It is the CIO that should care about it. You have the CDO to help him in doing that. I also think that in many businesses, and some of the businesses that are not just created yesterday, but also businesses that exist for a long time, the data is actually becoming the business. And as data is becoming the business, it's actually the problem of the CEO, not just the CIO. So as a former CIO, would you be, I hate to say it, but it's a land grab for the data who owns the data. I mean, it's going to really fund, if you believe that data is the future and thinking about data as a portfolio, then if I'm the CIO, I might want to stake out some turf. Oh no, no question. But, well, here's how I would put it. That in a period of turbulence and significant transformation, a couple things happen. First off, the business steps back and says, what are we not doing a good enough job governing from an asset standpoint? Oh, data. Well, we need to put someone in charge of establishing the policy, the rules, the organizational approaches, the governance models for how we think about shared claims on that asset. So it's perfectly natural that a lot of businesses would say that's important. If they want to associate a particular chief with that, like a CDO, fine. But we'll go through this process and it would all come back together in some form. The other thing that happens, and here's where my bigger concern is, is that there's a marketing strategy practice in the tech industry, of course, not Informatica. But there's a marketing practice in the tech industry that you don't want to have to sell to large communities. It's a very complex sale. You want to collapse it down to the individual that you can sell to. So the tech community from a marketing standpoint has always created that single buyer. And tried to get people to invest in that notion of a chief something. 
A DBA, a Cisco certified engineer. A storage manager, whatever it was. And so there's some of that going on too. Let's create our buyer. But the bottom line is, if data's going to be acknowledged as an asset, not the only asset, but acknowledged as an asset, then we have to up our game in how we price, value, govern, and invest in data. I totally agree. And I think how to make data better, how to make data governed so that you're actually getting better results, is going to increase the value of the data for a specific company. And what you're seeing is actually whole new businesses being generated based on data. And one of my favorite examples, the speaker that was with me at Informatica World, coming from a company named JLL. JLL is a more than 200-year-old company focused on commercial real estate, right? And they are able to change their business from person to person renting of commercial real estate into a full mobile real-time experience, all based on data. To deliver something like that, you suddenly need to collect every angle that is available within the facility that you're renting. You need to understand the angles of the sun in different hours, the windows location. You suddenly have not just a line of address as the data, but actually all the 3D information that exists there. And if you want actually to allow somebody to be in one place and at the same time look into his iPad and look into a different asset and say, which one is better, so we can make a faster decision while he's still closer to you as the seller, you have to actually base your business on data. And at that stage, data is not something that goes to manage your finance. It's not something that has got to do with the quarterly report. It actually becomes something that everybody in the organization works with. And such an asset is important to everybody in the organization. But it dramatically extends the value of the business. 
That piece of property that you're looking at is still an asset and still crucially important. But what you're doing, and this is our definition of digital business, you are differentially using data to create and sustain customers. I totally agree, I totally agree. And I did want to add, I mean, I think we talked about the CDO as one profession that has been kind of rising because of the importance of data. I think the data analyst, and a really very specific type of data analyst, is another one. Data operations is another new role that is rising. Managing those flows of data through the organization. Data engineer, we heard from the chief data scientist yesterday, is a critical role in the team. That is totally correct. And I think one of the things that Informatica is actually really trying to do is to empower them with the right tools so that they can actually look into the data, improve the data, get the data to be available at the fingertips of the right user at the right time. And it actually is not a simple exercise. It's an exercise that involves scanning all the data in the organization. Sometimes you hear data warehousing, data lakes. Let's bring all the data into one place. But then you're learning that, even with Hadoop clusters, you have more than one, right? So the proliferation of data, the availability of data in different places, you have to handle that. So you have to get real visibility across the enterprise. The second thing that you have to do is you have to be able to take data in batch in massive amounts or move events data in real time from one place to the other. So you have to support both the real time as well as the batch optimization. You have to give very, very strong tools to the data engineers. And at the same time, you have to give simplicity to the business user because they're all working with data. I think the last thing that I'll say is you have to do it on-premise and in the cloud because the change is there. 
Hybrid is happening now and the cloud is rising. Well, as John Furrier likes to say too, data is the new development kit. So you don't see application developers diving in. Data science, application developers coming together. So great discussion, Ronan. Thanks very much for coming on theCUBE. Thank you very much for hosting me. You're welcome, all right. Keep right there, everybody. We'll be back with our next guest. This is theCUBE, we're live from New York City. We'll be right back.