Live from New York, it's theCUBE. Covering theCUBE, New York City, 2018. Brought to you by SiliconANGLE Media and its ecosystem partners.

Welcome back to the Big Apple, everybody. This is theCUBE, the leader in live tech coverage. My name is Dave Vellante. I'm here with my co-host, Peter Burris, and this is our week-long coverage of CUBE NYC. It used to be really a big data theme; it has evolved into data, AI, and machine learning. Ronan Schwartz is here. He's the Senior Vice President and General Manager of Cloud, Big Data and Data Integration at the data integration company Informatica. Great to see you again, Ronan. Thanks so much for coming on.

Thanks for inviting me. It's a good, warm day in New York.

Yeah, the storm is coming. And speaking of storms, the data business is booming. There has been this crescendo of data, and you guys are at the center of it. It's been a tailwind for your business. Give us the update. How's business these days?

So we finished Q2 with great success, the best Q2 that we ever had, and the third quarter looks just as promising. So I think the short answer is that we are seeing strong demand for data and for technologies that support data. We're seeing more users, new use cases, and definitely huge growth in the need to support data: big data, data in the cloud, and so on. So a very, very good Q2, and it looks like Q3 is going to be just as good, if not better.

That's great. So there's been a decades-long conversation, of course, about data and the value of data, but more often than not in recent history, and by recent I mean the last 20 years or so, data has been a problem for people. It's been expensive. How do you manage it? When do you delete it? It's this nasty thing that people have to deal with. Fast forward to 2010 and the whole Hadoop movement, and all of a sudden data is the new oil, data is this and that, which Peter, of course, disagrees with for a reason.

No, it's a subtlety.
It's a subtlety, but you're right about it, and maybe if we have time we can talk about that. But the point is, that bromide really focused attention on data, the importance of data, and the value of data, and that's a big contribution that Hadoop made. There were a lot of misconceptions: oh, we don't need the data warehouse anymore; oh, we don't need old legacy databases. Of course, none of those are true. Those are fundamental components of people's big data strategy. But talk about the importance of data and where Informatica fits.

In a way, if I look into the same history that you described, Informatica has definitely been a player through this history. We divide it into three eras. The first one is when data was this thing that sits below the application: you use the application to feed the data in, and if you want to see the data, you go through the application. We sometimes call that data 1.0. Data 2.0 was the time when companies, including Informatica, rose by being able to give you a single view of the data across multiple systems, across your organization, and so on. This is where Informatica, with ETL, with data quality, even with master data management, came into play and allowed an organization to actually build analytics as a system, to build a single view as a system, et cetera. I think what is happening now, and Hadoop was definitely a trigger, but I would say the cloud is just as big a trigger as the big data technologies, and definitely everything that's happening right now with Spark and the processing power is contributing to it. This is the time of data 3.0, when data is actually in the center. It's not a single system like it was in data 2.0, and it's not this thing below the application as in data 1.0. Data is in the center, and everything else basically has to be connected to the data. And I think it's an amazing time.
A big part of digitalization is the fact that the data is actually there. It's the most important asset to the organization.

So I want to follow up on something. Last night we had a session where Peter hosted the Future of AI, and he made a point. Earlier I said data is the new oil, and you disagreed. There's a nuance there. You made the point last night that oil, I can put in my car, or I can put in my house; I can't do both. Or people say data is the new currency, but I can spend a dollar on groceries, or I can spend a dollar on sports tickets; I can't do both. Data is different in that...

It doesn't follow the economics of scarcity. And I think that's one of the main drivers here. As you talk about 1.0, 2.0, and 3.0: in 1.0 data is locked in the application, in 2.0 it's locked in the model, and in 3.0 we're opening it up so that the same data can be shared, evolved, copied, and easily transformed. But the big issue is that we have to sustain overall coherence of it. Security has to remain in place. We have to avoid corruption. Talk to us about some of the new demands, given that we've got not just more data, but more users of that data. As we think about evidence-based management, how are we going to ensure that all of those new claims from all of those new users against those data sources can be satisfied?

So first, I truly like that; it is a big nuance, not a small one. The fact that you have better data actually means that you do a lot of things better. It doesn't mean that you do one thing better and cannot do the other. I agree 100%. I actually attribute that to two things: one is more users, and the other is more ways to use the data. So the fact that you have better data, more data, big data, et cetera, actually means that your analytics is going to be better, right?
But it also means that if you are looking into hyper-automation and AI and machine learning and so on, suddenly this is possible to do, because you have a data foundation that is big enough to actually support machine learning processes. And I think we're just at the beginning of that. I think we are going to see data being used for more and more use cases. We're not just in the integration business but in the data management business, and within what our customers are asking us to support, we're seeing huge growth in the number of patterns for how they want the data to be available, how they want to bring data into different places, to different users. So all of that truly supports what you just mentioned. I think if you look at the data 2.0 timeframe, it was a time when a single team that was very, very strong, with the right tools, could handle the organization's needs. In what you described, suddenly it's self-service. Can every group consume the data? Can I get the data in both batch and real time? Can I get the data in massive amounts as well as in small chunks? These are all becoming very, very central.

And it's very use-case dependent, but also user- and context-dependent, and, when we think about time, time-dependent. One of the biggest challenges we have is to liberate the data in the context of the multiple different ways the organization uses it. And one of the biggest challenges that customers have, or that any enterprise has, comes back to evidence-based management. It's a nice trend, and a lot of it is going to happen. But familiarity with data is still something that's not, let's say, broadly diffused, and a lot of the tools for ensuring that people can be made familiar with data, can discover it, reuse it, and apply it, are modestly endowed today. So talk about some of these new tools that are going to make it easier to discover, capture, catalog, and sustain these data assets.

Yeah, so I think you're absolutely right.
If this is such a critical asset, and we're looking at more users consuming the data in more ways, it automatically creates a bottleneck: how do I find the data? How do I identify the data that I need? And how am I making it available in the right place at the right time? In general, it looks like a problem that is almost unsolvable. I've got more data, more users, more patterns, and nobody has their budget tripled or quadrupled just to be able to consume it. How do you address that? Informatica identified this growing need very early, and we have invested in a product that we call the Enterprise Data Catalog. The concept of a catalog or a metadata repository, a place where you can identify all the data that exists, is not necessarily new.

Oh, it's been around for years.

Yes, but doing it in an enterprise-wide, unified way is unique. Look at what we're trying to empower any user to do: we're all using Google; you type something and you find it. If you're trying to find data in the organization in a similar way, it's a much harder task. The catalog, Informatica's Enterprise Data Catalog, does exactly that, leveraging a lot of machine learning and AI behind the scenes to make this search possible: to make identification of the data possible, curation of the data possible, and to empower every user to find the data they want, see recommendations for other data that can work with it, and then consume the data in the way they want. I truly think this will change the way IT functions. It is an amazing bridge between IT and the business. If there is one place where you can search all your data, suddenly the whole interface between IT and the business changes. And Informatica is leading this change.
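The Google-style data discovery Ronan describes can be illustrated with a toy sketch: a small in-memory catalog of dataset metadata with keyword search across names, descriptions, locations, and tags. This is not Informatica's implementation; every class, dataset name, and location below is invented for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class DatasetEntry:
    """One catalog record: where the data lives, plus searchable metadata."""
    name: str
    location: str          # physical/operational metadata
    description: str       # business metadata
    tags: set = field(default_factory=set)

class DataCatalog:
    """A toy metadata catalog supporting keyword search over all entries."""
    def __init__(self):
        self.entries = []

    def register(self, entry: DatasetEntry) -> None:
        self.entries.append(entry)

    def search(self, query: str) -> list:
        """Return entries whose metadata mentions any term of the query."""
        terms = query.lower().split()
        hits = []
        for e in self.entries:
            text = " ".join([e.name, e.location, e.description, *e.tags]).lower()
            if any(t in text for t in terms):
                hits.append(e)
        return hits

catalog = DataCatalog()
catalog.register(DatasetEntry("orders_2018", "s3://warehouse/orders/2018",
                              "Retail order transactions", {"sales", "orders"}))
catalog.register(DatasetEntry("customer_master", "db://crm/customers",
                              "Golden record of customer identities", {"mdm", "customers"}))

print([e.name for e in catalog.search("customer")])  # ['customer_master']
```

A production catalog would, of course, index metadata harvested automatically from many systems and rank results; the sketch only shows the single-search-box idea that bridges IT and the business.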
So the catalog gives you line of sight on all those data sources. What's the challenge in terms of creating a catalog and making it performant and useful?

I think there are a few levels to the challenge. I chose the words Enterprise Unified Intelligent Catalog deliberately, and each one of them represents a different challenge. The first challenge is the unified part. There is technical metadata: the mappings and the processes that move data from one place to the other. Then there is business metadata: the definitions the business is using. And then there is operational metadata, as well as the physical location and so on. Unifying all of them so that you can connect and see them in one place is a unique challenge that at this stage we've already completely addressed. The second challenge is enterprise. When we talk about enterprise metadata, it means you want all of your applications: the applications in the cloud, your cloud environments, your big data environments, your APIs, your integration environment. You want to be able to collect all of this metadata across the enterprise. So unifying all the types is the first; enterprise is the second. The third challenge, and actually the most exciting one, is how you can leverage intelligence so you're not limited by the human factor, by the number of people you have to put the data together, right? Today we're using very sophisticated, interesting algorithms that run on the metadata and can tell you that even though you don't know how the data got from here to there, it actually did get from here to there. It's a dotted line. Maybe somebody copied it. Maybe something else happened. But the data is so similar that we can tell you it came from one place.

So actually, I don't think you missed a step, but let me reveal a step that's in there.
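The "dotted line" inference Ronan describes, flagging that two columns probably share an origin because their contents are so similar, can be sketched with a simple Jaccard-similarity pass over column values. This is an illustrative toy, not Informatica's actual algorithm; the dataset names, column values, and threshold are all invented.

```python
def jaccard(a, b):
    """Overlap between two value sets: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def infer_lineage(columns, threshold=0.9):
    """Compare every pair of columns; pairs whose value overlap exceeds
    the threshold get a 'dotted line' inferred-lineage edge, even though
    no ETL job connecting them was ever recorded."""
    edges = []
    names = list(columns)
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            sim = jaccard(columns[names[i]], columns[names[j]])
            if sim >= threshold:
                edges.append((names[i], names[j], round(sim, 2)))
    return edges

cols = {
    "crm.customer_id":   [101, 102, 103, 104, 105],
    "dw.cust_key":       [101, 102, 103, 104, 106],   # likely a copy
    "erp.invoice_total": [19.9, 250.0, 74.5],
}
print(infer_lineage(cols, threshold=0.6))
# [('crm.customer_id', 'dw.cust_key', 0.67)]
```

At enterprise scale you would compare compact sketches (e.g. hashes or samples) rather than raw values, but the principle is the same: similarity of content implies a probable, if undocumented, lineage link.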
One of the key issues on the enterprise side of things is to reveal how data is being used. The value of data is tied to its context, and having catalogs that do, as you said, the unification, but where the metadata also captures how the data is used, makes it possible to create audit trails and lineage.

You're absolutely right. One of the most important things is to see where the data came from and what steps it went through. One other very interesting value of lineage, which I think people sometimes tend to ignore, is: who else is using it? Who else is consuming it? Because that is actually a very good indicator of how good the data is, or how common the data is. The ability to leverage and create this lineage is mandatory. The ability to create lineage that is inferred, not explicitly defined, is also very, very interesting. But we're now doing things that are, I think, really exciting. For example, let's say a user is looking at a data field in one source and identifies that it is a certain specific ID his organization uses. We're now able to automatically understand that this field actually exists in 700 places, leverage the intelligence the user just gave us, and ask him: do you want it to be automatically updated everywhere? Do you want to do it in a step-by-step, guided way? This is how you actually scale to handle massive amounts of data, and this is how organizations are going to learn more and more and get the data to be better and better the more they work with it.

Now, Ronan, you have hard news this week, right? Why don't you update us on what you've announced?

So in the context of our discussion, Informatica announced here today, this morning, at Strata, a few very exciting pieces of news that are actually helping customers go on this data journey.
The first one is support for big data across multiple clouds: the ability to leverage all of these great tools, including the catalog, the big data management, data quality, data governance, and so on, on AWS, on Azure, on GCP, basically without any extra effort needed. We're going even further and empowering our users to run in a serverless mode, where we give them full control over the resources being consumed. This is really critical, because it allows them to do more with the data at a lower cost. The last part of the news that is really exciting is that we added a lot of functionality around our Spark processing and what you can do with it, so that developers and the AI and machine learning teams can do their work. At the same time, we empower business users to do more than they ever did before. So it's about expanding the number of users that can access the data: one group in a more sophisticated way, and one in a very simple but still very powerful way. I think that's the summary of the news.

And just a quick follow-up on that. If I understand it, it's your full complement of functionality across these clouds, is that right? You're not neutering it?

That is absolutely correct, yes. And we are definitely seeing among our customers a growing choice to focus their big data efforts in the cloud. It makes a lot of sense. The ability to scale up and down in the cloud is significantly superior, and the ability to give more users access in the cloud is typically easier. So Informatica has chosen enterprise cloud data management as the market we're focusing on. We've talked a lot about the data management part; this is the cloud part of it, and it's a very focused effort on optimizing things across clouds.

Cloud is critical.
Obviously that's how a lot of people want to do business. They want to do business in a cloud-like fashion, whether it's on-prem or off-prem, and a lot of people want things to be off-prem. Cloud's important because it's where innovation is happening at scale. Ronan, thanks so much for coming on theCUBE today.

Yeah, thank you very much. And I did learn something: oil is not one of the terms I'm going to use for data in the future. I'm going to do something different.

That's good. And my other takeaway, in that context of being able to use data in multiple places, is that there's a proportional relationship between usage and value. So thanks for that.

Excellent. Happy to be here.

And thank you everybody for watching. We will be right back after this short break. You're watching theCUBE at #CUBENYC. We'll be right back.