This is George Gilbert. We're on the ground at Spark Summit. We're interviewing Sanjay Krishnamurti, who's senior VP and CTO at Informatica. So Sanjay, Informatica is in a unique position, having most of the mainstream enterprises handle their data transformation and data preparation workflows with respect to Spark and other processing capabilities. How are you using those to bring the enterprise from the old operational-database-to-data-warehouse world to where we now have greater volumes and a greater number of data targets? What's the strategy?

Informatica has always provided tooling that offers a higher level of abstraction, so developers can build their data pipelines in a graphical way and then deploy them to whatever runtime platform they have. In the past they have deployed them on the databases themselves when they use ELT; when they use ETL, they use our engine to do the data transformation and data movement. We also allow them to push that down to Hadoop, or take it and deploy it to the cloud. So given what we call Vibe, that data flow engine that can be morphed to run anywhere, we have the capability of taking the jobs they've already built out and leveraging them in a big data ecosystem.

So one of the popular topics, at least with Hadoop, in terms of getting across the chasm beyond the data lake, is data warehouse offload. Does your architecture lend itself to taking what would have been a transformation-heavy workload executing on the data warehouse and moving that back to Hadoop?

Yes, actually we have customers who are already doing that. They go through their data warehouse spend today and realize that maybe a lot of that data is really there for staging purposes; they don't really need a data warehouse for that, and it would be much more cost-beneficial to move it to Hadoop. And since they use Informatica for their mappings, they can take a mapping and repurpose it, so rather than writing directly to a database, they write to Hadoop, and a lot of the preprocessing happens in the Hadoop cluster. Once they preprocess the data, they can move it to the warehouse of their choice. We already have customers doing that data warehouse optimization using our tools, so our architecture lends itself very well to that kind of pattern.

As we understand it, that's the most common application for taking Hadoop into an economic role where it's not just kicking the tires. Now, you've done some work on future-proofing the architecture so that the data preparation and transformation logic built in your graphical environment can actually run on many different underlying engines. Tell us about that.

Once you define your data processing logic using our domain-specific language in the visual tool, what we call a mapping, we can take that mapping and translate it into a Hive query. That was before Spark came along, so you could translate it into a Hive query and essentially use the Hive infrastructure to do the processing. Now, Hive was very batch-oriented and wasn't the most performant, but it gave you the best throughput. We can take that same logic you've written and translate it into Spark SQL or Spark, and all of a sudden, without making any changes to your development artifacts, you're running on Spark. So that way you're future-proofed.
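To make the warehouse-offload pattern Sanjay describes concrete, here is a minimal Spark sketch in Scala rather than Informatica's visual tooling: raw files are staged on the Hadoop cluster, the heavy preprocessing runs there, and only the refined result is loaded into the warehouse. The paths, table names, JDBC URL, and credentials are hypothetical, and a JDBC driver for the target warehouse is assumed to be on the classpath.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object WarehouseOffload {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("warehouse-offload")
          .getOrCreate()

        // Stage raw files on the Hadoop cluster instead of in the warehouse.
        val raw = spark.read
          .option("header", "true")
          .csv("hdfs:///staging/orders/*.csv") // hypothetical staging path

        // Do the heavy preprocessing (filtering, aggregation) on the cluster.
        val prepared = raw
          .filter(col("order_status") === "COMPLETE")
          .groupBy(col("customer_id"))
          .agg(sum(col("amount")).as("total_amount"))

        // Only the refined result moves on to the warehouse of your choice.
        prepared.write
          .format("jdbc")
          .option("url", "jdbc:postgresql://dw-host:5432/warehouse") // hypothetical
          .option("dbtable", "mart.customer_totals")
          .option("user", "etl_user")
          .option("password", sys.env.getOrElse("DW_PASSWORD", ""))
          .mode("append")
          .save()

        spark.stop()
      }
    }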
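The "define once, run on either engine" idea can be approximated with plain SQL: the statement below is valid HiveQL as well as Spark SQL, so the same declarative logic could be handed to Hive's batch engine or, as here, executed by Spark without changing the logic itself. This is a sketch of the concept, not Informatica's actual translation layer; the table names are hypothetical.

    import org.apache.spark.sql.SparkSession

    object MapOnceRunAnywhere {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("engine-agnostic-logic")
          .enableHiveSupport() // read tables registered in the Hive metastore
          .getOrCreate()

        // One declarative definition of the transformation. The same statement
        // is valid HiveQL, so it could be submitted to Hive's batch engine
        // instead; here Spark executes it, with no change to the logic itself.
        val dailyRevenue = spark.sql(
          """SELECT order_date, SUM(amount) AS revenue
            |FROM sales.orders
            |GROUP BY order_date""".stripMargin) // sales.orders is hypothetical

        dailyRevenue.write.mode("overwrite").saveAsTable("sales.daily_revenue")
        spark.stop()
      }
    }

The point is the separation: the transformation stays declarative while the execution engine underneath can be swapped, which is the essence of the future-proofing Sanjay describes.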
Today it's Spark; tomorrow it may be something else, maybe not, but you are protected from whatever that underlying infrastructure is. So not only do you leverage what you've already done, you also leverage your skill sets, so you don't need to learn something new.

Now, we see more pipelines having the characteristic of being near real-time, not as a replacement for what was done with staging to Hadoop or the data warehouse, but to feed interactive analytics. How does Informatica address those needs?

Informatica has provided real-time capabilities in the past too. In our products today you can take messages off a message queue, do any kind of transformation, and publish them to another message queue; we already have that capability. Or you can take changes that are happening in a database in real time, do something, and publish that as an event on a different event bus. We have that capability. We want to be able to take that same capability and map it to Spark's real-time engine, so once you have that, you can leverage Spark's real-time capabilities, but again using the same business logic you've been using all along. It's very similar to what we've done with pushdown to Hadoop: same capabilities, except now they're available to you in real time.

So would it be fair to say there will be the traditional batch, which is future-proofed, and now the real-time, which will essentially be future-proofed as well for multiple deployment targets?

Exactly. Our architecture has always enabled that. Prior to Spark there was no real-time capability in the Hadoop environment; Spark Streaming provides that, so now you can take your real-time workloads as well and push them down to Hadoop.

Okay, great. This is George Gilbert on the ground at Spark Summit. We'll be back shortly with our next guest.
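A minimal sketch of the queue-in, transform, queue-out pattern Sanjay described, using Spark's Structured Streaming API with Kafka standing in for the message queue. The broker address, topic names, and checkpoint path are hypothetical, the spark-sql-kafka connector is assumed to be on the classpath, and a trivial uppercase stands in for real business logic.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object QueueToQueue {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("queue-to-queue")
          .getOrCreate()

        // Consume events from one message queue (hypothetical Kafka topic).
        val events = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")
          .option("subscribe", "orders-in")
          .load()

        // The same kind of transformation a batch mapping would apply; a
        // trivial uppercase of the payload stands in for real business logic.
        val transformed = events
          .selectExpr("CAST(value AS STRING) AS payload")
          .select(upper(col("payload")).as("value"))

        // Publish the transformed events to another topic.
        val query = transformed.writeStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")
          .option("topic", "orders-out")
          .option("checkpointLocation", "hdfs:///checkpoints/orders") // hypothetical
          .start()

        query.awaitTermination()
      }
    }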