We're here with James Markarian, who's the executive VP at Informatica. Welcome, and thanks for coming on theCUBE.

Hi, thanks for having me, guys. You are non-stop, very impressive.

Oh, this is the beginning. This is the beginning. All right, talk to us tomorrow at 4 p.m., but...

This is going to be a walk in the park.

Yeah, so this is, let's see, the third Hadoop World. Now, you guys were probably here in some way, shape, or form last year, but you didn't have a big presence at the event, right?

Right, that's right.

So this is your first year sponsoring, is that correct?

That's correct, that's correct.

You think of Informatica as a 20-plus-year-old company, doing a lot of hard data problems in the structured data world, and now you're coming into this big data world. Tell us what that's all about. What's going on there with Informatica?

Great, thanks. So first of all, it's always very scary to me when people talk about us as a 20-year-old company, because I remember when we first started out. When I first joined about 13 years ago, we were this tiny little...

I remember that too.

Yeah, we were a tiny little nothing that nobody asked for advice about anything, and now we have all these customers coming to us saying, hey, what should we do about Hadoop and what can Informatica do for us? So this new world of unstructured data processing is pretty interesting and new. I think we saw some statistics in the keynote this morning that said that a lot of the Hadoop instances we're seeing aren't necessarily terribly large, but what they're doing are interesting new things with unstructured information.
We have customers that are doing things with social media and lots of blog information. It wasn't necessarily that you couldn't do these types of things in traditional data warehousing environments or traditional database environments, but it was either economically infeasible or just plain difficult to get the types of analytics built that you wanted. So we think Hadoop is about big data, of course, but it's also exciting for unstructured data.

So you've made some announcements leading up to the event. I know HParser was one of the announcements; we wrote about it. Was that your first foray into this whole Hadoop space? Talk about that a little bit.

Yeah, we've been doing a few things around Hadoop. HParser was only the most recent announcement that we've had in the Hadoop space. We do have some other components that are out there. HParser is kind of an especially cool thing, though, in my view. A lot of what we see happening...

To ETL geeks out there.

Yeah. Well, let me give you a chance to be convinced. So what we're seeing on Hadoop, of course, is a lot of processing of unstructured information. And what we're seeing from the developer side is people writing a lot of hand code, which is very inefficient. Not only the first time you do it, but it's difficult to maintain and difficult to keep up with the changes that are happening in all the different formats, whether it's Omniture logs, JSON formats, XML documents, SWIFT, HIPAA documents, anything you can think of. What this HParser technology allows you to do is really quickly tease apart these unstructured documents into the components that you want to analyze. And you can access this functionality from Hive, from Pig, and from MapReduce. So it's very quick to deploy, it's a very fast-executing component, and it really cuts down the time-to-value for Hadoop.

So that is kind of cool. And that's new territory for you guys.
You come from the structured legacy. So talk about what's different in unstructured land and why you have credibility there.

Well, I guess those are maybe two different questions. But I'd say unstructured has been around with us for a while. It's getting a lot of attention now. But Informatica has for a long time been helping customers like the ones you heard from, both Morgan Stanley and, more recently, JPMorgan Chase with Larry Feinsmith, handle the processing of their various document formats: SWIFT processing, Excel spreadsheets, Word documents. So we're just taking some of that same expertise and bringing it to Hadoop.

Was it 30,000 databases? Is that what it was?

Yeah, that's right. It's a good time to be Informatica when you have customers out there with those sorts of footprints. So we're just taking the expertise that we built up in the non-Hadoop world, even around unstructured, and bringing it to Hadoop. But I think what's interesting about Hadoop is you're seeing the real economics of information changing. We have customers that are doing things like mining their social media influence networks to drive business value. We're seeing customers that are mining things like their survey verbatims, and customers that mine sites like Yelp and Travelocity for feedback on their hotel experiences. And the interesting thing is that a lot of these problems combine unstructured processing with transaction processing, because you want to know who you should really be paying attention to. So I might post some random tweet about something. Should the company that I'm tweeting about actually care what I have to say about it? Did I actually stay at that property? Am I just retweeting something that somebody told me that's not really verifiable?
But when you can correlate unstructured information with an actual hotel stay, and maybe even an actual hotel room, you bring those two worlds together and then you have really actionable information. That's what Informatica's helping with.

I moderated a panel with EMC down in the Bay Area that talked about big data, with kind of a business crowd. You guys were on the panel; someone from Informatica, the CTO, was there. And the conversation was interesting because you had a lot of business people in the room to whom Hadoop was new. So my question is a little bit more of a philosophical one. Mike Olson talked about how this is not about just one island, right? And that Hadoop is not a unification kind of business model where everyone's going to unify. You guys have a huge presence in the market. What are you seeing out there on the legacy side that's changing, specifically? Because you guys have that business there. What's going on there? How do you see that?

Well, I think, first of all, a lot of customers have questions about how Hadoop fits in with everything. And to some extent, my comment about Hadoop is that it has to declare its major for everyone to really understand it, because you even heard today there are guys that are focusing on the HBase side and there are guys that are focusing on the analytics side. So is it a transaction processing system, like some of the guys were implying? Is it a purely analytics environment? So Informatica, like everyone, is really kind of watching this evolve. I think that the most natural cases right now are around the analytics. They're not really a substitute for other things that are going on. We're seeing it as a complement to existing analytic environments. Hadoop doesn't yet support the full range of SLAs that you see with some of these traditional analytic environments, but it seems like it's getting there.
So I think it's just early days with Hadoop, and I'd say that a lot of the people that I talked to at the conference are still just trying to figure out what it is, what it means to them, and where it fits in with their IT landscape.

Maybe a major in analytics with a minor in transaction processing? Is that your sort of vision for the future? How do you see that shaping up?

Well, I wouldn't want to presuppose anything about Hadoop right now. I think that it kind of looks that way now. I think that there's nothing but green field ahead of Hadoop.

Yeah, very smart guys working on it.

Yeah, I think when you look at how Hadoop got started, with some of the early papers that came out of both Google and Amazon, it reminded me a lot of how relational databases really got kick-started. So back in the 70s, Codd and Date write a paper, Larry Ellison and Bob Miner read the paper, and it's like, hey, I'm reading about the future here. And when you read the seminal papers around Hadoop and MapReduce, you get a feeling like you're really glimpsing the future of something. And just as we didn't know exactly how relational databases would evolve at that point, it's a little bit early to tell where Hadoop is going. I think it's going to have a healthy footprint in both transactions...

I mean, you've got to like the attendance here. 1,400 people, packed house. They're going to need the Moscone Center next. I mean, this demand. And it's not just geeks, too. There are business people here, right? So you mentioned SLAs. You have a lot of experience dealing with SLAs. What are the key factors in the SLA component of this that you guys are watching and developing?

Yeah, there's a whole spectrum of things in the SLA that we care about, right? So you even heard Larry this morning in the keynote talk about some uptime and stability concerns.
There are some very basic elements of the SLA, like whether the system is available, or whether you have to worry about data corruption. That's sort of telling you that we're at an early stage in the market. You also have to start thinking about how business-critical that information is, and whether there's some degree of tolerance for those sorts of outages. Then you go all the way to the other end and talk about it from an analyst's perspective, and you think about everything that went into OLAP processing. Relational provided analytics, but it wasn't good enough for a lot of business analysts. They couldn't slice and dice quickly enough. It couldn't really satisfy their fast-twitch need for information. And Hadoop, again, is really good at relatively high-latency, high-scale analytics, but it isn't necessarily an environment where, riding on top of vanilla Hadoop, you'd want to sit there and slice and dice information. That's why we're seeing a lot of tools being built to provide, effectively, caching and other things on top of Hadoop, to deliver that sort of SLA for their users.

And bring some structure to that Wild West. How about your relationship with Cloudera? What's going on there? Can you talk about that a little bit?

Yeah, sure. So we have a partnership with Cloudera that was announced several months ago. We have a certified integration with Cloudera. We have a connector to HDFS, and we're going to be coming out with a connector to HBase, of course. And so we are seeing where things go. What you see with Hadoop is that it's this big, powerful engine, but despite Mike's comments, data doesn't originate in Hadoop. In a lot of cases, data doesn't terminate in Hadoop. So Hadoop is just one of many happy stops along the way for data. And Informatica wants to be a part of that process, getting data in and out of Hadoop. And also, you heard about the skills shortages around MapReduce.
Everyone had their hiring pitch at the talks this morning. And what we're hearing from our customers is, hey, we don't necessarily want a whole army of MapReduce programmers. We have people that know Informatica. We have people that know SQL. Can they leverage those skills and have that effectively be the programming environment for our Hadoop environment? And so our answer to that has to be yes.

Talk a little bit about that data origination comment that you made. You mean data originates in devices and machines? Is that what you're talking about?

Yeah, yeah, that's right. So where is data first transacted, right? Just for an example, since we're here a little bit north of the Occupy Wall Street movement, where they also do some financial transactions: if you look at our messaging business and ultra-low-latency messaging, we have customers that are transacting at the sub-microsecond level, right? So you have trades that go in and are measured in nanoseconds nowadays, not even microseconds. Those transactions happen very, very fast. They originate in very high-performance trade execution systems, and then they're shipped out to analytics environments like Teradata and like Hadoop. It's going to be a very long time before that data gets originated in something like Hadoop. You can't have stability questions when you're taking down a billion trades a second, like in some cases. So that's like one tip of the iceberg. That's what I mean when I'm talking about where data really starts.

Yeah, yeah, yeah. I mean, it's the Kumbaya of big data here. We've called this Hadoop World the Burning Man of big data. It's an organic community, it's growing, but the reality is these big businesses are running massively scaled environments. So that's what you're referring to, right? I mean, this is like...

Yeah, that's right.
Both on the transaction side and also on the analytics side. And it took a lot of time for a lot of smart people to build up transaction environments that could handle those types of volumes. I think it's always tempting to treat our forefathers as idiots, like, what were those guys doing, it's so much easier now. But there's actually some pretty good technology behind that.

What's interesting about Hadoop is that the use cases are so diverse, you don't really need to have one unified anything anymore. You could have specific Hadoop instances deployed in those use cases as applicable.

Yeah, that's right. And I think also that the appetite in IT for niche solutions for specific persistence or analytics problems is so much higher than it was just a few years ago, when you could do anything you wanted as long as it was in an Oracle database. Now, with the NoSQL movement and the Hadoop movement, you're seeing this sort of indifference, really, to brand-name persistent stores. The problems are so big and diverse now that people just want great technology solutions.

Yeah, purpose-built.

Yeah, yeah, that's the technology.

It doesn't feel, anyway, as incremental. I mean, there are some pretty radical things going on, and companies are looking around saying, well, I'd better hop in or I'm going to lose competitive advantage. The numbers that JPMorgan Chase showed today were pretty impressive in terms of the cost savings and business advantage that they're driving.

That's right. And I think that you ultimately have to think about the business drivers behind all of it. To some extent, IT is in a position where they have to innovate, they have to take chances, they have to apply the latest technologies if they're going to keep up with the pace of change in the industry.
You know, as soon as Lehman failed, the questions that Larry brought up, like, what's our customers' exposure to Greece, for example, those are questions that are super hard to answer when you think about all the indirect ways in which financial instruments are linked together. And so they simply have to innovate in order to be able to answer those questions and respond to their board and their shareholders about what their exposure really is.

There's a lot of talk about, and I'd love to get your take on this, James, even though it's related but not central to what you guys are doing, making it easier for business people to use Hadoop. And we're clearly not there yet, despite what some of the rhetoric is. So what are your thoughts there? How important is that, and how likely is that? Is that a reality, in your view, in the next five or seven years?

Well, I think one of the principles behind Hadoop is this: we've had MPP computing for a very long time, but it's always been not only the geeks, but the geeks among geeks, that have been able to harness the power of Cray machines or the Connection Machine back in the day.

And it was like this big machine.

Yeah, that's right. There was Danny Hillis, that's right.

Where is he now?

He's probably saying, hey, that was my idea. So now the thing that's really interesting about Hadoop is that it's putting a lot of compute power in lots of people's hands. It's kind of like a democratization of high-power computing. And so I think that there is always going to be this push to move that power higher up in the stack. To the extent that technologies or languages like MapReduce and Pig and Hive simplify that, I think that's important, but they don't really get it all the way there. I think Datameer, for example, is an excellent way to do that.
Because at the end of the day, what all the business people want to use is just a spreadsheet. You have this nice spreadsheet on top of this 1,000-node Hadoop cluster, and you can start getting real-time information across massive amounts of data. I think that's getting us close. The business user is always kind of an elusive thing. Everyone's been talking about this. I guess the question you would end up asking is, have we even solved that in the traditional BI world? There are certainly companies like QlikTech and Tableau that are saying, no, we haven't. And I think we're a pretty long way from that happening in Hadoop as well.

My last question, James, relates to the whole competitive environment. Last year it was a one-horse race, and now you're seeing some choices come about. What are you guys seeing there? Hortonworks enters, you've got EMC and MapR doing their thing. What's your angle on that?

I'd be really, really concerned about the Hadoop community if that wasn't happening. If you had one vendor that was the only champion for it, I think you'd have to look at it as an IT buyer and say, that looks really risky to me. So again, not that all roads lead back to relational, but you look at the foundations of relational and all the companies that sprang up. Well, IBM didn't exactly spring up, but they were there. Oracle, Informix, Ingres, Sybase: many billion-dollar-plus companies were built around one very good idea. It was actually very good for the ecosystem. They all pushed each other. They made SQL better. They gave IT buyers confidence that this is a trend that's going to be here to stay. Even if this company doesn't succeed, I have a standards-based solution.
I can migrate my applications over to something else that will be there. And I think it's a really, really good sign for Hadoop.

Great. That was a great comment, by the way. I think that's right on the money. My final question: we were talking with Kirk Dunn about the concept of cloud-washing, which everyone's been talking about, just putting "cloud" on it. Is there a Hadoop...? We haven't heard the word Hadoop-washing yet, but it's getting to the point now where it's so frothy. VCs are putting up $100 million, and clients are hearing about big data from all over the external world. If cloud-washing means slapping on the word, then Hadoop-washing would be slapping Hadoop on: hey, we're Hadoop, whatever. What would we look for? Does that even exist, or is it all just so diverse?

Yeah, that's a good question. I hadn't really quite considered that. So first of all, the cloud economics are such that I think the cloud is inevitable to a certain extent. I think that the Hadoop value proposition makes Hadoop itself inevitable. I don't know if everyone's going to be adding an H onto everything the way they add an I or this or that for cloud. I think the thing that you would always look at is whether there's a discernible customer business value that's been delivered by whatever the company is trying to convince you has Hadoop applicability. If there's customer value, then it looks and sounds legit. If there's not that discernible customer value, I think you do have to wonder whether this is just marketing or whether it's something real.

All right, James Markarian from Informatica, thanks very much for coming on theCUBE. Great story, great guest. Appreciate you taking time out with us.

Great.