Live from San Jose, California, it's theCUBE, covering Big Data Silicon Valley 2017. Okay, welcome back everyone. We are live in Silicon Valley for Big Data Silicon Valley, our companion show to Big Data NYC, in conjunction with Strata Hadoop, Big Data Week. Our next guest is Murthy Mathiprakasam, the Director of Product Marketing at Informatica. Did I get it right? Absolutely. Okay, welcome back. Good to see you again. Good to see you. Informatica, you guys had Amit on earlier yesterday, kicking off our event. It is a data lake world out there, and the show theme so far has been, obviously, a ton of machine learning, which has been fantastic. We love that, because that's a real trend. And IoT has been a subtext to the conversation and almost a forcing function. Every year, the Big Data world is getting more and more pokes and levers off of Hadoop to a variety of different data sources. So a lot of people are taking a step back and a protracted view of their landscape inside their own company and saying, okay, where are we? So it's kind of a checkpoint in the industry. You guys do a lot of work with customers, history with Informatica, and certainly, over the past few years, the change in focus, certainly on the product side, has been kind of interesting. You guys have what looks to be a solid approach: an abstraction layer for data and metadata as the keys to the kingdom, but not locking it down, making it freely available, yet providing the governance and all that stuff. Exactly. Amit laid it all out there. But the question is, what are the customers doing? I'd like to dig in, if you could share just some of the best practices. What are you seeing? What are the trends? Are they taking the step back? How is IoT affecting it? What's generally happening? Yeah, great question. So it has been really, really exciting. It's been kind of a whirlwind over the last couple of years. So many new technologies.
And we do get the benefit of working with a lot of very, very innovative organizations. IoT is really interesting, because up until now, IoT has always been sort of theoretical. You're like, what's the thing? You know, what's this internet of things? And IoT was always pooh-poohed as someone else's department. Exactly. But we actually have customers doing this for real now. So we've been working with automotive manufacturers on connected vehicle initiatives, pulling sensor data. We've been working with oil and gas companies on connected meters and connected energy, and with manufacturing and logistics companies looking at putting meters on trucks so they can actually track where all the trucks are going. I mean, huge cost savings and service delivery kinds of benefits from all this stuff. So you're absolutely right. IoT, I think, is finally becoming real. And we have a streaming solution that works on top of all the open source streaming platforms. So we try to simplify everything, just like we have always done. We did that with MapReduce, with Spark, and now with all the streaming technologies. You have a graphical approach where you can go in and say, here's the kind of processing we want. You lay it out visually and it executes in the Hadoop cluster. I know you guys have done a great job with the product. We've been very complimentary of you guys. And it's almost as if there's been a whole transformation within Informatica. I know you went private and everything, but there are a lot of good product shops there. You guys have a lot of good product guys. So I've got to ask you the question. Obviously IoT sometimes has an operational technology component. Usually they run their own stacks, not even plugged into IT. So that's a whole other story. We'll get to that in a second. But the trend here is you have the batch world.
Companies that have been in this ecosystem, that are on the show floor at the O'Reilly Media event or talking to us on theCUBE, some have been pure play batch related. Then the fashionable streaming technologies have come out. But with what's happened with Spark, you're starting to see the collision between batch and real time, called streaming or whatnot. And at the center of that is the deep learning. It's the IoT and it's the AI that's going to be at the intersection of these two colliding forces. So you can't have a one-trick pony here. You've got to have a blended, more holistic, horizontal, scalable approach. So I want to get your reaction to that. And two, what product gaps, organizational gaps, and process gaps emerge from this trend? And what do you guys do? So, a three-part question. Okay, good. I'll try to cover all three. So first, the collision and your reaction to that trend. Yeah, yeah. And then the gaps. Absolutely. So basically, you know Informatica. We've supported every kind of variation of these types of environments. And so we're not really believers in this-or-that. It's not on-premises or cloud. It's not real time or batch. We want to make it simple no matter how you want to process the data, or where you want to process it. So customers who use our platform for their real-time or streaming solutions are using the same interface as if they were doing it batch. We just run it differently under the hood. And so that simplifies and makes a lot of these initiatives more practical, because you might start with a certain latency and think maybe it's okay to do it at one speed. Maybe you decide to change. You could go faster or slower. And you don't want to have to go through code rewrites and start completely from scratch. That's the benefit of the abstraction layer, like you were saying.
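The "define the pipeline once, run it batch or streaming" idea described here can be sketched in a few lines of Python. This is a minimal illustration of the abstraction-layer concept only; all class and method names are hypothetical, and this is not Informatica's actual API.

```python
# A pipeline declares its transformations once; the execution mode
# (batch over a finite dataset, or streaming record-at-a-time) is
# chosen at run time, so changing latency requires no code rewrite.

class Pipeline:
    def __init__(self):
        self.steps = []  # transformations declared once, up front

    def transform(self, fn):
        self.steps.append(fn)
        return self  # allow chaining

    def _apply(self, record):
        for fn in self.steps:
            record = fn(record)
        return record

    def run_batch(self, dataset):
        # Batch mode: process the whole finite dataset, return all results.
        return [self._apply(r) for r in dataset]

    def run_streaming(self, source):
        # Streaming mode: lazily process records as they arrive.
        for record in source:
            yield self._apply(record)


# The same pipeline definition serves both latencies:
pipe = (Pipeline()
        .transform(lambda r: {**r, "temp_f": r["temp_c"] * 9 / 5 + 32})
        .transform(lambda r: {**r, "alert": r["temp_f"] > 100}))

batch_out = pipe.run_batch([{"temp_c": 40}, {"temp_c": 20}])
stream_out = list(pipe.run_streaming(iter([{"temp_c": 40}])))
```

In a real engine the two run methods would compile the declared steps down to, say, a Spark batch job versus a Spark Streaming job, but the user-facing definition stays the same either way.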
And so I think that's one way that organizations can shield themselves from the question, because why even pose that question in the first place? Why is it either this or that? Why not have a system that you can actually tune, so maybe today you start batch and tomorrow you evolve it to be more streaming and more real-time? Help me on the other two. On the gaps. There are always product gaps, because again, you mentioned that you're solving it. That might be an integration challenge for you guys, or an integration solution, challenge, opportunity, whatever you want to call it. Organizational gaps, maybe not being set up for it, and then process. I think it was interesting, we actually went out to dinner with a couple of customers last night. They were talking a lot about the organizational stuff, because the technology they're using is Informatica, so that part's easy. So they're like, okay, it's always this stuff around budgeting, around resourcing, the skills gap, and we've been talking about this stuff for a long time. But it's fascinating that even in 2017 it's still a persistent issue. And part of their challenge is that even in the way IT projects have been funded in the past, you have this kind of waterfall-ish type of governance mechanism where you're supposed to say, oh, what are you gonna do over the next 12 months? We're gonna allocate money for that, we'll allocate people for that. Like, what big data project takes 12 months? In 12 months you're gonna have a completely different stack that you're gonna be working with. And so their challenge is evolving to a more agile kind of model where they can justify quick-hit projects that may have very unknown kind of business value, but it's just getting buy-in that, hey, something might be discovered here. This is kind of an exploration use case, discovery, and a lot of this IoT stuff too. I mean, people are bringing back the sensor data. You don't know what's gonna come out of that or what insights you're gonna get.
Frequency and velocity could be completely dynamic. Absolutely. So I think part of the best practice is being able to set aside this kind of notion of innovation where you have funding available to get a small cross-functional team together. So this is part of the other aspect of your question, which is organizational: this isn't just IT. You gotta have the data architects from IT. You gotta have the data engineers from IT. You gotta have data stewards from the line of business. You've got business analysts from the line of business. Whenever you get these guys together, small core team, and people have been talking about this, right? Agile development, all that. It totally applies to the data world. And the cloud's right there too, so they have to go there. That's right, exactly. So is the problem with the 12-month project model, the waterfall model, whatever you want to call it, maybe 24 months more likely, that on the fail side, when they wake up and ship, the world's changed? So there's kind of a diminishing return. Is that kind of what you're getting at there on the fail side? Exactly, it's all about failing fast, failing forward, and succeeding very quickly as well. And so when you look at most of the successful organizations, they have radically faster project life cycles. And this is all the more reason to be using something like Informatica, which abstracts all the technology away so you're not mired in code rewrites and long development cycles. You just wanna ship as quickly as possible, get the organizational buy-in that, hey, we can make this work. Here are some new insights that we never had before. That gets you the political capital for the next project, and the next project. And you've just got to keep doing that over and over again. I always call agile more of a blank check and a safe harbor, because as long as you're failing forward, you keep your job.
But there's some merit to that. Here's the trick question for you, though. Now let's talk about hybrid: on-prem and cloud. Now that's the real challenge. What are you guys doing there? Because I don't want to have one job on-prem and another job in the cloud. That's not redundancy, that's inefficient. That's duplication. Yes. So that's an issue. How do you guys tee it up there for the customer, and what's the playbook for them? People are scratching their heads saying, I want on-prem, and Oracle got this right. Their earnings came out pretty good. Same code on-prem, off-prem. Same code base, so workloads can move depending upon the use cases. How do you guys compare? It's the exact same approach that we're taking, because again, the customer shouldn't have to make an either-or choice. So for you guys, the interface and code are the same on-prem and in the cloud. That's right. So you can run our big data solutions on Amazon, Microsoft, any kind of cloud Hadoop environment. We can connect to data sources that are in the cloud, so different SaaS apps if you want to pull data out of there. We've got all the out-of-the-box connectivity to all the major SaaS applications. And we can also actually leverage a lot of these new cloud processing engines too. So we're trying to be the abstraction layer. So now it's not just about Spark and Spark Streaming. There are all these new platforms that are coming out in the cloud, so we're integrating with those. So you can use our interface and then push down the processing to a cloud data processing system. So there's a lot of opportunity here to use the cloud. But again, we want to make things more flexible. It's all about enabling flexibility for the organization. So if they want to go cloud, great. There are plenty of organizations that want to go cloud, and that's fine too. So if I get this right: a standard interface, on-prem and cloud, for the usability.
Under the hood, there are integration points in the clouds so that data sources, whatever they are, through whatever it could be, Kinesis coming off Amazon into you guys, or Azure's got some stuff over there, all work under the hood. Totally abstracted from the user. That's right. Okay, so the next question is, okay, if you go that way, that means it's a multi-cloud world. You'd probably agree with that. Multi-cloud meaning I'm a customer, I'm going to have multiple workloads on multiple clouds. That's where it is today. I don't know if that's the end game, and obviously all of this is changing very, very quickly. So, I mean, Informatica, we're neutral across, you know, multiple vendors and everything. So you guys are Switzerland. We're the Switzerland, you know. So we work with all the major cloud providers, and there are new ones that we're constantly signing up also. But it's unclear how the market will shake out. I mean, there's just so much information out there. I think it's unlikely that you're going to see mass consolidation. We all know who the top players are, and I think that's where a lot of large enterprises are investing, but we'll see how things go in the future. If we're a customer, where should customers spend their focus? Because you're seeing the clouds, and we were just commenting about Google Next yesterday with Amit and others, that they're trying to be enterprise ready. You guys are very savvy in the enterprise, and there are a lot of table stakes, from SLAs to integration points. And so there are some clouds that aren't ready for prime time, like Google for the enterprise. Some are getting there fast, like Amazon. Azure's super enterprise friendly. I mean, they all have their own problems and opportunities, but they're very strong in the enterprise. What do you guys advise customers, and what are they looking at right now? Where should they be spending their time? Writing more code and scripts, or tackling the data? How do you guys help them shift their focus?
Yeah, yeah, definitely not scripts. That's about the worst thing you can do, for all the reasons we understand. Why is that? Well, again, we were talking about being agile. There's nothing agile about manually sitting there writing Java code. I mean, think about all the developers that were writing MapReduce code three or four years ago. Those guys, well, they're probably looking for new jobs right now, and the companies who built that code are rewriting all of it. So that approach of doing things at the lowest possible level doesn't make engineering sense. That's why the abstraction layer approach makes so much better sense. So where should people be spending their time? Really, the one thing technology cannot do is substitute for context. So that's business context: understanding, if you're in healthcare, that there are things about the healthcare industry that only that healthcare company could possibly know, about their data and why certain data is structured the way it is. So business context is something that only that organization can possibly bring to the table, and organizational context too, as you were alluding to before. Roles and responsibilities: who should have access to data? Who shouldn't have access to data? That's also something that can't be prescribed from the outside. It's something that organizations have to figure out. For everything else under the hood, there's no reason whatsoever to be mired in these long code cycles where you've got to rewrite it and you've got to maintain it. Automation is one level. Machine learning is a nice bridge for taking advantage of either vertical data or specialty data for that context. But then the human has to actually synthesize it and apply it; that's the interface. Did I get that right? That progression? Yeah, yeah, absolutely.
Well, and the reason machine learning is so cool, and I'm glad you segued into that, is that it's all about having the machine learning assist the human, right? So the humans don't go away. We still have to have people who understand the business context and the organizational context. But what machine learning can do, in a world of big data, I mean, inherently the whole idea of big data is that there's too much data for any human to mentally comprehend. Well, you don't have to mentally comprehend it. Let the machine learning go through it. So we've got this unique machine learning technology that will actually scan all the data inside of Hadoop and outside of Hadoop, and it'll identify what the data is. Because it's all just pattern matching and correlations, and most organizations have common patterns to their data. So we've figured out all this stuff, and we can say, oh, you've got credit card information here. Maybe you should go look at that if it's not supposed to be there. Maybe there's a potential violation there. So we can focus the manual effort onto the places where it matters. So now you're looking at issues and problems instead of doing the day-to-day stuff. The day-to-day stuff is fully automated. So the guys that are losing their jobs, those Java developers writing scripts to do the queries, where should they be focusing? Where should they look for jobs? Because I would agree with you about the MapReduce guys and all these script guys and the Java guys. I mean, Java has always been the bulldozer of the programming languages, very functional. But where do those guys go? What's your advice? We have a lot of friends, I'm sure you do too. A lot of the Java developers we know are awesome programmers. Where should they go? Well, so first, I'm not saying that Java's going to go away, obviously. But I think Java...
And the Java guys who are doing some of the payload stuff, deep in the bowels of big data. Well, there are always things that are unique to the organization, and custom applications. All that stuff is fine. What we're talking about is MapReduce coding. What should they do? What should those guys be focusing on? So it's just like every other industry you see: you go up the value stack. So you can become more of the data governor, the data steward, look at policy, look at how you should be thinking about organizational context. And governance is also a good idea. And governance, right? Governance jobs are just going to explode here, because somebody has to define it. Technology can't do this. Somebody has to tell the technology what data is good, what data is bad, when you want to get flagged if something is going wrong, when it's okay to send data through. Whoever decides and builds those rules, that's going to be a place where there's, I think, a lot of opportunity. Murthy, final question. We've got a break coming; we're getting the hook sign here. But we've got Informatica World coming up soon in May. What's going to be on the agenda? What should we expect to hear? What are some of the themes that you could tease a little bit to get people excited? Yeah, yeah. Well, one thing we want to do is really provide a lot of content around the journey to the cloud, like we've been talking about today. There are so many organizations who are exploring the cloud, but it's not easy, for all the reasons we just talked about. Some organizations want to just kind of break away, rip out everything in IT, and move all their data and their applications to the cloud. Some of them are taking more of a progressive journey. So we've got customers who've been on the leading front of that. So we'll be having a lot of sessions around how they've done this and best practices that they've learned.
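The sensitive-data discovery described a couple of answers back, scanning field values for things that look like credit card numbers so manual effort can be focused where it matters, can be sketched roughly as below. This is only an illustration of the general technique (a regex candidate match confirmed by a Luhn checksum); it is not Informatica's actual discovery engine, and all names are made up.

```python
# Flag record fields that appear to contain credit card numbers:
# a digit-pattern regex finds candidates, and the Luhn checksum
# filters out most random digit strings.
import re

CARD_RE = re.compile(r"\b(?:\d[ -]?){13,16}\b")

def luhn_valid(number: str) -> bool:
    """Luhn checksum: double every second digit from the right."""
    digits = [int(d) for d in number if d.isdigit()]
    checksum = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        checksum += d
    return checksum % 10 == 0

def flag_card_numbers(records):
    """Return (record_index, field_name) pairs that appear to hold card data."""
    hits = []
    for i, record in enumerate(records):
        for field, value in record.items():
            for match in CARD_RE.finditer(str(value)):
                if luhn_valid(match.group()):
                    hits.append((i, field))
    return hits

records = [
    {"name": "Ada", "note": "card 4111 1111 1111 1111 on file"},  # standard test number
    {"name": "Bob", "note": "no sensitive data here"},
]
```

A production system would layer many such detectors (national IDs, account numbers, addresses) and learn organization-specific patterns, but the shape of the problem, pattern matching plus validation to surface candidates for human review, is the same.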
So hopefully it's a great opportunity both for our current audience, who've always looked to us for interesting insights, and for all these kind of emerging folks who are really trying to figure out this new world of data. Right, thanks so much for coming on theCUBE. Appreciate it. Informatica World coming up. You guys have a great solution, and again, you're making it easier for people to get the data and put those new processes in place. It's theCUBE, breaking it down for Big Data SV here in conjunction with Strata Hadoop. I'm John Furrier. More live coverage after this short break.