From New York, it's theCUBE. Covering Big Data New York City 2016. Brought to you by headline sponsors, Cisco, IBM, NVIDIA, and our ecosystem sponsors. Now, here are your hosts, Dave Vellante and Peter Burris. Welcome back, everybody. This is theCUBE, the worldwide leader in live tech coverage. Joe Goldberg is here. He's innovation evangelist at BMC Corporation. Joe, welcome back to theCUBE. It's good to see you again. Great to be here as always. Yeah, so Big Data Week, Big Data NYC, Strata plus Hadoop, and you know, it's got to be here, right? This is where all the action is. This is the big one. This is absolutely the big one. So Big Data is one of your major initiatives. Talk about what BMC is doing inside of Big Data. Sure, so BMC as a company has traditionally been in the IT management space, so we have a variety of solutions that help customers manage their technology assets. Lately, in talking about how customers are going through digital transformation and the emphasis on upgrading and modernizing their technology stacks, their approach and how they deal with technology, BMC is now focusing on a number of what we call DEM, or Digital Enterprise Management, initiatives. These are areas where customers have told us that the solutions we currently have, and the directions for those solutions, can help them modernize and assist them on that digital transformation journey. Certainly one of them is Big Data, and that's what we're here to talk about today. Specifically within the Big Data initiative there are a number of BMC solutions, but Control-M is the one that I specifically talk about; that's my area of focus. BMC dove into this space a few years ago now, really trying to help manage all the complexity of the Sqoops and the Hives and the Flumes and, you know, pick your project of the day.
But so take us back to that and give us your perspective on the early days of Hadoop, how it's evolved, and what your role has been in that evolution. Sure, so actually the very first Strata that I came to was in New York. It was still in the Hilton. I don't know if you remember those days. Of course, we were there. People were, you know, cheek to jowl. The place was overflowing. I think it was the only industry event I've ever attended where they actually turned people away. They sold out and said, no more, we can't get any more people in the door. So in the early days when we got involved: although I think we're celebrating the 10th year of Hadoop, it really hit the commercial mainstream about three to four years ago, and that's when we got involved. So we consider ourselves to be really early adopters and early players in this industry. We got involved in probably the best way you can get involved in a business like that, which is that our customers began to ask us. Control-M as a product, or as a solution, manages business application workflows and batch. And Hadoop 1 and previous versions of Hadoop, with primarily MapReduce as the mechanism for running work, were 100% batch. So as customers began looking at that technology and embedding and absorbing it into their enterprise, it was very natural for them to try and figure out how to deal with that batch, and, if they were Control-M users already in their traditional environment, to be able to leverage that technology. So that's how we got into the business. We were really introduced to it by our customers, and when we began to learn about it, we saw this was an amazing opportunity. And so that's how we started.
As Hadoop has matured and moved to Hadoop 2 with YARN and multi-tenancy and multi-application capability, the trajectory of the technology, much as in the traditional world, has shifted toward transactional, real-time streaming, things that are, let's say, more attractive and have more hype. But there's still a tremendous amount of batch. And as organizations mature along that big data journey, they see that although they may get involved in early use cases and see how great streaming and real-time data ingestion and that kind of analytics can be, when they actually codify it and bring it into the environment, a lot of that is going to become batch. We actually had a presentation from one of the analysts that I thought was really great, which framed this as on-time: not real-time or batch, but rather on-time. I think that's a great way to define how technology happens and where Control-M plays, because a lot of the stuff that happens, although aspects of it may be data that's streaming in real time, nevertheless there's still a bunch of stuff that has to happen via other or deferred means, and that's really where Control-M comes in. Of course, what we bring into this that distinguishes us from what is in the ecosystem already is the fact that we're coming at this from the enterprise. We already support all of the traditional technologies, all of the traditional applications like ERPs and relational databases, so we bring all of that maturity that we have been developing over years and years and have applied it to big data and Hadoop. Okay, so interesting, on-time. On-time is like before you lose the customer, right? And that may not be real-time. Exactly, sure. Good enough.
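The "on-time" framing can be illustrated with a small sketch: given a business deadline and an estimated runtime, a scheduler works backward to the latest acceptable start time. This is a minimal, hypothetical illustration in Python; the function and parameter names are assumptions for the example, not Control-M's actual interface.

```python
from datetime import datetime, timedelta

def latest_start(deadline: datetime, est_runtime: timedelta,
                 buffer: timedelta = timedelta()) -> datetime:
    """Latest moment a deferred (batch) job can start and still finish on time."""
    return deadline - est_runtime - buffer

# A nightly report must land before the 06:00 business deadline; the job
# usually takes about two hours, plus a 30-minute safety buffer.
start_by = latest_start(datetime(2016, 9, 27, 6, 0),
                        est_runtime=timedelta(hours=2),
                        buffer=timedelta(minutes=30))
print(start_by)  # 2016-09-27 03:30:00
```

The point of the sketch: "on-time" isn't a latency target, it's a business deadline that the workflow is scheduled backward from.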
Okay, and then Spark. So you enter when there's still this collection of tools and projects, and then Spark comes in. How has Spark changed the business, and what has it meant to BMC? So, we've added support for Spark as well, and I think Spark is a really good indication of the dichotomy of processing that I'm talking about. There is a lot of Spark Streaming, real-time stuff, but a lot of it winds up getting deferred in some way. Or you may do a lot of development for your insights that you iterate on in real time, but eventually, when you want to deliver that to, let's say, business consumers or even data scientists, they don't necessarily want to run all of this in real time all the time, and so some of it becomes batch. So we were very early in providing support for Spark, and Control-M does support Spark, both Spark SQL and Spark Streaming. I think we were actually providing support for it before the community or the open source ecosystem tools provided support for it. That's been one of our goals: to at least match, and we've actually been outpacing, the innovation in our particular discipline in the open source tools that are available in the market. But we see this really as kind of an evolution toward richer, more powerful tools. And I think it's kind of interesting that tools are matching the trajectory of data and volume. As people are dealing with more and more data, their tools are improving, but I think the general balance is still very similar in terms of the business challenges and then being use-case driven.
I think one of the most interesting things I've seen is that there's a tremendous amount of hype, certainly mindshare, around insight and new data sources, a lot of new things being discussed in the big data space. But on the ground, a lot of companies, certainly traditional enterprises, are taking advantage of big data technology to modernize their existing data management and data infrastructure. So rather than trying to pull in social media and machine data that they haven't processed before, a lot of companies are simply trying to expand the aperture of their current data warehouse and data management infrastructure by using Hadoop and big data technologies. And so it's somewhat evolutionary, and shockingly, a lot of this is driven by cost, of course. Well, let's build on that for a second. You mentioned that it's entered the mainstream, and it certainly has, but there's still an enormous amount of experimentation going on. Absolutely. So if you take a look, different customers need different things out of an application control or application management technology in the Hadoop world. There are some companies that need to be able to schedule and start and stop things with significant precision because it's tied back to other jobs that the shop has to run. But there are also examples, and I was talking to a CIO last night about this, where people are still experimenting with some of these technologies and trying to understand, in a pilot phase, how to apply them and how to get value out of them. And sometimes, by virtue of the very nature of the technologies, you end up with runaway jobs that consume enormous amounts of resources during a pilot or experimentation phase. And there you need a totally different type of control of the application, a very different type of control over the flows.
You need to be able to understand how it's consuming resources and what caps or thresholds to put on it. How do you see this tension between maturity and experimentation impacting how companies are utilizing application control technologies? So it's very interesting. I've been around technology for such a long time that sometimes some of my audiences find it shocking that I compare Hadoop to what we've seen in the past in other environments, like the mainframe. If you take YARN, for example, as a resource manager, and really whether it's YARN or Mesos, it's the job of those components to primarily manage resources across the entire environment, and we interact with them. The reason I mention the similarity with past technologies is that in the mainframe world, even today (and Control-M started on the mainframe), there's a component called WLM, the Workload Manager, whose job it is to manage how much memory you get and how you fence off resources for more critical jobs based on priorities, and there's a job entry subsystem. We interface with that. So we manage at the business level, and we provide controls that can map to those resources and the tension that you describe. We do that on the basis of SLAs, or whatever the business goals are that you're trying to achieve. What kind of concurrency do you want to achieve? Which application is more important than another? How do you want to determine either success or failure? How do you want to do predictive warning and alerting based on how things are running? And we rely on the underlying technologies that actually have management over the resources to give us information about what's going on. So we provide this kind of layer at a business and control level that can interact with those, but we don't actually... I was going to say, we don't have to, and that's really kind of the benefit.
We don't have to get into the details of having to manage that. But you do have to turn the data that's coming out of those other subsystems into something that makes sense to a business. So for example, the CIO I was talking to last night was shocked when, a couple of months ago, she received a $75,000 bill for an experiment. And while at the resource level they had all kinds of results they could go back to and figure out why it happened, what she had hoped for is that she could actually put a policy on the whole job and say, at this point in time, stop this. And there's a lot of that going on as people try to learn about these technologies. And the thing is, that's another great example. We actually have a component, a set of functions in our product called Workload Policy, where you can set these kinds of thresholds and limits and determine how resources should be applied based on business priorities. And all of that is because the same kinds of requirements that you're describing have existed in the past in the traditional world. So this is the mantra that we keep repeating: yes, it's a brand new world and this is radically new technology, but a lot of the management principles are very, very similar. And that's our benefit. We have been working on this, providing it, maturing our capabilities to manage in that kind of world for such a long time, and when we provide support for a new technology like YARN or Hadoop, all of the work that we have done over all of those years can be applied. And so we as a solution also have this constant tension: what are the differences in this new environment, and how do we have to evolve our solution to make sure that we can take advantage of them? We can't just simply say, well, we've done this in the past, so we don't have to do anything for Hadoop.
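The runaway-pilot scenario suggests what such a policy looks like in miniature: caps on cost and runtime, checked against observed usage, with the job stopped before the bill arrives. This is a hedged sketch of the idea only, not BMC's actual Workload Policy implementation; every name and threshold here is invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class PilotPolicy:
    """Illustrative caps on what an experimental job may consume."""
    max_cost_usd: float
    max_runtime_min: int

def enforce(policy: PilotPolicy, cost_usd: float, runtime_min: int) -> str:
    """Decide what a scheduler should do given a job's observed consumption."""
    if cost_usd >= policy.max_cost_usd or runtime_min >= policy.max_runtime_min:
        return "stop"      # hard limit reached: kill the job now
    if cost_usd >= 0.8 * policy.max_cost_usd:
        return "warn"      # approaching the cap: alert the job's owner
    return "continue"

policy = PilotPolicy(max_cost_usd=5000, max_runtime_min=240)
print(enforce(policy, cost_usd=4200, runtime_min=90))   # warn
print(enforce(policy, cost_usd=5100, runtime_min=95))   # stop
```

The design point is that the limit is expressed in business terms (dollars, elapsed time) rather than in the resource manager's own units.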
There definitely are a lot of things that we have done. Over the three or so years that we've been in this space, we have come out with about three new versions and about six fix pack enhancements, constantly revving our solutions to provide better support, more granular support, and more integration with YARN, being able to interact with it and get more metrics out of it. We can then take those metrics, elevate them to a business level, and apply them to Workload Policy so you can manage the environment. We've built support for a variety of these idiosyncratic applications; most recently we've announced support for Impala, which has its own security requirements for how you run in a batch environment. So we certainly have been tracking the technology and modernizing our own solution, while drawing on that rich background to provide what we think is an extremely rich set of capabilities to manage in this complex environment. And despite the newness of the technology, the problem statement that customers are articulating to you is substantially similar, is what you're saying. And that is what? Well, it's the need to ultimately meet their business goals, to be able to abstract the technology to a business level. Obviously there are people working with running a particular task or a process or a job, but ultimately the business needs work done. They have business priorities. They need to identify potential problems as early as possible. They need to be able to shift priorities and ensure that resources are available for the work that's more important. Get the work done. That ultimately is what they need, and to have as much insight as possible without requiring deep technical knowledge.
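Identifying potential problems "as early as possible", before a deadline is actually missed, can be sketched with a simple linear projection of a job's progress. This is a toy model under stated assumptions (real schedulers use richer statistics than a straight-line extrapolation), and the function name is hypothetical.

```python
def projected_to_miss_sla(elapsed_min: float, pct_complete: float,
                          sla_min: float) -> bool:
    """Project total runtime from progress so far; True if the SLA looks at risk."""
    if pct_complete <= 0:
        return False  # no progress data yet, nothing to project from
    projected_total = elapsed_min / pct_complete
    return projected_total > sla_min

# 40% done after 60 minutes projects to 150 minutes total, so a 120-minute
# SLA can be flagged long before the deadline actually passes.
print(projected_to_miss_sla(elapsed_min=60, pct_complete=0.4, sla_min=120))  # True
```

That early flag is what lets an operator shift priorities or add resources while there is still time to make the deadline.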
So those are among the tensions that we have to manage: providing enough granular capability for technicians to do the work they need to do, but elevating and abstracting that for business or non-technical users, people who don't want to become experts in managing technology but just want to reap the benefits of the results. Okay, good. We're out of time, Joe, but to give you the last word, put a bumper sticker on the segment. Gee, I haven't thought about that. What's the bumper sticker? The last thing I think we're seeing now is that, among the complexities of this, organizations not only need to achieve this level of management and automation, but they have to do it in an end-to-end software development lifecycle manner. A variety of the things that we've been developing enable the consumption of our technology, which in the past has been traditional IT ops and GUI-oriented, by developers and DevOps and continuous integration, continuous delivery teams. Again, to achieve that level of business agility, to be able to manage their batch and their workflows across very complex and ever increasingly heterogeneous environments. That's a hell of a bumper sticker. Including the cloud, which we didn't talk about. Well, that's why I said increasingly heterogeneous, because nowadays it's public and private, multiple vendors. What used to be heterogeneous was mainframe and Linux and Windows; now it's Google Cloud and AWS and OpenStack and internal stuff, as well as all the other stuff and all the virtualization and everything. Can I take a crack at it? Sure. The more deeply you embed technology in the business, the more you need business control over technology. Well, I wish I had said that. Thank you very much, excellent. All right, we'll leave it there. Joe Goldberg, thanks very much for coming on theCUBE.
Thank you, thanks for having me. Keep it right there, everybody. We'll be back with our next guest. This is theCUBE. We're live from NYC; we'll be right back.