Live from San Jose, California, it's theCUBE, covering Big Data Silicon Valley 2017.

Okay, welcome back, everyone. We are here live in Silicon Valley for theCUBE's Big Data coverage, our event, Big Data Silicon Valley, also called Big Data SV, a companion event to our Big Data NYC event, where we have our unique program in conjunction with Strata Hadoop. I'm John Furrier with George Gilbert, our Wikibon Big Data analyst. We have Basil Faruqui, who's the Solutions Marketing Manager at BMC Software. Welcome to theCUBE.

Thank you, great to be here.

So we've been hearing a lot on theCUBE about schedulers and automation, and machine learning is the hottest trend happening in Big Data. We're thinking this is going to help move the needle on some things. Your thoughts on the world we're living in right now, and what BMC is doing at the show?

Absolutely. Scheduling and workflow automation is absolutely critical to the success of Big Data projects, and this is not something new. Hadoop is only about 10 years old, but other technologies that came before Hadoop have relied on this foundation for driving success. If we look at the Hadoop world, there's what gets all the press, all the real-time stuff, right? But what powers all of that underneath is a very important layer of batch. Think about some of the most common use cases for Big Data. If you think of a bank, they're talking about fraud detection and things like that, so let's take the fraud detection example: detecting an anomaly in how somebody is spending. If somebody's card is used in a way that doesn't match their spending habits, the bank detects that, and they'll maybe shut the card down or contact somebody. But everything that has to happen before that point happens in batch mode: collecting the history of how that card has been used, matching it with how all the other cardholders use their cards, and what the patterns look like when cards are stolen. All of that is powered by what's today known as workload automation, and what in the past has been known by names such as job scheduling and batch processing.

This is the systems business; everyone knows what a scheduler is, what a compiler is, all this computer science stuff. But this is interesting: now that the Data Lake has become so swampy that people call it the Data Swamp, people are looking at moving data out of Data Lakes into real time, as you mentioned, but it requires management. So there's a lot of coordination going on. This seems to be where most enterprises are now focusing their attention: making that data available. Hence the notion of scheduling and workloads, because the use cases are different. Am I getting it right?

Yeah, absolutely. And if we look at what companies are doing, in every boardroom there's a charter for digital transformation, right? And it's no longer about taking one or two use cases around big data and driving success. Data and intelligence is now at the center of everything a company does, whether it's building new customer engagement models, building new ecosystems with partners and suppliers, or back-office optimization. So when CIOs and data architects think about having to build a system like that, they are faced with a number of challenges. It has to be enterprise-ready; it has to take into account governance, security, and more.
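To make the fraud-detection example above concrete, here is a minimal sketch of the pattern Basil describes: a batch job periodically builds a spending profile from card history, and a real-time check compares each new transaction against that profile. All function names, thresholds, and data here are illustrative assumptions, not any bank's or BMC's actual implementation.

```python
# Hypothetical sketch of the batch-plus-real-time fraud pattern described
# above. Names, thresholds, and data shapes are invented for illustration.
from statistics import mean, stdev

# --- Batch layer (runs on a schedule, e.g. nightly) ----------------------
def build_spending_profile(transactions):
    """Summarize a card's historical spending into a simple profile."""
    amounts = [t["amount"] for t in transactions]
    return {"mean": mean(amounts), "stdev": stdev(amounts)}

# --- Speed layer (evaluates each new swipe as it arrives) ----------------
def looks_anomalous(profile, amount, threshold=3.0):
    """Flag a transaction far outside the card's usual range."""
    return abs(amount - profile["mean"]) > threshold * profile["stdev"]

history = [{"amount": a} for a in (12.50, 40.00, 8.99, 25.00, 18.75)]
profile = build_spending_profile(history)   # produced by the batch job
print(looks_anomalous(profile, 950.00))     # True -> contact the cardholder
```

The point of the toy split is the one made in the conversation: the real-time decision is only as good as the batch work that built the profile behind it.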
But if you peel the onion just a little bit, what architects and CIOs are faced with is, okay, you've got a web of complex technologies: legacy applications and modern applications that hold a lot of the corporate data today, right? And then you have new sources of data, like social media, devices, sensors, which tend to produce a lot more data. So first things first, you've got an ecosystem like Hadoop, which is supposed to be the nerve center of the new digital platform, and you've got to start ingesting all this data into Hadoop. And this has to happen in an automated fashion for it to be able to scale.

But this is the combination of streaming and batch.

Correct.

This now seems to be the management holy grail, nailing those two.

Absolutely. People talk, in technical terms, about the speed layer and the batch layer, and both have to converge to deliver the intelligence and insight that the business users are looking for.

Would it be fair to say it's not just convergence of the speed layer and batch layer in Hadoop, but what BMC brings to town is the non-Hadoop parts of those workloads? Whether it's batch outside Hadoop, or streaming, which pre-Hadoop was kind of more nichey. That's why we need this overarching control, if it's not a Hadoop-centric architecture.

Absolutely. I've said for a long time that Hadoop is never going to live on an island of its own in the enterprise, right? And with the maturation of the market, Hadoop now has to play with all the other technologies in the stack. Just take data ingestion as an example: you've got ERPs, you've got CRMs, you've got middleware, you've got data warehouses, and you have to ingest a lot of that in. Where Control-M brings a lot of value and speeds up time to market is that we have out-of-the-box integrations with a lot of the systems that already exist in the enterprise, such as ERP solutions and others. And for virtually any application that can expose itself through an API or a web service, Control-M has the ability to automate that ingestion piece.

But this is only step one of the journey. You've brought all this data into Hadoop, and now you've got to process it. And the number of tools available for processing is growing at an unprecedented rate. You've got MapReduce, which was the hot thing just two years ago, and now Spark has taken over. So with Control-M, about four years ago we started building very deep native capabilities in the Hadoop ecosystem. So you've got ingestion that's automated, then you can seamlessly automate the actual processing of the data using things like Spark, Hive, Pig, and others. And the last mile of the journey, the most important one, is making this refined data available to the systems and users that can analyze it. Often Hadoop is not the repository that analytic systems sit on top of; it's another layer all of this has to be moved to. So if you zoom out and take a look at it, this is a monumental task, and if you use a siloed approach to automating it, it becomes unscalable. And that's where a lot of the Hadoop projects often...

Crash and burn.

Crash and burn. It's the scalability. They crash and burn. So Control-M has been around for 30 years.

Oh, by the way, just to add to the crash and burn piece: the data lake gets stalled there, and that's why the swamp happens.
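As a rough illustration of the ingest, process, publish journey just described, here is a small pipeline-as-a-DAG sketch in Python. This is not Control-M's actual interface; the step names, commands, and dependencies are invented, and the only point is that each stage runs after its upstream dependencies complete.

```python
# A generic, hypothetical "pipeline as a DAG of dependent steps" sketch of
# the ingest -> process -> publish flow described above. Not Control-M
# syntax; all step names are invented.
from graphlib import TopologicalSorter  # Python 3.9+

steps = {
    "ingest_erp":     [],                                 # pull from ERP/CRM
    "ingest_sensors": [],                                 # device/sensor feeds
    "process_spark":  ["ingest_erp", "ingest_sensors"],   # refine in Spark
    "publish_marts":  ["process_spark"],                  # last mile: analytics
}

def run(step):
    print(f"running {step} ...")  # a real system would submit jobs here

for step in TopologicalSorter(steps).static_order():
    run(step)  # each step starts only after its dependencies succeed
```

A production workload automation tool adds what a toy like this lacks: restarts on failure, calendars, SLA tracking, and job submission across mainframe, ERP, and Hadoop platforms.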
Because they're like, okay, now how do I operationalize this and scale it out?

Right. If you're storing a lot of data and not making it available for processing and analysis, then it's of no use. And that's exactly our value proposition: this is not a problem we are solving for the first time. We have seen these waves of automation come through, from the mainframe days, when it was called batch processing, through distributed client-server, where it was known more as job scheduling. And now...

So BMC has seen this movie before?

Absolutely.

All right, so let's take a step back, zoom out, hang out in the big trees, and look down on the market. Big data practitioners out there right now are wrestling with this issue: you've got streaming, real-time stuff, you've got batch, and it's all coming together. What is Control-M doing great right now for the practitioners? Because there are a zillion tools out there, but people are human, and every hammer looks for a nail, right? So you have a lot of change happening at the same time. What is Control-M doing to really win? Where are you guys winning?

Where we're adding a lot of value for our customers is in helping them speed up time to market in delivering these big data projects, and delivering them at scale and with quality.

Give an example of a project.

So Malwarebytes is a Silicon Valley-based company. They are using this to ingest and analyze data from thousands of endpoints from their users.

That's their Lambda architecture, right?

Their Lambda architecture, yes. I won't steal their thunder; they're presenting tomorrow at 11:30. Another example is a company called Navistar. Now here's a company that's been around for nearly 200 years. They manufacture heavy-duty trucks, 18-wheelers, school buses, and they recently came up with a service called OnCommand. They have a fleet of 160,000 trucks that are fitted with sensors, sending telematics data back to their data centers.

And in between, it stops in the cloud. So they go to the cloud for upload and backhaul, basically, right?

Correct. It goes to the cloud, and from there it's ingested into their Hadoop systems, where they're looking for trends to make sure that none of the trucks break down, because a truck that's carrying freight and breaks down hits the bottom line right away.

Yeah.

But that's not where they're stopping. In real time they can triangulate the position of the truck, figure out where the nearest dealership is, whether it has the parts, and when to schedule a service. But if you think about it, the warranty information and the parts information are not sitting in Hadoop. They're sitting in mainframes, SAP systems, and others. And Control-M is orchestrating this across the board, from mainframe to ERP and into Hadoop, for them to be able to marry all this data together.

And how do you get back into the legacy? That's because you have the experience there. Is that part of the product portfolio?

That is absolutely part of the product portfolio. We started our journey back in the mainframe days, and as the world has evolved, from client-server to web, and now to mobile and virtualized and software-defined infrastructures, we have kept pace with that. And that's exactly why.

So you guys have a nice end-to-end view right now going on, and certainly that example with the trucks highlights IoT right straight there.
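For a sense of what the Navistar-style "marry the data" step might look like once telematics and the ERP or mainframe extracts have both landed in Hadoop, here is a hypothetical PySpark sketch. The paths, column names, and the simple temperature rule are all invented for illustration; nothing here is Navistar's or BMC's actual code.

```python
# Hypothetical PySpark sketch of the join described above: truck telematics
# already landed in Hadoop, married to warranty/parts data ingested from
# ERP or mainframe systems. Schema and paths are illustrative.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("truck-health").getOrCreate()

telemetry = spark.read.parquet("/data/telematics")        # from the trucks
warranty = spark.read.parquet("/data/warranty_extract")   # from ERP/mainframe

at_risk = (
    telemetry
    .filter(F.col("engine_temp_c") > 110)        # simple stand-in for a trend
    .join(warranty, on="truck_id", how="left")   # marry the two sources
    .select("truck_id", "engine_temp_c", "warranty_status", "nearest_dealer")
)

# Publish the refined result for downstream service-scheduling systems.
at_risk.write.mode("overwrite").parquet("/data/service_candidates")
```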
I mean, you guys have a clear line of sight on IoT. The best measure of your maturity would be the breadth of your integrations.

Absolutely, and we don't stop at what we provide out of the box. We realize that we have 30 to 35 out-of-the-box integrations, but there are a lot more applications than that, so we have architected Control-M in a way where it can automate workloads on any application and any database that can expose itself through an API.

So that is huge, because if you think about the open-source world, by the time this conference is over, there will be a dozen new tools and projects online. And that's a big challenge for companies too: how do you keep pace with this?

Well, I think people are starting to squint through the fashion aspect of open source, which I love, by the way, but it does create more diversity, and some things become fashionable and then get big-time traction. Look at Spark. Spark came out of the woodwork. George, you're tracking all the fashion. What's the hottest thing right now in open source?

It seems to me that we spent five-plus years building data lakes, and now we're trying to take that data and apply the insights from it to applications. And really, Control-M's value-add, my understanding is, is that we have to go beyond Hadoop, because Hadoop was an island, or a data lake, but now the insights have to be enacted on applications that go outside that ecosystem. And that's where Control-M comes in.

Yeah, absolutely. We are that overarching layer that helps you connect your legacy systems and modern systems and bring it all into Hadoop. And the story I tell when I'm explaining this to somebody is: you've installed Hadoop, day one, great. Guess what? It has no data in it. You've got to ingest data, and you have to take a strategic approach to that, because you can use some point solutions and do scripting for the first couple of use cases. But as soon as the business gives you the green light and says, you know what, we really like what we've seen, now let's scale up, that's where you really need to take a strategic approach, and that's where Control-M comes in.

So let me ask then: if the bleeding edge right now is trying to operationalize the machine learning models that people are beginning to experiment with, just the way they were experimenting with data lakes five years ago, what role can Control-M play today in helping people take a trained model and embed it in an application so it produces useful actions and recommendations? And how much custom integration does that take?

So if we take the example of machine learning and just peel the onion of it, you've got data that needs to be moved and constantly evaluated, and then the algorithms have to be run against it to provide the insights. This is exactly what Control-M allows you to do: ingest the data, process the data, let the algorithms process it, and then, of course, move it to a layer where people and other systems can analyze it. It's not just about people anymore; it's other systems that will analyze the data.
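As a loose sketch of the ingest, score, publish shape just described for operationalizing a trained model, consider the following. Every stage function and the stand-in model are hypothetical placeholders; in a real deployment, a scheduler would drive these stages end to end and hand results to downstream systems.

```python
# Minimal, hypothetical sketch of operationalizing a trained model:
# ingest new records, score them, publish results for other systems.
# All names and the stand-in model are invented for illustration.
def ingest():
    """Stage 1: move new data in (files, APIs, message queues, ...)."""
    return [{"customer": "c1", "spend": 42.0},
            {"customer": "c2", "spend": 9000.0}]

def score(records, model):
    """Stage 2: run the trained model against each record."""
    return [{**r, "risk": model(r)} for r in records]

def publish(scored):
    """Stage 3: hand results to downstream systems (DB, dashboard, alerts)."""
    for row in scored:
        print(row)

# Stand-in for a trained model; a real one would be loaded from storage.
trained_model = lambda r: "high" if r["spend"] > 1000 else "low"

publish(score(ingest(), trained_model))
```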
And the important piece here is that we're allowing you to do this from a single pane of glass, being able to see this picture end to end. All of this work is being done to drive business results, generating new revenue models in the case of Navistar, so allowing you to capture all of this and then tie it to business SLAs is one of the most highly rated capabilities of Control-M among our customers.

Basil, this is the cloud equation we were talking about last week at Google Next: a combination of enterprise readiness across the board. The end-to-end is the picture, and it seems you guys are in a good position. Congratulations, and thanks for coming on theCUBE, really appreciate it.

It's theCUBE, breaking it down here in the Big Data world. This is the trend: it's an operating-system world in the cloud, Big Data with IoT, AI, machine learning, big themes breaking out early here at Big Data SV, in conjunction with Strata Hadoop. More, right after this short break.