Live from the San Jose Convention Center, extracting the signal from the noise, it's theCUBE, covering Hadoop Summit 2015, brought to you by headline sponsor Hortonworks, and by EMC, Pivotal, IBM, Pentaho, Teradata, Syncsort, and by Attunity. Now your hosts, John Furrier and George Gilbert. Okay, welcome back everyone. Live in Silicon Valley, day three of Hadoop Summit 2015. This is theCUBE, our flagship program. We go out to the events and extract the signal from the noise. I'm John Furrier, the founder of SiliconANGLE. Joining my co-host, George Gilbert, big data analyst at Wikibon.com. Our next guest, John Fanelli, VP of Marketing at DataTorrent. Welcome to theCUBE. Thanks, thanks for having me. Great to see you. First time on theCUBE. We've known each other for a couple of years now, and formerly Citrix, now VP of Marketing at DataTorrent. Great to see you, industry vet. Talk about DataTorrent. Because obviously streaming's huge. You guys are in that business. Spark is like the prom queen. Everyone loves Spark. Half-baked, people still using it. Got some traction, Hadoop certainly shown the way. The big data, storing it, but getting the extraction of the data is key. So tell us about BitTorrent. I mean, DataTorrent, DataTorrent. Not that we couldn't use a BitTorrent, well, of course we all use it. Get our free music and downloads. Sure, so at DataTorrent, our product is DataTorrent RTS, and it's a unified engine for doing both stream and batch processing. And as you know, one of the key issues is trying to get value out of Hadoop. And what we do is we help customers and companies reduce the time to insight, and automate and take actions once they have those insights. So it's really about being operational and either driving greater revenue or greater operational efficiency. So I got this little prop here. Lego blocks, which is great, whoops. That's not the piece.
Explain the product strategy, just timetable, shipping, versions, and then how you guys fit in the growing ecosystems for analytics. We call it FastData, and obviously Spark is certainly out there. It's in the streaming kind of area. Sure, so DataTorrent shipped version 1.0 last June at Hadoop Summit. We shipped 2.0 in January, and then 3.0, which we announced on Friday, will ship in July. And what we provide is a full suite that allows an entire enterprise to benefit from Hadoop. So we have a number of tools for the developer, also for the operations person, the data scientist, as well as the end user. As you mentioned, Spark is pretty interesting. There's a lot of hype around Spark, previously a lot of hype around Storm. And both of these technologies are interesting and they serve certain use cases, but they're not really enterprise grade. And our focus has been to have an enterprise-grade product. And so to us, that means fault tolerance, ease of use, performance, tied into enterprise infrastructure, role-based access, LDAP integration, et cetera. And as part of our announcement, we announced that the core engine of DataTorrent RTS is open source. RTS, real-time streaming? Not all the time, because as I said, customers are now using us for batch. I'd like to think of it as, like, RTS stands for? Nothing. It's really like the initials at the end of a car, exactly. The GT. Yeah, we just thought it had that image of being fast and sleek, right? It sounds like real-time streaming to me. So, DataTorrent RTS. And the thing that's really interesting is when you compare DataTorrent RTS to, say, Spark Streaming, we have a lot of interesting capabilities. So we guarantee no data loss. We're fault tolerant. We guarantee event order. We also provide at-least-once, at-most-once, and exactly-once processing semantics. It's very easy to program. You program in Java. In fact, you could almost think of us as a Java app server for big data.
So you don't have to learn Scala. You don't have to think in the MapReduce paradigm. So it's built for the enterprise? Yes. So enterprise has certain features. So explain the business model, because you gave me the quick tutorial before we came on. The open source component is this piece? That's what you just announced? Correct. And this is your added value on top? Correct. Like most open source offerings, we're very excited that what we open sourced is about 18 months ahead of Spark Streaming from a technology perspective. So what we've given to the community is a step function in functionality. And our community edition is based on that open source project, which is called Project Apex, A-P-E-X. And that community edition is free, unlimited use, go to production, that's fantastic. If you want to buy support for that, we call that the standard edition. And that is the white piece of your Lego stack. Now, the rest of the pieces are what we deliver for the enterprise. And that's everything from LDAP integration to role-based access support. We have a drag, drop, and launch capability so that a data scientist can build a big data app with no coding. A business analyst can build their own dashboard and get a real-time dashboard just by dragging and dropping. And from an operational perspective. [A venue announcement interrupts.] I think the mics are fine. Oh, okay. So from an, sorry, there was an announcement in case you were wondering why I stopped. And so from an operational perspective, we provide a management console that allows you to manage, monitor, restart, troubleshoot. And once you move to streaming, you really require it to be operational, right? We're not just on the side doing analytics and batch that you can rerun. So the business model, is it that all the layers work together, or can I mix and match components? So all the layers work together.
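The "Java app server for big data" idea above — write a plain Java class per processing step instead of learning Scala or thinking in MapReduce — can be sketched roughly like this. The class and method names are illustrative only, not the actual DataTorrent RTS API:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a streaming operator: a plain Java class whose
// process() method is invoked once per incoming event, much like a request
// handler in an app server. Names here are illustrative, not DataTorrent's.
public class WordCountOperator {
    private final Map<String, Long> counts = new HashMap<>();

    // Called once per tuple flowing through the pipeline.
    public void process(String word) {
        counts.merge(word, 1L, Long::sum);
    }

    // Current running count for a word (0 if never seen).
    public long countOf(String word) {
        return counts.getOrDefault(word, 0L);
    }
}
```

The appeal is that the per-event logic is ordinary Java; the platform supplies the distribution, fault tolerance, and scaling around it.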
We basically have two editions: the community edition, which is the core platform. That's the free one. Correct. And then we have an enterprise edition which includes all the other capabilities. Can you tell us, you were telling me before about some of the customers, tell us sort of the environment they came from, whether it was batch, or to what extent they were able to get real-time, and now where they are, and even if some of them, whether they're anonymous or not, doesn't matter. It's the before and after. So our customers are using this, as I mentioned, for both batch and streaming. And the reason they can do that is we've noticed a generic pattern of ingest and archive, transform and normalize, analyze and create custom business logic, alert and take actions, and then you visualize and persist the data. And that data pipeline is the same whether you're in streaming or batch. So a great streaming example is a company that's doing ad serving. So this company has 800 ad servers, it's generating 40 million events per hour, and they are deduping against two billion events, which is two days' worth of events, to allow them to do campaign management in real time, and it allows them to, in real time, programmatically change their ad strategy. And so they're using two different pieces of technology as part of the stream. One is deduping in real time, so they know exactly when they met the campaign criteria. And the other is an in-memory OLAP cube. So they can now ask questions like which ads are performing and which ones are not, and then they can change their strategy so that they don't use the non-performing ads. I don't want to geek out, but updating an OLAP cube in memory is not that easy, is it? I mean, generally, those are created in batch mode. Correct, so we use our highly parallelized solution. So again, unlike Spark Streaming, we will auto-scale and we will auto-parallelize. So we'll take advantage of all the resources in the cluster.
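The ad-serving de-dup described above — checking each incoming event against roughly two days of prior events — can be sketched in miniature. This sketch assumes events arrive in timestamp order, and the class and method names are hypothetical, not DataTorrent's implementation (which holds on the order of two billion events across a cluster):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Toy de-duplicator over a rolling time window. Assumes events arrive in
// timestamp order; names are illustrative, not a real DataTorrent API.
public class Deduper {
    private final long windowMillis;
    private final Map<String, Long> seen = new HashMap<>(); // eventId -> timestamp
    private final Deque<String> order = new ArrayDeque<>(); // eviction order

    public Deduper(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    // Returns true if the event is new within the window, false if a duplicate.
    public boolean accept(String eventId, long timestamp) {
        evictOlderThan(timestamp - windowMillis);
        if (seen.containsKey(eventId)) {
            return false; // duplicate within the window
        }
        seen.put(eventId, timestamp);
        order.addLast(eventId);
        return true;
    }

    // Drop events that have aged out of the window.
    private void evictOlderThan(long cutoff) {
        while (!order.isEmpty() && seen.get(order.peekFirst()) < cutoff) {
            seen.remove(order.removeFirst());
        }
    }
}
```

At production scale this state is what gets partitioned across containers, which is where the auto-parallelization mentioned above comes in.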
So you can set an SLA that says I want to do 10,000 events per second or 100 gig per second, and the system will keep track. And if it's not meeting your SLA, it will use YARN to parallelize and run more instances. And so when the cube needs to grow, we will parallelize the computation and do all the pre-computations. So talk about the company's business model, again, in context of what's going on at Hadoop. So you charge, you pay the price per edition, is that a recurring license or annual? So when we put pricing in place, we really wanted to be fair to our customers. We're not out to gouge anyone, yet we also wanted to be consistent with the model that you see in Hadoop. So the Hadoop distros, of course, charge support per node. We, because we're a YARN app, can get much finer granularity. So we actually charge by container. So you might have a 100-node Hadoop cluster, and we're only running on three of those nodes, right? So rather than say you've got to pay us for all 100 nodes, we actually say on those three nodes, we're running six containers, each of which is, you know, 20. So it's a consumption pricing model, pretty much. Correct. But on an annual basis. And finer grained. It is. And it's very fair to customers. And so what they do is, before they add new capabilities or new apps, they know when they're doing it in development the size of the app and how many containers, and they then know what it's going to cost them. So they can do the value-add trade-off right away. And most Hadoop clusters grow because of the size of the storage. So people add to the cluster because they need more storage. We've never had a customer say, oh, I need more CPU because of DataTorrent. I better put another node in. There's oodles of CPU and oodles of RAM in every cluster that's not being used. So I've got to ask you about why you're winning and your opportunity.
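The SLA mechanism and per-container pricing described above boil down to simple arithmetic. A rough sketch, assuming throughput scales linearly with instance count (the method names and the flat per-container price are illustrative assumptions, not DataTorrent's actual formula):

```java
// Back-of-the-envelope model of SLA-driven scaling: given a target
// throughput and the measured per-instance throughput, how many parallel
// instances (YARN containers) are needed, and what would that cost under
// a hypothetical flat per-container annual price?
public class SlaScaler {
    // Instances needed to hit the target, assuming linear scaling.
    public static int instancesNeeded(double targetEventsPerSec,
                                      double perInstanceEventsPerSec) {
        if (perInstanceEventsPerSec <= 0) {
            throw new IllegalArgumentException("throughput must be positive");
        }
        return (int) Math.ceil(targetEventsPerSec / perInstanceEventsPerSec);
    }

    // Annual cost under per-container pricing (price is hypothetical).
    public static double annualCost(int containers, double pricePerContainer) {
        return containers * pricePerContainer;
    }
}
```

So a 10,000 events-per-second SLA with instances that each sustain 3,000 events per second would call for four containers, and the customer can price that out during development, before the app ever ships.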
So with all this stuff going on with the analytics, the cloud coming over the top, pushing more innovation, kind of DevOps has, like, been waiting for this market to develop, because now you've got software development more productive. Where do you guys see that going? And, you know, as analytics pops out, what's your take on that? So I would say the number one thing is definitely being able to be operational, to be fault tolerant, to be managed, to be monitored. So once you move into the data center, you have to perform. With one of our large customers, actually a Fortune 10 customer, we finished the POC and the app was live, and they introduced us to the data center guys. And of course they were like, Data who? You're not going in our data center. And so we have an operational run book, et cetera. But what they did is they did the infamous lunchtime test. So they have their app running, and again, this is the one that has 40 million events per hour. They shut down the cluster gracefully and they went to lunch for two hours to simulate an outage. They then brought the cluster back up and we came back as if we came back from an outage. We processed all the data. We caught up on all the queued data and we tied out exactly with their batch run. At that point, the operations guy said, you can run in our data center. So fault tolerance is number one. And then number two is ease of use. So when we talk to enterprises about Hadoop apps, they tell us that the first big issue is, I can't build anything, it's too hard to use. And then, if I build it, does anyone care? So a lot of times there's value in the app. And then the last piece, again, the full circle, is if I build it and people care about it, can I run it? So those are really the three things we see. So what's your take on the Hadoop ecosystem? You mentioned some of the things that you guys saw that Hadoop's been struggling with, just as maturity. What's going on this year? What's your take on the show this year?
What's the theme? What's the vibe? Where is it at? Where are we on the spectrum of evolution? What I've seen a lot is really about trying to make Hadoop valuable. How can we deliver value to customers? So we're not just taking big data jobs, or data jobs we had, and porting them to big data. It's great to do an ETL job and do all these things, but that's where the enterprise doesn't see value. So I see a lot of the vendors here, I see a lot of customers that I talk to, that are really looking for: if I do this, how can I demonstrate the value to my business user, and then how can I, again, keep it operational? So what's next for you guys with DataTorrent? What's the plan? Got more software to ship, customers to secure. What's the plan? So the plan is, again, we announced DataTorrent RTS 3.0, open sourced our core platform. We've released our first set of applications, which are horizontal applications for data visualization and data ingestion. We're going to continue to add value and take applications into more spaces. So you're going to see us doing more vertical applications built on the core platform, so that the customer has the ability to write custom apps and get prepackaged applications. Great, so what's the ideal customer for you out there, for anyone who's watching who's saying, hey, DataTorrent, wow, I didn't know they were around. That sounds like a use case I could buy, and then they're like, wait a minute, how does that compare to Spark? Because I'm waiting for Spark, should I wait for Spark, should I buy DataTorrent? Yeah, so I don't think you should wait for Spark, because you may be waiting longer than you want to. But what we do see is that in an enterprise customer, you have a business problem that you want to solve without having to go off and learn Scala or MapReduce or any of those difficult things. And it can be both streaming and batch.
So what we see for many enterprises is that they have that pipeline I talked about, and they have multiple products. They have an ETL product, they have a BI product, they have a visualization product, and those products are run by different teams, and those different teams create their own data silos. And so most of our customers have a very common thing where they might have a long batch process. One customer had a two-week batch process, and we connected the dots. And again, think of us as a Java app server. We connected the dots and we took their two-week batch process down to a 46-minute batch process. So it still doesn't have to be real-time streaming. You're a great customer if you're looking to get faster time to insight and you're looking to automate your actions so that you can take competitive advantage in the marketplace. So speed's the real deal. It's all about speed and performance. Speed to insight and speed to action. Okay, John Fanelli, VP of Marketing at DataTorrent. Check him out. We're here live at Hadoop Summit. This is theCUBE. We'll be right back. Day three, getting wound down. We're going to get kicked out. The union guys are going to take down the stage. We're going to go until they pull the chair out from under us. We'll be right back after this short break. Great.