Live from New York, it's theCUBE, covering Big Data NYC 2015. Brought to you by Hortonworks, IBM, EMC, and Pivotal. Now for your hosts, John Furrier and Dave Vellante.
Welcome back to New York City, everybody. This is theCUBE. We're here at Strata and Hadoop World. This is our event within the event. This is Big Data NYC. Ron Bodkin is back. He's the founder and president of Think Big Analytics. Ron, good to see you again. We were talking just off camera. It's been just about a year since the Teradata acquisition. So how's it going?
Well, we haven't completed our assimilation of Teradata after our acquisition of them, but we're making good progress. You know, we're definitely seeing a lot of enthusiasm inside Teradata for the opportunity that big data really presents, right? Teradata's got all these great assets around managing data, you know, more maturity around data management and so much depth in analytics. And so with all of those assets combined with our DNA of DevOps and software architecture in the big data world, you know, there's a lot of synergy and a lot of power. And so we're pretty excited about collaborating and bringing coherent approaches to help customers solve big data problems with real solutions, right? I mean, at Think Big we have always had a strategy of helping customers build high value applications with big data. Those high value applications tend to be unique to their business, using their data sets, the way they interact with customers and products, and you know, they're not a packaged out-of-the-box thing. Instead it's high value for companies that really leverage the data. So I think we've put together, you know, with us joining Teradata, all kidding aside, a complete package for why we're the best provider for those analytic solutions.
Well, we were joking, but I can think of many, many examples, having been in this industry a long time, where large companies buy small companies and then pollute the small companies, stamp out the DNA, and then they just don't really get the value out of it. That's not happening, I'm inferring. So how did that happen? Was it just that Teradata embraced what you were doing, they're sort of, you know, leaning into what you're doing? A lot of times with successful acquisitions, they leave you alone. It doesn't sound like it's a leave-you-alone type of strategy. Can you describe the dynamic?
Yeah, I think it's more that we brought a very complementary capability, right? Teradata, like we talked about in the last session earlier, invested a lot in this concept of the unified data architecture, being able to put together a best-in-class data warehouse and discovery tools and data platforms, i.e. Hadoop and now real-time processing. But we bring, with our depth in Hadoop and Spark and Kafka and so forth, all this capability to execute the data lakes, the analytics in Hadoop, the streaming real-time analytics that really complement a lot of the capabilities that Teradata is best in class in: discovery with Aster, data warehouse with Hadoop, or sorry, with Teradata. So, you know, I think the reason why we joined is because they saw that our capabilities and the ability to drive solutions in the big data space are critical to that unified data architecture that customers had told Teradata was really critical to them, right? So we brought a really complementary part of the picture, and that's why there's an important element of how we keep our own brand and mission. We remain independent consultants, we're not incented to sell Teradata appliances, but at the same time, we collaborate on building bigger solutions, drawing on the skills and expertise inside of the Teradata parent company.
You were talking about your expertise in building high value apps. Can we talk about some of those apps?
Sure.
I mean, you know, everybody thinks about fraud detection, I know, you know, customer churn comes up a lot, things of that nature. What are those high value apps, and let's talk about how they're maturing.
Absolutely. So certainly those are ones that we've had experience in, those two and various marketing applications. That's, you know, I think a lot of people know about that use of big data to get deeper insight into consumer behavior, and we do a fair bit of that work. But I think, you know, maybe I'll start with a different kind, which, you know, when we started Think Big, I didn't anticipate being so big, which is Internet of Things, device data, right? We've done a lot of work over the years, and the first customers in that space were high tech manufacturing companies. You know, companies that had data center gear that they would send out to run at distributed locations around the world and collect petabytes of phone home data, which will typically be data around the configuration of the device. It'd be information about sensor readings. It'd be information about alerts around what's going wrong, you know, any error conditions or problems that the system's encountering. So we did a couple of big projects like that. We've done a number of projects for high tech manufacturers on yield improvement using massive amounts of test data throughout the life cycle of products: R&D, QA, scaling up production deployment and then even customer service. And what's exciting is now we're starting to see a next phase where those are being applied not only inside of the high tech manufacturing vertical, but we're seeing customers in other areas embrace these. So we're starting to see device data. Several of our insurance customers are working with telematics data and we're helping them with that.
You know, to be able to do things like determine driver behavior based on the way people are actually driving cars and correlating it with other information like traffic patterns and weather, that can inform whether, you know, if somebody brakes hard, normally it's a sign of a poor driver, right? They brake hard. But if they're avoiding an accident where others are hitting things, then they're a good driver because they braked hard, right?
So squinting through that?
Yeah, pulling together those kinds of complex signals, right? And so we're helping customers with that. We're helping with use cases like, we are working with a large medical device manufacturer around really helping them deliver integrated health outcomes, but there's a ton of data from those medical devices: managing it, doing analytics and delivering it to really have an impact on health outcomes, right? So we've done a lot of work around device data, machine data, Internet of Things, as well as advertising applications, personalization, doing deep analytics to understand the consumer, right? So those are all application areas where we've done a lot of work.
And you said bigger than you thought it would be.
Yeah, it was. We've seen a lot of significant value for consumers using it. Of course, any consumer facing business has a lot of value from big data and understanding consumer behavior. We've seen a lot of that in the financial services sector, whether it be banks or brokerages or insurance companies, right? There's payment processors, right? We've been working with a range of financial services companies around consumer behavior data, as well as retailers and media companies and online companies as well.
So I want to ask you a question about the sort of high tech manufacturing companies and the business outcome that they were looking for. Was it just improving, being proactive on service, or just better visibility on potential failures?
So I think there's some different ones.
One big one is improving the yield in the manufacturing process, speeding up time to market, as you can imagine in a high tech market where product lifecycle is often less than 24 months from launch till the product's obsolete. If you can get to market a month early, it has a big impact both on your profitability and on your market share. By improving quality and improving your reputation, it improves market share as well. And there's significant waste. With lower yield, there's a lot of scrap that gets thrown out that adds a lot of cost. So that's big. Part of it is just enabling engineers to use big data techniques to better understand the problem domain. Our customers that are now mature and using big data at scale are discovering all kinds of things that their old approaches of sampling data and using small approximate data sets were missing. So they're getting smarter, they're getting more proactive about the analytics, and it's letting their engineers spend more time engineering and less time troubleshooting and running down problems.
And of those applications that you talked about, did they all involve some real time nature, or some of them? Can you talk about that?
Yeah, so I think that there's a maturity curve that companies typically, I mean, there's a lot of enthusiasm for real time in the market, right? And so there's a lot of sizzle around it, and it's often associated with the latest hyped technology. So a couple of years ago, people were really excited about Storm and now people are excited about Spark Streaming, and there's value in those technologies, and we work with Spark a lot, including streaming. But I think a lot of times people reach for some of these real time and streaming engines where it's not that big a deal for the use case. There's a lot of times when we talk to our customers, they say, oh, we want this in real time. We say, what does that mean to you? How slow? If you had it in an hour, would that be valuable?
Oh yeah, that would be real time to us, right? For a lot of organizations that are dealing with slow overnight batch windows that bleed into the next day, getting data even early the next morning is real time. Or an hour or five minutes. In fact, I think what you see is almost always, when a human's in the loop looking at data and making decisions, true real time analytics isn't very important in terms of having data from the last moment. Having fast results when you're doing an analysis is incredibly valuable, that's a form of real time. But when you are executing a business process, if you're serving a webpage, diagnosing a trouble code from a device that's phoning in, your call center agent's talking to somebody and they get a cue about what to do, real time there is important. So that's a more mature level of adoption, where you're now not just doing decision support analytics with big data, but you're actually building machine learned models to automate a process. Real time is important there.
However, in that case, there's a lot of scenarios where simpler architectures that don't involve streaming are perfectly adequate, right? You store a certain amount of state about what's going on in the interaction, and when that happens, like okay, this consumer just browsed to the sports section, so we should record in their profile that they're currently looking for something around sports. That'll help us if we generate offers for them, right? So with that kind of thing, there's a small amount of state, you can use a database, a NoSQL database or otherwise, you don't necessarily need to farm it out to a whole distributed network to process that. So there's the right tool for the job. I think sometimes that's the challenge: there's so much new technology in the big data space, it's overwhelming for a lot of our customers, a lot of people in the industry, the businesses, and it's knowing the right tool for the right job.
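The simpler, non-streaming architecture described above (record a small amount of per-visitor state, then read it back when generating an offer) can be sketched roughly as follows. This is a minimal illustration, not anything Think Big built: the `ProfileStore` class, the `choose_offer` function, the 30-minute freshness window and the offer names are all hypothetical, and a plain Python dictionary stands in for the NoSQL or other database mentioned in the interview.

```python
import time


class ProfileStore:
    """In-memory stand-in (hypothetical) for a key-value or NoSQL store."""

    def __init__(self):
        self._profiles = {}

    def record_interest(self, user_id, section, now=None):
        # Overwrite the visitor's current interest each time they browse.
        ts = time.time() if now is None else now
        self._profiles[user_id] = {"interest": section, "updated_at": ts}

    def current_interest(self, user_id, max_age_seconds=1800, now=None):
        # Return the stored interest only if it is recent enough to act on.
        profile = self._profiles.get(user_id)
        if profile is None:
            return None
        ts = time.time() if now is None else now
        if ts - profile["updated_at"] > max_age_seconds:
            return None
        return profile["interest"]


def choose_offer(store, user_id, offers, default_offer, now=None):
    # Pick an offer matching the visitor's recent interest, else the default.
    interest = store.current_interest(user_id, now=now)
    return offers.get(interest, default_offer)


if __name__ == "__main__":
    store = ProfileStore()
    # The visitor browses to the sports section.
    store.record_interest("visitor-42", "sports", now=1000.0)
    offers = {"sports": "running-shoe discount"}
    # One minute later an offer is generated from the stored state.
    print(choose_offer(store, "visitor-42", offers, "generic offer", now=1060.0))
```

The point of the sketch is that each page view is a single write and each offer a single read keyed by visitor, so no distributed streaming engine is involved. Swapping the dictionary for a real database would not change that shape.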
What are the different kinds of real time? When do I want a scalable micro batch approach? There's a lot of those kinds of questions, and things are moving too fast for knowledge to generally settle in so that there's a commonly accepted approach that's well understood.
So do you think all this discussion about trying to change the outcome before you lose the customer, that's how many people define real time, is a little bit overblown right now? You mentioned micro batch, which is kind of Spark, it's sort of a micro batch scalable architecture. Is this good enough? Maybe give us some perspective on that. Is this sort of a lot of marketing hype, or are you actually seeing real use cases?
Yes, real time where you're actually changing an outcome for a user, personalizing, driving a specific outcome based on a model, is a reasonable thing to do. That's that process execution use case I was talking about, where you're in the middle of some business process and you want to have real time data reflecting everything that just happened to make a recommendation or to guide a human interaction. So we're doing that, we've done that for a while. I mean, one of the first things I ever did in big data, prior to starting Think Big, I was VP of engineering of a company called Quantcast, and we had a lookalike business where we would in real time figure out, for this given person looking at this piece of content at this time, do they look like someone who's likely to buy any one of a number of advertisers' products, so we could tailor an ad for the right campaign, right? That was a real time execution use case, a process execution, right?
So there were good uses for it, but I'd just point out that a lot of times people are wanting to do more decision support, human analytics, where it's not so important. And even within that, a lot of times the architecture to support real time doesn't have to be something complicated like a distributed streaming system. You could have a much simpler architecture.
Well, we always say you can't take the humans out of the equation. Maybe actually you can, there are more and more use cases, but humans are still involved in a lot of situations. But I want to try to take it back to maybe some things that our audience can relate to in the context of real time. I mean, I think of fraud detection, and I get, you know, pinged half a day later when I've made a transaction. Just the other day, you know, I was on a plane and made a transaction, and got a fraud detection alert that said, did you do a fast food transaction yesterday? I'm like, no, I started to do no. And I'm like, wait a minute, could that have been, you know, on a plane, sort of just missed the example?
Yeah, that's a good example where real time is valuable, right? You're in the middle of a process, so getting an alert as close to real time then is a lot more valuable.
But they negated the transaction and so now it's, and that happens, you get a lot of false positives or false negatives, I guess, in that case. And so it seems like there's a long way to go with regard to, you know, the quality of real time. Is that fair?
Well, I mean, I think that people are not uniformly executing real time well, right, to your point. I mean, I think-
Can you help?
Yeah, absolutely. You know, we-
I mean, that example I gave, could you help solve that problem?
Absolutely, yeah. We can actually improve the feedback to the customer so that he or she can respond accurately, so they don't have to make three phone calls or think about it for a half hour and, you know, reduce the false negatives.
We did a project with a large financial payment processor that was all about modernizing fraud detection, right? So making it easier to quickly test out new ways. I mean, that's a good example of your point. You can't take these people out of the equation. There was a blend of machine learning but human curation of rules, and trying things out, simulating them, testing them and then executing them in real time, right? So we built the whole Hadoop backend to power that and, you know, there's a front end, an in-memory database, for incredible speed in a real-time way so that you can block a transaction. Everyone's got the ability to block transactions in real time, right? But then you also want to have the ability to tie it into a backend system where you can act, alert efficiently on it.
Right, excellent. All right, listen, Ron, we're out of time. Thank you very much for coming back on.
Thanks Dave. Always a pleasure talking to you. Great to be here.
All right, keep right there. We'll be back with our next guest right after this. This is theCUBE, we're live from Big Data NYC at Strata and Hadoop World. We'll be right back.