Live from Midtown Manhattan, it's theCUBE's live coverage of Big Data NYC, a SiliconANGLE Wikibon production, made possible by Hortonworks. We do Hadoop. And now your co-hosts, John Furrier and Dave Vellante.

Hi everybody, we're back. This is Dave Vellante of Wikibon, and this is theCUBE. SiliconANGLE's been here all week at Big Data NYC, our event at the Warwick Hotel, right across the street from the Hilton, where Hadoop World and the Strata Conference have been going on. This is our fourth year covering this event. We have some great sponsors, and one of them is here right now: Hadapt. Chris Raucca is here; he's the Vice President of Engineering, and he's here with Patrick Tool. Hadapt is a company we've been following for quite some time, Cambridge, Massachusetts-based. Gentlemen, welcome.

Thank you.

So how's the show going for you?

Oh, it's been great. It's been super busy. It's amazing how much momentum you're seeing. It was packed. I don't think I've had a spare five minutes in the last two days.

So what are the conversations like at the booth? First of all, people are trying to understand what you guys do, right? So what do you tell them? Tell us about Hadapt. Where do you fit in this whole thing? We know you're doing SQL for Hadoop.

People are really trying to tease apart the ecosystem and understand where all the pieces fit together. You're definitely seeing a lot of spot solutions, not necessarily an overall integrated story that people can really get their heads around. So it's a lot of explaining the basics: helping people understand where we fit in, the set of problems we solve, and how we're closing some of the gaps that are plaguing people trying to put together a real end-to-end solution.

Okay, so when they come to your booth and say, I've heard of Hadapt, I know you're doing some stuff with SQL, what do you do?
Fundamentally, we are a SQL database on Hadoop. We allow people to load their data onto the cluster, query the data, and do analytics, all through a SQL interface.

But you're really good, if I understand it, at handling diverse data sets and making them available to everyday SQL programmers.

Yeah, that's absolutely correct. I mean, a core vision of the company is that we don't want to force people to resort to MapReduce jobs or scripting or Java by hand. What we want to do is make all the power of the Hadoop cluster available through SQL directly.

So talk about the architecture and how that differs from other platforms out there.

Sure. I think you really have to look at Hadapt as a cleanly engineered commercial piece of software: a robust SQL implementation on the front end, and a very flexible data architecture on the back end that allows us to plug in a number of different data sources, as well as clean APIs for integrating machine learning algorithms.

Okay, so what's going on, Patrick? Talk about the kind of data problems customers are trying to figure out. They've already tested it out and they want something that's more real-time. Talk about that.

Yeah, so we actually see different types of use cases that we run into. Most of the time, with the splintering of the database market in general and the explosion of the different data stores that are out there, customers are really just looking to unify a lot of these things, trying to build something where they can put all the pieces together in one place and come up with one answer. And the biggest problem we saw when we first got out in the market was a lot of people trying to glue the different pieces together, right? So you see them using NoSQL databases, you see them using some of the MPP databases, and Hadoop, right?
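The access pattern Chris describes, plain SQL against data living on the Hadoop cluster, is the kind of thing an analyst would drive through an ordinary database connection rather than hand-written MapReduce. A minimal sketch of that style of access, using Python's built-in sqlite3 purely as a stand-in for the cluster connection (a real Hadapt deployment would be reached through its own JDBC/ODBC driver; the table and columns here are hypothetical):

```python
import sqlite3

# Stand-in for a connection to the SQL-on-Hadoop engine.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pageviews (user_id INTEGER, url TEXT, ms INTEGER)")
conn.executemany(
    "INSERT INTO pageviews VALUES (?, ?, ?)",
    [(1, "/home", 120), (1, "/cart", 340), (2, "/home", 95)],
)

# The analyst never writes a MapReduce job: plain SQL does the aggregation,
# and (in the architecture described) the engine pushes it down to the data nodes.
rows = conn.execute(
    "SELECT url, COUNT(*) AS hits, AVG(ms) AS avg_ms "
    "FROM pageviews GROUP BY url ORDER BY hits DESC"
).fetchall()
for url, hits, avg_ms in rows:
    print(url, hits, avg_ms)
```

The point is the interface, not the engine: any BI tool or DB-API client speaking SQL could issue the same query.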
And what we found was that when you put all these pieces together, you have this data-joining problem, if you will, where it becomes a massive ETL problem. And so where we've really begun to take hold in the market is helping people unify that into one place, looking at all the different types of information out there in one central place.

Okay, so a customer calls you in. Where do you start?

Well, usually we start by pulling apart their data lifecycle. Where do they collect their data? Where do they put it? How are they monetizing it? In other words, how are they looking at it, how are they consuming the information that they have? And inevitably what we find is that there is a data scale problem, right? And we also find that they're moving a lot of the data all over the place in order to come up with the answers they need. So a lot of it comes down to making sure we get the right technologies for them, being able to pull the data apart and use it in a very unified manner. It could be as simple as talking to the data analysts at a company and understanding what kind of analytics they're trying to get out of their information. It can also be talking to the IT professionals, who have often had a huge problem trying to manage many of these systems and glue them all together.

Now, I wonder if you could share with us some of the more common use cases, and maybe even some of the harder-to-get-to use cases.

Sure. The common ones we see inevitably are things like advertising, as well as clickstream analysis and being able to come up with insights into those pieces. Now, the harder ones to get to are really around the explosion of machine-learning-type algorithms, where people are trying to put artificial intelligence into their analysis, right? Because there are so few people out there who know how to do that.
So what ends up happening is we're able to bring some of those algorithms to the platform and execute them via SQL, right? And we also tap into the existing community, right? Mahout algorithms are out there, people are familiar with those, and we're able to leverage them, bring them to the data analyst, and let them run from our platform.

So those are harder because of the diversity, or lack of skill sets, or a combination?

I would say it's more a lack of knowledge of how to use them correctly. It's a lot harder to take a set of URLs and say, how do I apply a clustering algorithm to it, right? It's not as well-known a problem as basic SQL and analytics has been for the past 40 years.

So last year at Strata, you guys won the best in show, I guess I'd call it. And I think it coincided with the Impala announcement from Cloudera, didn't it? It was very close. So that was good; congratulations on that. But a year later, compare and contrast, if you would, Hadapt and Impala.

Well, fundamentally, Hadapt is a complete end-to-end platform, and we are backed by an on-disk database management system. Our fundamental architecture allows us to scale out very wide, push queries down to the data nodes, get a lot of parallelism out of the cluster, and deal with very large data sets; we're not limited by any kind of memory scaling issue. So I think we're a broader, more robust solution. It's interesting: there's a lot of talk about SQL on Hadoop at the show, and it's very noisy, and I think that's one of the things, to get back to your first question, that was maybe bringing people to the booth looking for clarity.
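The interview doesn't show Hadapt's syntax for invoking these algorithms, but the clustering example Patrick gives ("take a set of URLs and apply a clustering algorithm") is easy to make concrete. A toy one-dimensional k-means in pure Python, the kind of routine that a platform like this would surface to analysts through SQL instead of hand-written code (the data and names are illustrative only):

```python
# Illustrative only: a deterministic toy 1-D k-means.
def kmeans_1d(values, k, iters=20):
    # Seed centroids with the k smallest distinct values (deterministic seeding).
    centroids = sorted(set(values))[:k]
    for _ in range(iters):
        # Assign each value to its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Recompute each centroid as the mean of its cluster.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# e.g. pages-per-session for a handful of visitors: two obvious groups.
sessions = [1, 2, 2, 3, 40, 42, 45]
centroids, clusters = kmeans_1d(sessions, k=2)
print(centroids)  # one centroid near casual browsers, one near heavy users
```

The hard part Patrick is pointing at isn't the twenty lines above; it's knowing when a clustering algorithm applies, how to featurize URLs or sessions, and how to read the result, which is why packaging known algorithms behind SQL matters.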
And I think where you'll see things going, and certainly we have a good jump on it, is that you don't want just basic SQL operations on normalized data; you really want a more flexible analytic platform where you can start to bring in data from other sources. So Patrick mentioned clickstream analysis: typically you'd have a few normalized columns inside your data source and then some big blob of semi-structured data. And what we're seeing in the market, and what we're responding to, is the need to access that information directly through standard SQL as well. It's a nice differentiator for us, because instead of going through a complex ETL process, people can use the SQL they know and love today, available through their BI tools, to query those pieces of information that haven't been scrubbed and normalized before they're brought into the system.

So you guys are one of the first to actually provide that SQL capability in this NoSQL environment. How has that affected adoption? I wonder if you could talk about where you guys are having success.

Yeah, I mean, generally speaking, I would say that SQL on Hadoop was the first part of it, right? And we've pushed past that with some of our recent features, in terms of flexible schema and machine learning algorithms and such. And what we're finding in terms of adoption is that databases have been around for a long time, so if you're gonna bring SQL to the Hadoop platform, you need to have a reason, right? You need to make sure that you maintain the flexibility of the algorithms that are out there on the Hadoop platform and the MapReduce framework, right? And selectively expand the scope, if you will, of structured databases, right? And make sure that you can bring features to the users who like Hadoop and love its flexibility,
and still be able to bring SQL to the larger masses who understand how to write SQL queries. So the mass adoption we're starting to see really comes from the SQL, and from being able to run those additional algorithms.

Do you see any particular industries taking it up faster than others: financial services, insurance, CPG?

Yeah, we absolutely see quite a bit in advertising and online retail.

Yeah, okay. You mentioned clickstream analysis before, so you're seeing a lot of activity there.

Yeah, the clickstream analysis is e-tailers, if you will, who have a massive, massive amount of data. For anything that's online, the data collection comes very easily, right? And they're the ones who are actually looking into that information and seeing what they can get out of it.

So from a Hadapt perspective, Chris, I'll start with you. What would you say was the big takeaway of the event here? Were you here last year?

I wasn't actually here last year.

So what's the big takeaway this year for you?

I think the takeaway is that it's becoming busier and more confusing, and there's a need for clarity about what the emerging architectures are for how applications are gonna be pieced together, right? For me, it's clear individual siloed projects or point solutions aren't giving people, typical developers, what they need to really put together something that's gonna bring value to their business. I think it's definitely a maturing of the overall space, where you're gonna see more commercial offerings, frankly, that are providing a solution, right? Not just pieces of technology.

Anything you'd add, Patrick, to that?

No, I think that's pretty fair. I'd say, for me personally, it's neat to see how big it's grown and how it continues to grow; there's a lot of stuff out there. It's really kind of neat.

It is neat.
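Earlier, Chris described clickstream records as a few normalized columns plus a big blob of semi-structured data, queryable through standard SQL without an up-front ETL pass. A sketch of that pattern, again using sqlite3 as a stand-in for the engine, with a user-defined function playing the role of the flexible-schema field access he describes (column and field names are made up for illustration):

```python
import json
import sqlite3

# Stand-in clickstream table: normalized columns plus a raw JSON payload.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clicks (ts INTEGER, user_id INTEGER, payload TEXT)")
conn.executemany(
    "INSERT INTO clicks VALUES (?, ?, ?)",
    [
        (1, 7, json.dumps({"page": "/home", "referrer": "ad-campaign-3"})),
        (2, 7, json.dumps({"page": "/cart"})),
        (3, 9, json.dumps({"page": "/home", "referrer": "organic"})),
    ],
)

# Expose a field extractor to SQL, so the blob is queryable in place
# instead of being scrubbed and normalized before loading.
conn.create_function("field", 2, lambda blob, key: json.loads(blob).get(key))

referred = conn.execute(
    "SELECT user_id, field(payload, 'referrer') AS ref "
    "FROM clicks WHERE field(payload, 'referrer') IS NOT NULL "
    "ORDER BY ts"
).fetchall()
print(referred)
```

The row with no referrer simply yields NULL and falls out of the filter; nothing had to be restructured first, which is the differentiator Chris is claiming for querying unscrubbed data through the BI tools people already use.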
It's our fourth year covering this event, Hadoop World. A lot of good practitioners, and the New York crowd, once they hop onto something, they tend to drive it pretty hard, so I'm sure you're hearing a lot from those guys. All right, gentlemen, thanks very much for coming on theCUBE. Appreciate your time, and congratulations on all your success. Good luck going forward.

All right, thank you.

All right, everybody, keep it right there. I'll be back with John Furrier and Jeff Kelly to wrap up from Big Data NYC. This is theCUBE; we'll be right back.