theCUBE at Hadoop Summit 2014 is brought to you by anchor sponsor Hortonworks ("We do Hadoop") and headline sponsor WANdisco ("We make Hadoop invincible").
And welcome back, everybody. We are here live at Hadoop Summit in San Jose. I'm Jeff Kelly with Wikibon. You're watching theCUBE. And I'm here with my next guest, Joe Travaglini, who's the director of product marketing at Sqrrl. Joe, welcome to theCUBE.
Thanks, Jeff. Glad to be here.
So we've had Sqrrl on before. Ely Kahn has joined us many times to talk a little bit about what you guys are doing. And I think Sqrrl is becoming pretty well known throughout the industry for the way you're building your analytics platform on top of Accumulo. Why don't you tell the audience a little bit about the company and what you guys have been up to for the last few months since we last talked?
Sure. So we extend the open core of Apache Accumulo and Apache Hadoop. We really want to make it easy for people to build real-time applications on top. Accumulo is a NoSQL database with a strong security heritage, obviously, originating at the NSA, and we actually abstract away the Bigtable-style NoSQL data model. That's something I don't think people realize about us. We give you more of a MongoDB-like experience with JSON documents and nested fields of JSON, and we allow you to build graphs. So for a graph, let's say you have two JSON documents as your nodes. You can draw an edge between them to indicate a relationship. And then you can do graph-based analytics to traverse, to search, to find patterns in your data.
Let's dig into that a little bit around graph-based analytics. I mean, we've heard that described as the killer app, or the potential killer app, for big data. Why is graph-based analytics so powerful?
It's powerful because there are certain analytics that you can do that you can't really do with a relational model. So I'll give you an example of one of our customers.
They have a graph structure and they're trying to find not only the discrepancies in the shapes of the graph, but also the time-based correlations within that. If something deviates from a certain structure within a graph over time, it's really hard to express: okay, this data entity is connected to another data entity, which is also connected to this third thing, and over time that relationship has changed. It's really hard to express that with a relational model. So it's about understanding relationships between entities over time.
And so put some color on that. Let's say in a financial services organization, how would that be applicable?
Sure. So in a financial services world, let's say you're sitting at a corporate treasury desk and you want to build an application for, let's say, counterparty risk. You can model your financial institutions as nodes, and the debt positions outstanding between them as edges. You could enrich that with exchange rate forecasts. With all these different entities that you logically want to track, doing a join in a SQL-style world to even specify the problem, let alone analyze it, is going to be an impossible task.
So give us an update on the company itself. Where are you guys in terms of headcount and the development of the company itself? We've been watching you guys since the beginning, kind of seeing you grow up. Where are we today?
Yeah, so I joined Sqrrl in February, and since then I think we've hired five full-time employees, so we're growing pretty rapidly. We're about 30 full-time employees now, 20 of whom are in engineering. We've got a couple of positions open if anybody's out there and interested in the Boston area. We're doing real well and have well over a dozen customers. Not sure what you heard at the last update, but it continues to grow.
Very good. Yeah, it's always good to have somebody from Boston join me up here. We're surrounded by these Silicon Valley types. It's good to have some East Coast cred back on theCUBE.
So what Sqrrl, I think, is known for, among people who don't know Sqrrl really well, is the cell-level security. And that's an important part of the story, but it's only part of the story, and it's really an enabler for some of the other things that you're able to do. Put it in context: what role do security and your security capabilities play in your larger value proposition?
Yeah, absolutely. So we're reaching an interesting inflection point here, where the line between traditional security and cybersecurity is getting blurred. The sophistication of the attacks that are coming from APTs and insider threats, with data flowing in, out, and across your networks with very little friction, makes security a whole lot more important. Especially with big data, as I like to say: when you consolidate all your data into a single platform, you're really compounding the risk around that data. I don't think we've heard of a major security breach that had to do with Hadoop just yet, but it seems like it's a target that's ripe for the picking. So it's really important to have that expressive, granular, cell-level control over your data. We take that a step further and enrich it with a number of engines. The way Sqrrl's cell-level security works, every bit of data has a series of labels attached to it that specify who can access it. We really built out the whole data-centric security ecosystem around that. So it's great to accommodate labels, but how do you get them on there? We give you an engine that lets you specify rules that dictate how data gets labeled. Let's say you want to do a regular expression: this thing looks like a Social Security number, so tag it with PII, for personally identifiable information. We give you another engine that grants users entitlements, or authorizations, to access those labels.
So let's say we key on things like LDAP: this person's not a new hire, they've been in the system for more than 30 days. We also enrich that with environmental attributes, the circumstances under which you're making the request. So you're at an authorized terminal at work, Monday through Friday, nine to five. If you meet all those conditions, then you can get access to data labeled PII. If any of those things isn't true, let's say you go home and VPN in on the weekend, that might be abnormal activity, and we can deny access to the data. We continue to enrich that with encryption and with secure search, which is an area where I think we're the only big data vendor that has term-level security on search indexes. That's a key advantage for us. And then auditing, end to end across the system.
So of course, Sqrrl, your analytic platform, sits on top of Hadoop. So Hadoop is a critical component, and you obviously have a vested interest in the adoption of Hadoop itself to enable your business. What are you seeing out there in the field in terms of level of adoption and the types of organizations that are adopting Hadoop now? Are we still in that early adopter phase, do you think? Are we near the tipping point where Hadoop is about to go mainstream? What are you seeing out in the field, and how has your experience been here over the last couple of days?
Yeah, I definitely think we're at the tipping point, right? I've talked to very few people here who aren't using Hadoop in some capacity. Even if it's still in a POC capacity, they have Hadoop, right? With more and more of our customers and prospects, early on we'd be going in and they'd want us to set up Hadoop for them. That's pretty much no longer the case. Everybody's really got a handle on Hadoop. They're talking to vendors, and maybe they've got production support from one of the big platform providers.
And from our perspective, you can bring your own Hadoop; we sit above that. So we have partnerships with, you know, the five major providers out there.
Another big topic here this week has been the relationship between Hadoop, and some of the analytics you can do in Hadoop now thanks to YARN and Accumulo and some of the other more advanced analytic platforms, versus the more traditional data warehouse from the household-name vendors that are here as well. What's your view on that? Do Hadoop and the analytics that you provide complement the more traditional data warehouse? Is there an overlap there? Is it going to be a replacement over the long term? What's your view?
Over the very long term, maybe 20 to 30 years, it might be an eventual replacement. I don't think the data warehouse is going anywhere anytime soon. The characteristics of the workloads that run on Hadoop are just totally different. We see what many others probably see, where people are offloading a lot of ETL types of workloads from data warehouses into Hadoop. People are starting to realize the analytic complexity of the things you can do, especially with graph databases on top of Hadoop. So it gives you a different workload that you really couldn't accommodate, or accommodate well, in a traditional data warehouse. I don't think the established data warehouse providers face much of a threat from Hadoop in the next 10 years, really.
That's interesting. Yeah, I mean, we're seeing some overlap, and I think there is going to be some competition for dollars when you're talking about offloading some of those workloads. It could impact the revenue of some of the data warehouse providers, but I think you're onto something. It is a longer-term threat, but you're not going to see these vendors go away in the next five years. That's not the situation.
Right. I mean, we've probably all heard the vision from guys like Doug Cutting and Arun Murthy about YARN being the data operating system. It's interesting to note, if you look at the emergence of Hadoop, some could argue it has roots in cloud computing and virtualization. You have virtual machines now that are almost a vestigial organ, right? You throw Windows or Mac or Linux in a VM and it's this container that you can put in your elastic cloud. But really, at the end of the day, people have data and they have applications. They have workloads that they want to run on the data. YARN and Hadoop bring us closer to that direction of like a distributed agreement. But there's still a lot of work to be done to make it enterprise-ready from a multi-tenancy standpoint, in resource control, as well as in having those canned apps that you can just throw on, like the Apple App Store model, and solve a problem really quickly.
So I know you guys had an announcement this week around the new Test Drive VM. Tell us a little bit about that.
Yeah, so the Test Drive VM is something we're really excited about. It gives our customers and prospects, and people who are just interested, a frictionless way to try Sqrrl hands-on without having to set up a Hadoop cluster. It's a self-contained virtual appliance with all the Hadoop daemons running on it, with Sqrrl pre-installed and configured, and with dummy data sets in there and projects that you can load and play around with, to learn how to do things the data-centric security way, to learn how to do things with our document and graph data model, and to build applications. And over time, we're going to be releasing more and more stuff out on GitHub, with tutorials and blog posts that show people the power of a tool like this.
Well, that leads to a related question. It's critical for a company like yours to attract developers. You've got to win the developer community for you guys to be successful long-term.
So this seems like one of the ways you're doing that. What are some of the ways you're trying to really gain developer traction?
Yeah, so we're being very active at conferences and trade shows like this, out on social media, and on our website, really just continuing to deliver hands-on tutorials so that people, rather than just having a data sheet, are actually at the steering wheel. They're really building applications on top.
All right, very good. Well, we've got time for just one more question. So what's on tap for Sqrrl? What can we look for in the next six months, 12 months? What are some of your top priorities?
Yeah, so more and more when we're talking to folks, people want big data solutions more than anything else, right? So we're looking at what we can do from a partnership perspective. We're an operational data store, being a NoSQL database, but people want operational applications. So what can we do, and what areas can we focus on, to give people turnkey solutions, partnering with other people in the ecosystem? I would say keep your eyes peeled for that.
All right, and we will. Joe Travaglini, thank you so much for joining us on theCUBE. We'll be right back after this, so stay tuned.