Live from Cambridge, Massachusetts, it's theCUBE at the MIT Chief Data Officer and Information Quality Symposium with hosts Dave Vellante and Jeff Kelly. We're back. This is Dave Vellante with Jeff Kelly. We're here at the MIT Information Quality Symposium at the Tang Center in Cambridge, Massachusetts. Welcome. We've been here all day today. We'll be here all day tomorrow. This is our second MIT CDO forum, Information Quality Forum. Essentially, it's been an Information Quality Conference for the last eight years, but the CDO, the Chief Data Officer, has emerged as a major player in the enterprise, particularly within regulated industries like financial services and health care and government, but increasingly seeping into more commercial enterprises, and so this event has really taken on that theme. It's a small event, 300, 400 people, but very high quality, a lot of MIT folks collaborating. Luis Maldonado is here. He is the Vice President of Products at Sqrrl. Good friends down the street. Welcome. Thanks for coming on theCUBE. Thanks for having me. So, what's new at Sqrrl? We had you guys on at Hadoop Summit, we were talking about that. What's new since we talked to you guys last? Well, we're obviously moving the product forward. That's a big part of what we're doing right now, seeing great customer traction. That event was great for us as well, obviously getting a lot of the big data folks in. Following on to that, I think, new since you last saw us, we actually were at Splunk Live just recently and announced a Splunk Hunk connector for Sqrrl. So that was pretty exciting. So a lot of our efforts right now are focused on how we help people take what we're doing with them on the data management side and visualize it, kind of extract the signal from the noise as you guys do, and really start looking through it, analyzing it, and visualizing it.
You know, I remember when you guys came out of stealth, nobody in the big data world was talking security, right? It was all about MapReduce and, you know, the potential of big data and the potential value, et cetera, et cetera, et cetera. And then last year, right around Hadoop Summit, things clicked, and I remember it was a good event for you guys because your booth was very crowded and your team was there and very excited. It was almost like there was an epiphany: oh, well, security matters and we don't have a way to secure this data in a granular fashion. So what happened? And I presume, it sounds like, it was even greater this year. Oh yeah, it continues. So I think it's not unlike what you see in a lot of application development. People think about it afterwards. So they, you know, focus on functionality, how do I solve the problem I'm trying to get through. And then all of a sudden someone says, wait a minute, if you're going to stick all this data there, who's watching it? How am I protecting it? And if you look at the life cycles, a lot of projects were getting to the point where serious amounts of data were going in there, some of it with real regulatory concerns, and people started to think, well, if that's where we're heading, how are we managing that? So I think it shows where the industry is in the life cycle, and it's continuing. And I would say, if anything, it's going to continue to increase. You're looking at all the data breaches that are going on. You're looking at all the private information that's being stored in these systems. People are more and more concerned about it now. So there's a parallel between what you guys are doing, I think, and this whole chief data officer theme, because I mean, who are the industries that are most concerned about security?
It's financial services, it's healthcare, it's the regulated industries, and that's where you're seeing the chief data officer. Are you seeing that parallel emerge? We are. We are. We see the different concerns in an organization come together. You have the folks that are looking at what do I have available to me from a data standpoint to understand, analyze, extract, and then you have the other folks that are saying, well, but that's regulated. We have to watch this. A recent conversation I had was with someone from the healthcare industry, a scientist looking to do some really interesting research. She could not do her research because she wasn't able to bring all the data together. So while she went to the data side and said, what's available to me to be able to do this particular type of genomic research, there were actually some concerns about what personal information was coming along with those data sets. She didn't need all of it. She just needed to look at certain pieces of it, but you get it all, it all comes together, and so there was no way to really extract out the pieces that she needed. So you're seeing these folks kind of come together saying, how can we create a partnership? How can I get you the data that you need, but how do I manage the regulatory concerns that the rest of the organization has about this data? So I wonder if we could talk a little bit more about security and big data and specifically your strategy. So when you guys came out, I mean obviously your fine-grained security at the cell level is your big differentiator. At the same time, there's a concern: okay, is that a one-trick pony, or can we use Accumulo as a more broad-based data store? What's happening there? Are those industries that I mentioned earlier sort of dragging you in? Is that where you're going after the opportunity, getting a foothold and growing from there? Are you seeing more broad-based adoption of Accumulo?
I wonder if you could talk about that. Sure. I think I'd look at what we've provided as a base. So if you think about the start, just the proposition of being able to bring all your data together and then have that really fine level of security is almost just how do I enter the game. And then from there, what has most attracted our newest customers has been all the layered capabilities on top of that. So the ability to do graph analytics, that's become really interesting, really important to folks, and the search capabilities. But what they know is that when I bring all this data together, it will be secure, so that if I do a search, only the analysts who are allowed to see certain specific pieces of data will see them during their search. The same with a graph. If I'm traversing through and understanding the relationship between analysts or between specific payers in the healthcare industry, I know that only the pieces that I'm allowed to see will show up. So I think that was certainly a great way to start. And I'm seeing that question of where do we take it from here. In each one of these industries, with the regulations, they know that they're comfortable, they're going to solve that. And then they're looking at functionality now. Now, others have talked cell-level security. Sure. I remember, Jeff, when Intel came out with its Hadoop distribution, they were talking cell-level security. I was like, why don't you just use Accumulo? Well, we might. And then, of course, Intel's now gone a different route. But others have talked about it, maybe even... I can't remember off the top of my head, but I think others have tried to implement it. Now, I know we've had Adam on, and his immediate response is that you can't just bolt it on like that. So I wonder if you could talk about the state of sort of similar approaches. Where are they at and what's different with what you guys do?
Yeah, I mean, so I think the one that we hear the most about is probably HBase, and people are saying, yeah, we've added on and we're adding these capabilities. There are things you can do by adding it on. There's no doubt, and I think they're doing a great job on evolving the HBase code base. There is a difference, though, when you build it from the ground up, and you're building in security at every level of what you do. So in the examples I used a minute ago, if your layered functionality is a search functionality or it's a graph traversal, that security is built into every single layer. And so I think you have more challenges when you have to add that on, and it becomes more complex to use, more complex to manage, than if it's at the heart and core of what you're doing, so that in anything you do, you know that we're maintaining that security on every piece of data. So there's an old saying, better is the enemy of good enough. Microsoft has made hundreds of billions in market value being good enough. Is good enough not good enough in security? I would say it depends on who you're talking to, right? There are definitely some industries that just say, I just need roughly this amount of security. And I've heard even some refer to cell-level security as almost data-source-level security. Maybe that's good enough for certain industries. I know that we're hearing from places where they are absolutely sold on the proposition of bringing in multiple data sources, mixing all types of things together to create a better analysis. At that level, they are very concerned. And they're the ones that are saying, let's say I'm a financial services institution and I'm bringing in data from the retail bank, I'm bringing in data from my credit cards, all these different assets coming in. There's personal information involved in every single one of these.
And if I can't sort through it, I'm not going to be able to go forward with a project that brings all that data together. So I think it's going to depend on the organization. There's certainly some that will be satisfied with good enough, but it's not everywhere. So talk about how you help your customers deal with kind of changing regulatory environments and changing requirements for security. So we had a number of CDOs on today, a couple from healthcare organizations, the CDO of Seattle Children's Hospital and actually the CIO of Partners Healthcare. We talked a little bit about some of the uncertainty, just take the Affordable Care Act and courts that are striking down parts of the act. And so there's a lot of uncertainty about what the regulatory environment's going to be. How do you help customers have that kind of flexibility needed to adjust to these changing environments? Is that something that's relatively easy to do with the Sqrrl platform? How do you go about helping people do that? Honestly, I think a lot of it is the flexible model that we have for defining your security labeling scheme, how you talk about and how you think about security. That's one of the key things that we provide for folks. So just to make it real, if you're on an ACL-based system, a pretty simple model, it might work. But when you've got to change ACLs and you're using them at a very low code level, it's very difficult to evolve and change as you get new regulations. However, say you have a system that thinks about security labeling in terms of, what type of data am I dealing with? And then separately, I think about, well, what types of access do my users have? And then you bring them together in the system itself. It's very natural for you to evolve that type of model. And you say, well, at this point, these types of roles go away. I might change the types of users out of there.
Or if I have new labels of data, whether it's private information today or it's coded information tomorrow or whatever the new label might be, I can bring that together. So I think a lot of it is the flexibility of our model. It's not a fixed, here's all the levels that you get, here's an ACL-based system, et cetera. So when you're building your application and you set some of these security policies, that's something that can be adjusted and adapted for new data that comes in, maybe new regulations that might happen, new use cases. So it's not a rigid kind of approach. That's right. We support a number of different types of systems. We have customers that are using and enhancing and working with things like Active Directory, using Kerberos for authentication. Those types of mechanisms evolve as well. So you might decide Active Directory lists might not be your technology choice in the future. Or it might be that you're looking at a different integration point. So you might need to use another authentication mechanism. Those are the kind of ways that we help you evolve in terms of mechanism as well as policy. So switching gears a little bit. We're here at the Chief Data Officer and Information Quality Symposium. So I'm curious, are you seeing this role, the CDO, emerge among your customers? Are you seeing any kind of patterns, maybe in certain vertical markets, maybe government versus healthcare? What are you seeing in terms of the actual term CDO and the role of Chief Data Officers? Is that something that you're actually seeing out in the wild, if you will? Yeah, I think it's evolving. So I think it's a newer term. I would almost see it as parallel to what we saw maybe a year or two ago with data scientists, where there were a few of them out there and there are just more and more. So I think it's the realization that this is an important role.
And I would say there are a number of our customers that are playing that role, not necessarily with that title. They may have other titles, but certainly whether they're in data quality or data science, they have that responsibility as well. So we're seeing more of that, but the concern is absolutely there. There's tons of different data that they have available to them, but the efficacy, the quality of it, how clean is it, there's a lot of concerns before they even get to the analysis standpoint. How do I make sure that that all works before I even get there? Yeah, well you've got the governance component and then you've got the actual analytics, what you actually want to do with that data. So how are you seeing your customers kind of tackle those two problems? Are they tackling them in silos or are they trying to bring those two disciplines together? I've seen both. So you see some of the projects start out, and there's certainly some early projects where people are just trying things out, and you'll have kind of a rogue project on the side of IT to be able to test out the functionality. So those tend to grow up and all of a sudden they catch on within the organization, and it's later that someone says, okay, now we have to think about what's going in there because we've let the wildness go a little too long. So you see that, but you also see the other approach, where the chief security officer is at the table at the decision point in the project to say, if you're going to do this, then here are the concerns that we have to address. So I've seen both. The latter I've seen probably a little bit more in places like financial services. In other places like telco and maybe a couple more of the tech type companies, you'll see the new projects funded and kind of left to grow on their own. So it's interesting to see how they're coming together.
I do feel like eventually they do meet, and that's where it's interesting to see, well, did your technology choice lead you to the point where you can solve these problems? All right. How about Accumulo Summit? How was that this year? You had your first Accumulo Summit in June, down in DC, right? That's right, yeah. So talk about that a little bit. First one out of the gate. So we co-sponsored, and it was oversold. So we're happy to say that, the first time out. It's a new conference, so you never really know the response, but it was great to see the Accumulo community come together. We had everyone from practitioners using it, folks in industry, folks in the federal space. We had partners, Cloudera and Hortonworks, and a lot of great work that was going on there. So the sessions were great. We put them all up, available for people to view, and we're actually looking forward to another session next year, even bigger. Yeah, now you guys were the primary catalyst, obviously, of that event, right? Yeah, that's right. And you presumably want to continue to evolve that. We do, we do, but it is still the Accumulo Summit. So it's not the Sqrrl Summit, it's definitely... Yeah, but somebody's gotta drive it. That's right, and we're happy to help drive, just like we do, and we continue to work on that, that's right. What is the state of the community? Is it largely based in the DC area, or is it kind of distributed with maybe a Silicon Valley presence, and how active is it? Yeah, well you certainly see a big pocket in the DC area. I think with the adoption of Accumulo within the federal space, it goes without saying that you're gonna see a lot of the contract firms and services firms that are picking it up, building projects, servicing the agencies. But you also see, we've got Cloudera, we've got Hortonworks that are supporting the project as well, and we've got contributors. So we are seeing a West Coast presence.
I'd like to see more of the community out there as well. So I think maybe even doing a bi-coastal type of event could be great. But it's great to see all the different types of folks coming together. We even had Splunk there and they're excited about our work together. So it's fun to see how it's growing. Contribution-wise, I think you continue to see it grow. We're certainly beyond being the only ones at this point, obviously, and there's some really great contributors. I mean, with a number of folks, it's great to see them in person. So I know you came from HP. So you've worked at more traditional software companies and hardware companies. How does that compare to working at a startup like Sqrrl, where you've also got a community component that you have to consider as you're building the product and going to market? How has that transition been for you, from the world of HP to the world of Sqrrl and some of the open source components, some of the community components? It's been fun. I think there's certainly some parallels. I mean, at Vertica, we put out a community edition when I was there. And so it was exciting to generate that. It wasn't open source necessarily, but it was generating that community effect. Here it was almost built in from the ground up. So I love the fact that we're bringing in and allowing others to kind of extend and embrace. But we have to make sure that the things we're doing make sense for the platform. So just like with this last Accumulo release, 1.6, we're contributing more security work to it and continuing to foster the base platform. We have some great ideas on what else we'd like to push down and let others benefit from. So it's nice to kind of help and manage that constituency while we're still building commercial product. So a little different, but there's some good parallels.
Yeah, I mean, is there a challenge in keeping the community engaged and happy versus what's best for Sqrrl the company? I wouldn't say so. Because I think a lot of things that we're interested in play very well to the Accumulo community. We can't always put everything we do into the open source. I think we all know we have a product that we have to sell as well. But I've seen some really good parallels, in that what's important to us helps everyone in the community as well. So I think that's a fine balance. Going forward, I think that the fun part will be engaging more and more folks and trying to have more folks add to it. I'd love to see a bigger, broader contributor base as well. I wonder if you could comment on this, and Jeff's right. I mean, you were at a company, Vertica, which was not a legacy data warehouse company. You guys were sort of a disrupter there. You know, you popularized columnar and really brought kind of the MPP philosophy to that world and clearly had an impact. There's a prevailing sentiment amongst the large established data platforms that have partnerships with traditional legacy companies like whomever, pick your favorite company, Teradata, Oracle, dot, dot, dot, that will position the Hadoop movement as complementary. In fact, Amr Awadallah likes to use the analogy of the iPhone camera versus the SLR: which business would you want to be in? But that's an aside. But while they're trying to position it as complementary, the research that we've done suggests that while the data warehouse is not necessarily going away, resources are very clearly moving toward Hadoop, very actively. Probably at a more aggressive pace than the new startups that have partnerships with the established players would like you to believe. You guys are a startup, you have relationships with established players. So as unbiased as you possibly can be, what's your point of view on that transition?
The enterprise data warehouse never really lived up to its promises. That's part of the reason why Vertica came to be. Vertica attacked a lot of those problems, but didn't solve them. And now the promise of Hadoop and things like Accumulo is that it has the potential to solve those problems. So how disruptive will it be? What's going to happen to the traditional data warehouse space? It's absolutely evolving and I think the vendors are realizing that. So it was interesting, from when I was at Vertica, to watch as the vendors were changing already. I think movements like SQL on Hadoop and those types of things show that the question is, how do I bridge the technology that I used to support onto this new platform? So I think it naturally is going to evolve. It may be happening faster than some would like, but I think you'll see vendors taking actions to support that. We were just talking about the Hadapt acquisition at Teradata. And I think the vendors realize that this technology did serve a purpose, and for the state of affairs at the time when it was coming out, it made a lot of sense. And now we have other approaches to it. We have other ways of extending and looking at more data, different types, all types of variety, all different challenges. And it's the ones that continue to evolve and embrace that, I think, that will survive, but it's absolutely having an effect. All right, good. Well, we'll leave it there. Luis, thanks very much. Appreciate you. Actually, one last question. So the bumper sticker, let me go back to June. Accumulo Summit, because this is kind of a little niche show. It's not really your wheelhouse, but the Accumulo Summit clearly is. As the truck was pulling away from Accumulo Summit, what was the bumper sticker on the back of the truck? What did it say? Ah, that truck that we had there. I would say it was extracting the secure data signal from the noise. Ah, great.
Well, I'll have to do that. I love it, of course. We love it on theCUBE. We love that tagline. All right, Luis, thanks very much for coming on theCUBE. It was great to see you again. All right, keep right there, everybody. We'll be back with Paul Gillin and Jeff Kelly to wrap right after this. This is Dave Vellante. This is theCUBE.
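
The labeling model described in the interview, where each piece of data carries a label, each user carries a separate set of authorizations, and the system combines the two at read time, can be sketched roughly as follows. This is illustrative Python only, not Sqrrl's or Accumulo's actual API: `Cell`, `is_visible`, `scan`, the label names, and the sample records are all hypothetical. The `&` / `|` expression syntax is loosely modeled on Accumulo's column visibility expressions, with the simplifying assumption that `&` binds tighter than `|`.

```python
import re
from dataclasses import dataclass

# Hypothetical label grammar: label names, '&' (and), '|' (or), parentheses.
_TOKEN = re.compile(r"\s*([A-Za-z0-9_]+|[&|()])")


def _tokenize(expr):
    """Split a visibility expression into label and operator tokens."""
    expr = expr.strip()
    tokens, pos = [], 0
    while pos < len(expr):
        match = _TOKEN.match(expr, pos)
        if match is None:
            raise ValueError(f"bad visibility expression: {expr!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def is_visible(expr, auths):
    """Return True if a user holding `auths` may read data labeled `expr`."""
    tokens = _tokenize(expr)
    if not tokens:
        return True  # assumption: unlabeled data is visible to everyone
    pos = 0

    def parse_or():
        nonlocal pos
        value = parse_and()
        while pos < len(tokens) and tokens[pos] == "|":
            pos += 1
            rhs = parse_and()  # always parse the right side, then combine
            value = value or rhs
        return value

    def parse_and():
        nonlocal pos
        value = parse_atom()
        while pos < len(tokens) and tokens[pos] == "&":
            pos += 1
            rhs = parse_atom()
            value = value and rhs
        return value

    def parse_atom():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError(f"incomplete visibility expression: {expr!r}")
        token = tokens[pos]
        pos += 1
        if token == "(":
            value = parse_or()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError(f"missing ')' in: {expr!r}")
            pos += 1
            return value
        if token in ("&", "|", ")"):
            raise ValueError(f"unexpected {token!r} in: {expr!r}")
        return token in auths  # a bare label: does the user hold it?

    result = parse_or()
    if pos != len(tokens):
        raise ValueError(f"trailing tokens in: {expr!r}")
    return result


@dataclass(frozen=True)
class Cell:
    row: str
    column: str
    value: str
    visibility: str  # the label expression stored with this one cell


def scan(cells, auths):
    """Return only the cells this set of authorizations is allowed to see."""
    return [cell for cell in cells if is_visible(cell.visibility, auths)]


# Made-up sample data, echoing the genomic research example in the interview.
CELLS = [
    Cell("patient-1", "genome:variant", "example-variant", "research"),
    Cell("patient-1", "identity:name", "Jane Doe", "pii&clinician"),
    Cell("patient-1", "billing:payer", "Payer X", "billing|(pii&clinician)"),
]

RESEARCHER = frozenset({"research"})
CLINICIAN = frozenset({"research", "pii", "clinician"})
BILLING_CLERK = frozenset({"billing"})
```

With these definitions, `scan(CELLS, RESEARCHER)` yields only the genomic cell, so the researcher gets the pieces she needs without the personal information that arrived in the same record. Introducing a new label later (the "coded information tomorrow" case) means writing cells with that label and granting the matching authorization to the right users; the read path itself does not change, which is the contrast being drawn with ACLs wired into low-level code.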