Live from New York, it's theCUBE. Covering Big Data New York City, 2016. Brought to you by headline sponsors, Cisco, IBM, NVIDIA, and our ecosystem sponsors. Now, here are your hosts, Dave Vellante and Peter Burris. Welcome back to New York City, everybody. This is theCUBE, the worldwide leader in live tech coverage. This is Big Data NYC, part of Big Data Week, part of Strata + Hadoop World. Tony Fisher is here, the Senior Vice President of Business Development at Zaloni, and he's joined by Kelly Schupp, who's the Vice President of Marketing. Folks, welcome to theCUBE, good to see you. Hi, thanks. Thanks, great to be here. So this is a big week, big week for us, big week for you, the whole community and the ecosystem. How's it going over there? It's going fantastic, yeah. Strata continues, I think, to deliver. So, yeah, it's going really well. Lots of activity. Yeah, good, you guys are happy. Booth traffic is good. Booth traffic is good, lots of new use cases, lots of new potential customers, so yeah, it's all good. Any questions and comments that you're getting? You know, actually, at this particular show, we're seeing some of the most pointed questions, the most data lake-oriented ones: I want to know this, I want to know how to do this, I want to talk about this. More so than we have at other shows, where it seems to have been, in the past, a little bit more: what is Hadoop, what is a data lake? So we're excited to see, I think, that the community is a lot more mature in how they're approaching, right, big data, data platform architectures. Well, typically, in an adoption process, there are three questions that people want to answer. What is it, how does it work, and what are the impacts of using it? And we're now getting into the impacts of using it. Absolutely.
They consider different outcomes, they consider different tools, different training, but now when they start thinking about the impact, they start thinking about some of the derivative questions: governance, how much training is going to be needed to utilize all this stuff. Talk a little bit about that. First of all, are you starting to encounter those questions, and secondly, as you encounter those questions, what kind of answers are you giving to people? Yeah, we absolutely get those questions, and management of big data and governance are fundamental to what we do at Zaloni, so of course we get those types of questions. We look at the management of data as spread out over a bunch of different types of implementation techniques. There's ingestion, first of all: you gotta get the data into the data lake. And then we govern the data. We make sure that as data is ingested, we understand that data, we bring in the metadata associated with that data, and we continue to augment that metadata as the data continues to be refined. It's enriched, it's moved more and more towards the consumption and the end user, and as it is moved and enriched, we manage the metadata, we keep up with the metadata, and we catalog all this stuff. Ultimately that catalog then becomes available to the end user and the consuming applications. So what we try to do at Zaloni is manage the full life cycle of data within a data lake environment. I was saying to Kelly, when I first heard the term data lake, I said, oh, this is gonna be a nightmare. You guys said, let's start a company, although you had some experience working with that. Take us back to the roots and sort of the decision to go attack that problem.
So Zaloni, and this predates Tony and me a little bit, but Zaloni actually started out providing big data professional services, and early on Ben Sharma, our founder, along with Bijoy Bora, were going and helping organizations, not with, let's just go build a data lake, but with, let's figure out how to make big data work in our environment, how can we take advantage of this? Whether it's to do something around cost savings, right, EDW augmentation by leveraging scale-out architecture and cheaper storage with Hadoop, or then the next step might be, let's do a very discrete use case where we know we can see some kind of risk reduction or immediate benefit. And then we're starting to see today, more and more, the concept of the enterprise data lake, where it's really about having that single source of data that everyone can access for their individual use cases. So we're seeing quite a maturity here, and Zaloni has been there from the beginning, has been working along that evolution. And one of the things that's been really interesting is when we first started down this path, data lake, to you, really meant data dump, right? Or it meant data swamp, right? People really took advantage of that concept and did just that. But from the very beginning, we were working with production type, I mean, organizations that needed a production-ready type of data lake. We were working with Verizon Wireless, we were working with a number of telco companies, healthcare, like UnitedHealthcare Group, folks that have personal health data that they had to manage. So there wasn't that opportunity to just dump and run. From the beginning, we had to help them do it right. Yeah, absolutely. You made a joke right at the start about, oh, data lake, here comes just another term for us to get to know, and oh my goodness, what's this gonna do? But we kind of considered the data lake to be that managed, governed environment.
You've got big data, and a lot of that is just, I wanna dump a bunch of different data into this Hadoop infrastructure or this scale-out infrastructure. But we tend to differentiate that from a data lake, which is a more managed environment. Well, you are helping it become more managed, but in many cases, because there's no schema on write, people were using it as a data dump. And in the research that we do with chief data officers, they say there are five things that you have to do to become a data-driven organization. Three of them have to be done in sequence, two sort of in parallel. The two in parallel: you gotta partner with the line of business, and you gotta start reskilling. Okay, those are sort of basic things. But the other three relate to data. First is you gotta figure out a monetization strategy. Not making money with the data, which was a mistake that a lot of people made early on. It's how does your company make money, and how can data support that? Great, and then the second was data sources. Now people just say, okay, great, we got all these data sources, we'll put them into the data lake. But the third was really important, and that's, I need to trust the data. It has to have quality, it has to have provenance, and it's gotta have governance. And that's really where you come in. Yeah, absolutely, and those things that you talked about are things that in traditional architectures, organizations had to become comfortable with. If you think about the data warehouse environment or other environments, they're very rigid, and a lot of those concepts were considered to be just table stakes in that environment. Not so in the big data world and the data lake world. So you're exactly right, that is where we come in. We're taking a lot of those concepts which organizations are comfortable with and we're applying them to scale-out architectures and kind of the next-generation architecture.
What's the difference between managing and governing the data lake and managing and governing the data in the data lake? Okay, so there are a lot of different ways to answer that question. One of them would be, oh, there's no difference, but I think there are some nuances to managing the data lake versus managing the data within the data lake. And we have this concept that we call Data Lake 360. Data Lake 360 is all about managing the data lake. As part of Data Lake 360, every aspect of management within the data lake, whether it happens to be data quality, data lineage, data governance, or data life cycle management, all of these are concepts that are required to manage the data lake. Managing the data is more about saying, okay, I'm going to get this data to more accurately reflect the needs of my business. You need data lake management to do that, but they are maybe not so subtly different. Well, except as you said, especially as we try to, as professionals, we try to get more of our business to be aware of this crucial capability and resource. We try to do a better job of anticipating their needs, delivering the data when they need it, in the form that they need it, et cetera. Then the distinction between managing the resources of the data lake and managing the data in the data lake, which may come in and go somewhere else pretty quickly, all those relationships start to become more important to the organization.
So, you know, similar concept, but maybe a little bit of a tangent, if you don't mind me going there. One of the things that I really enjoyed in the conversation I had recently with James over at Enterprise Management was that he was talking about the fact that if you can let IT realize that they have some control over the data lake and the data that's within it, and get them feeling comfortable with that level of control, they're gonna be much more likely to do what they need to do to enable that data getting to the business community, right, to their users. And so that's really, I think at the end of the day, what we're trying to accomplish here, and that's where the Bedrock portion of our Data Lake 360 solution and the Mica piece, which is a self-service data preparation piece, really fit nicely, right? So we're helping IT get super comfortable with the fact that they can manage, they can govern, they can automate the data in the data lake, and in doing so. And that's Bedrock. That's Bedrock, primarily, yes, absolutely. And then by doing that, they're able to get a lot more comfortable saying to the business user, here's your on-ramp through Mica to do discovery through the catalog and to do the self-service data preparation you might wanna do to transform the data in a way that matters to you. And then the nice piece, which is the feedback loop, is the fact that because these two are so tightly integrated, you can then say, you know, I would like to operationalize this transformation I've just created and kick that back to Bedrock, which will then operationalize that for you. Yeah, I think that's a really important point. So one of the things that we talk about in our research and broadly, we've said it, I don't know, how many times in theCUBE, 350 or 60 times in the past couple of days? 365. Yeah, 365.
That this digital business transformation everybody's talking about is great for consultants and analysts, because nobody ever gets very specific about what it means. We're trying to get specific, and to us, a digital business is a business that uses data to differentially create and sustain customers. It is about data. A digital business uses data as an asset, and we think that's gonna become increasingly important. And so this ability to both bring structure and governance to the data lake so that one group of folks can use it, as well as onboard people more easily, and then take the learnings from that process and turn them into new structures and then go back into the data lake, seems really important. How is that dynamic working? Are you starting to actually see your clients, your customers, say we're managing the data lake, we're managing the data, and then actually bring some almost asset orientation to how they do this from an investment standpoint? Sure. I think that if you kind of take what Kelly was saying and take what you were saying and put them together, there's some actual process about the way we do that, and a lot of that has to do with using the Bedrock platform as the IT control over the data lake environment. But at the end of that, the IT community has said, okay, I've done my job, and here is your set of trusted data, Mr. Business. And then the business can pick that up and further refine it and create continued refinements of the data that are more accurately reflective of the needs of that part of the business. We even have this concept of a shopping cart within Mica, so that the business analysts can pull the data that they need, check it out, and say, okay, as part of my business I need this data, and I'll need to have it directed towards me in the future on an ongoing basis.
But what that allows organizations to do is have the business analyst or data scientist more involved in the business issues and less involved in the data issues. And I think, to kind of put a fine point on what you're saying, data needs to support the needs of the business. The business doesn't need to have overbearing influence on the data; the data needs to be there to support the business, and that really is the goal behind Bedrock and Mica together. We have a really interesting use case that's popped up in a couple of different client implementations, and that is taking advantage of that shopping cart that he's talking about to create an internal data marketplace. So organizations are saying, now we wanna create that easy user experience that'll feel like an Amazon shopping cart and allow the business to get in there and play. And I don't think they would have, this is a next-generation data lake concept, they wouldn't have been doing that unless they felt like they had what they needed to control the data. I love the Data Lake 360, and it reminds me, Rob Hof, our editor-in-chief at SiliconANGLE, just wrote an article on the broken promises of open source software, and what was good about the article was that it was prescriptive; there's hope. But it reminds me of the broken promises of the enterprise data warehouse. The enterprise data warehouse promised us a 360-degree view and it promised us predictive analytics, and it was never able to deliver, and thank God for Sarbanes-Oxley, because it sort of saved that industry for a while. But so now we're here and we're promising this 360-degree view. Will we deliver? I'll take that one for you guys on the spot. Look, I'm in marketing, so I'm gonna give you a very positive spin on this. This is on a guy around here. Just say yes. Yes, yes. The fact of the matter is that a lot of data warehouses are very, very successful.
The problem with data warehouses is they tend to implode under their own weight, the weight of management and the weight of maintenance, and to answer new business problems and questions within the data warehouse environment was very, very difficult. So you may have had a 360 view at a point in time, but then things change. So it's not dynamic enough to support the ongoing needs of most organizations. The data lake environment is different because of what you've already talked about, which is schema on read. There's no reason for this massive rigor before you can actually do anything. You can begin to augment the information within your data lake immediately. And so 360 becomes much more dynamic. As your 360 vista continues to grow, you really do get to keep up with it. So I'm not gonna say it's gonna be just all sunshine and butterflies. There's gonna be some work to do, but just by nature of the dynamic architecture, it's going to be easier and more straightforward. And you're right. I mean, there's obviously a lot of value created with the data warehouse, but I would still contend that it never lived up to the vision and the promises. And the reason I'm sanguine about this space is because the EDW was like a snake swallowing a basketball, and every time a new Intel chip came out they had to go try to keep up. This is different. You're able to, and the other problem with EDW and the whole BI space is it was insights that were owned by just a few, and by the time you got them back it was too late. That is changing, is it not? It is, and a lot of that is, again, fundamentally about some architectural issues. You can support a lot more data and a lot of different types of data, and you can process a lot more data and a lot of different types of data, so you can answer questions you couldn't answer in the past. So yeah. And you can operationalize those insights.
Which is great, because there is a lot more data and a lot more different types of data, and it's only gonna get worse over the next few years. And a lot more value in that data that could be derived. If you do it right. Right, it's derived. When it comes to big data, cloud is gonna be such a game changer as well, from our point of view, right? I mean, we are starting to see more and more hybrid environments. They're helping to answer certain parts of the equation, where organizations might have on-premises for one scenario, cloud for another. We're starting to see real time and batch, in terms of streaming and batch, coming into play in the data lake. So there are so many things that we now kind of get out there and catch. But I think as an industry we're moving closer and closer to being able to do that and to provide that modern data platform that people really need. So we'll be in the center of it. Thank you Kelly, Tony. Appreciate you guys coming on theCUBE. Thank you. Pleasure. All right, keep it right there. We'll be back with our next guest. This is theCUBE, we're live from New York City, and we'll be right back.