San Jose, California. It's theCUBE, covering Big Data Silicon Valley 2017. Hey, welcome back everyone, live in Silicon Valley for Big Data SV. This is theCUBE coverage in conjunction with Strata Hadoop. I'm John Furrier with George Gilbert, analyst at Wikibon, two great guests. We have Stephanie McReynolds, vice president at the startup Alation, and Lee Perris, who's the VP of Think Big Analytics. Thanks for coming back. You've both been on theCUBE; well, you haven't been on theCUBE before, but Think Big has been on many times. Great to see you. What's new, what are you guys up to? Yeah, excited to be here and to be here with Lee. Lee and I have a personal relationship that goes back quite a ways in the industry. And what we're talking about today is the integration between Kylo, which was recently announced as an open source project from Think Big, and Alation's capability to sit on top of Kylo, and together to increase the velocity of data lake initiatives, kind of going from zero to 60 in a pretty short amount of time to get both technical value from Kylo and business value from Alation. So talk about Alation's traction, because you guys have been an interesting startup. Got a lot of great press; George is a big fan. He'll be jumping in with some questions, but lots of good product fit with the market. What's the update? What's the status on traction in terms of the company and customers and whatnot? Yeah, we've been growing pretty rapidly for a startup. We've doubled our production customer count from the last time we talked. Some great brand names, Munich Re. This morning they were talking about their implementation. So they have 600 users of Alation in their organization. We've entered Europe, not only with Munich Re, but Tesco is a large account of ours in Europe now. And here in the States, we've seen broad adoption across a wide range of industries.
Everyone from Pfizer in the healthcare space to eBay, who's been our longest-standing customer; they have about 1,000 weekly users on Alation. So not only a great increase in the number of logos, but also organic growth internally at many of these companies across data scientists, data analysts, business analysts, a wide range of users of the product as well. It's been interesting, and what I like about your approach is, and we've talked with Think Big about it before, every guest that's come on so far in this space has been talking about metadata layers. And so this is interesting. There's a metadata, data addressability if you will, for lack of a better description, but yet it's human-usable and has to be integrated into human processes, whether it's visualization or a real-time app or anything. So you're seeing this convergence between I need to get the data into an app, whether it's IoT data or something else, really, really fast. So really the discovery piece is now the interesting layer. How competitive is it? And what are the different solutions that you guys see in this market? Yeah, I think it's interesting because metadata has kind of had a revival, right? Everyone's talking about the importance of metadata and open integration with metadata. I think really our angle at Alation is that having open transfer of technical metadata is very important for the foundation of analytics, but what really brings that technical metadata to life is also understanding the business context of what's happening technically in the system. What's the business context of data? What's the behavioral context of how that data's been used? That might inform me as an analyst. And what's your unique approach to that? Because that's like the Holy Grail. I mean, it's like translating geek metadata, indexing stuff, into something usable. Yeah. Making it usable has been a cliché for years.
The approach is really based on machine learning and AI technology to make recommendations to business users about what might be interesting to them. So we're at a state in the market where there's so much data that is available, that you can access either in Hadoop as a data lake or in a data warehouse in a database like Teradata, that today what you need as state of the art is for the system to start to recommend to you what might be interesting data for you to use as a data scientist or an analyst, and not just what's the data you could use, but how accurate is that data? How trustworthy is it? I think there's a whole other theme of governance that's rising, that's tied to that metadata discussion, which is that it's not enough to just shove bits and bytes between different systems anymore. You really need to understand how's this data been manipulated and used, and how does that influence my security considerations, my privacy considerations, the value I'm going to be able to get out of that data set. Lee, what's your take on this? Because you guys have a relationship. How's Think Big doing? Can we talk about the partnership you guys have with Alation? Sure. So when you look at what we've done, specifically with an open source project, it's the first one that Teradata has fully sponsored and released, under Apache 2.0, called Kylo. It's really about the enablement of the full data lake platform and the full framework, everywhere from ingest, to securing it, to governing it, and part of that process is collecting the basic technical and business metadata. So later you can hand it over to the user so they can sample, they can profile the data they find, they can search it in a Google-like manner, and then you can enable the organization with that data.
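Kylo's actual metadata model isn't reproduced in this conversation, but the ingest-time collection Lee describes, gathering basic technical metadata and profiling incoming data so users can later sample and search it, can be sketched roughly like this. This is a minimal illustration; the function name, field names, and output structure are assumptions for the example, not Kylo's schema:

```python
import json

def profile_records(records):
    """Collect basic technical metadata for a batch of ingested records:
    column names, inferred value types, and null counts per column.
    A catalog integration could then hand a document like this off
    over an API for enrichment with business context."""
    columns = {}
    for row in records:
        for name, value in row.items():
            col = columns.setdefault(name, {"types": set(), "nulls": 0})
            if value is None:
                col["nulls"] += 1
            else:
                col["types"].add(type(value).__name__)
    # Sets aren't JSON-serializable, so emit sorted lists.
    return {
        name: {"types": sorted(col["types"]), "nulls": col["nulls"]}
        for name, col in columns.items()
    }

batch = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},
]
print(json.dumps(profile_records(batch), indent=2))
```

A real pipeline would run this kind of profiling inside the ingest flow (NiFi or Spark, in Kylo's case) rather than over in-memory dictionaries, but the shape of the collected metadata is the same idea.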
So when you look at it from the standpoint of partnering together, it's really about collecting that data, specifically within Hadoop, to enable it, with the ability then to hand it off to more of an enterprise-wide solution like Alation through API connections, and then for them to enrich it in the way that they go about it, with the social collaboration and the business context, to extend it from there. So that's an accelerant then. So you're accelerating the open source project through this integration with Alation, but you're still going to rock and roll with the open source. Very much going to rock and roll with the open source. So it's really been based on five years of Think Big's work in the marketplace, over about 150 data lakes, the IP we built around that to do things repeatably and consistently, and then releasing that after the last two years of dedicated development based on Apache Spark and NiFi to extend that out. Great work, by the way; open source has continued to be more relevant. But I've got to get your perspective on a meme that's been floating around day one here, and maybe it's because of the election, but someone said we've got to drain the data swamp and make data great again, a play on Trump. But the data lake is going through a transition, saying okay, we've got data lakes, but now this year there's been a focus on making them much more active and cleaner, or making sure they don't become a swamp, if you will. So there's been a focus on taking data lake content and getting it into real time, and IoT has, I think, been a forcing function. But do you guys have a perspective on where data lakes are going? It's certainly been a trending conversation here at the show. Yeah, I mean, I think IoT has been part of drain that data swamp, but I think also now you have a mass of business analysts that are starting to get access to that data in the lake. These Hadoop implementations are maturing to the stage where you have- So value coming out of it.
Yeah, and people are trying to wring value out of that lake, and sometimes finding that it's harder than they expected because the data hasn't been pre-prepared for them. This old world of BI, where the data was pre-prepared and I had a single metric or a couple of metrics to choose from, is now turned on its head. People are taking a more exploratory, discovery-oriented approach to navigating through their data, and finding that the nuances of data really matter when trying to derive an insight. And so the literacy in these organizations, and their awareness of some of the challenges of a lake, are coming to the forefront, and I think that's a healthy conversation for us all to have. If you're going to have a data-driven organization, you have to really understand the nuances of your data to know where to apply it appropriately to decision-making. So Ray Ozzie, actually going back quite a few years, when he started at Microsoft, said internet software changed the paradigm somewhat, in that we had this new sort of set of actions: discover, learn, try, buy, recommend. And it sounds like as a consumer of data in a data lake we've added, or pre-pended, this discovery step, where in a well-curated data warehouse it was learn. You had your X dimensions that were curated and refined, and you don't have that as much with the data lake. And I guess I'm wondering, it's almost like, as we were talking about with the last guest, AtScale, and moving OLAP to be something you'd consume on a data lake the way you consume it on a data warehouse, it's almost like Alation as a smart catalog is as much a requirement on a data lake as a visualization tool is by itself on a data warehouse. Yeah, I think what we're seeing is this notion of data needing to be curated, and including many brains and many different perspectives in that curation process, is something that's defining the future of analytics and how people use technical metadata.
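Alation's actual recommendation engine isn't described in detail in this conversation, but the behavior-driven discovery idea, recommending data based on how other analysts have actually used it, can be illustrated with a simple co-occurrence count over query logs. This is a hedged sketch; the log format and function name are hypothetical, and a production system would use far richer signals:

```python
from collections import Counter

def recommend_tables(query_log, seed_table, top_n=3):
    """Suggest tables that analysts most often use alongside seed_table.

    query_log: a list of queries, each represented as the set of table
    names it touched. Plain co-occurrence counting stands in for the
    kind of behavioral signal a smart catalog's recommender might use.
    """
    co_counts = Counter()
    for tables in query_log:
        if seed_table in tables:
            for other in tables - {seed_table}:
                co_counts[other] += 1
    return [table for table, _ in co_counts.most_common(top_n)]

log = [
    {"orders", "customers"},
    {"orders", "customers", "products"},
    {"orders", "products"},
    {"inventory", "products"},
]
print(recommend_tables(log, "orders"))  # customers and products co-occur most
```

The point of the sketch is the source of the signal: it comes from observed analyst behavior, not from hand-maintained documentation, which is what makes it scale as the lake grows.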
And what does it mean for the DevOps organization to get involved in draining that swamp? That means not only looking at the elements of data that are coming in from a technical perspective, but then collaborating with the business to curate the value on top of that data. So in other words, it's not just to help the business analyst navigate, but it's also to help the operational folks do a better job of curating once they find out who's using the data and how. That's right, they kind of need to know how this data's going to be used in the organization. The volumes are so high that they couldn't possibly curate every bit and byte that is stored in the data lake. And so by looking at how different individuals and different groups in the organization are trying to access that data, that gives an early signal to where should we be spending more time, or less time, in processing this data and helping the organization really get to their end goals of usage. Lee, I want to ask you a question about your blog post that I pointed out earlier. You guys quote a Gartner stat, which is pretty doom and gloom, which said that through 2020, 70% of Hadoop deployments will either fail or not deliver their estimated cost savings or their predicted revenue. And then it says, that's a dim view, but not one shared by the Kylo community. How are you guys going to make the Kylo data lake software work well? What are your thoughts on that? Because I think that's the number one question, which I highlighted earlier: okay, I don't want a swamp. So that's the fear, whether they end up with one or not. So they worry about data cleansing and all these things. So what is Kylo doing that's going to accelerate adoption, or lower that number of failures in the data lake world? Yeah, sure. So I mean, again, a lot of it's through the experience of going out there and seeing what's done.
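Neither company's internals are shown here, but the idea Stephanie describes, using access patterns as an early signal for where to spend limited curation effort, could be sketched as ranking datasets by how many distinct users touch them. An illustrative sketch only; the names and log shape are made up:

```python
from collections import defaultdict

def curation_priority(access_log):
    """Rank datasets by breadth of use (number of distinct users),
    so data stewards curate the most widely used data first instead
    of trying to curate every byte in the lake."""
    users_by_dataset = defaultdict(set)
    for user, dataset in access_log:
        users_by_dataset[dataset].add(user)
    return sorted(
        users_by_dataset,
        key=lambda d: len(users_by_dataset[d]),
        reverse=True,
    )

log = [
    ("ana", "sales"), ("bo", "sales"), ("cy", "sales"),
    ("ana", "clickstream"), ("bo", "clickstream"),
    ("cy", "staging_tmp"),
]
print(curation_priority(log))  # sales first: three distinct users
```

In practice the signal might also weight query frequency, recency, or downstream dependencies, but distinct-user counts alone already separate the heavily shared tables from one-off scratch data.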
A lot of people have been doing a lot of the different things within the data lake, but when you go in there, there are certain things they're not doing, and then when you are doing them, it's about doing them consistently and continually improving upon that. And that's what Kylo is. It's really a framework that we keep adding to, and as the community grows and other projects come in that can enhance it, we bring the value. But a lot of times we go in and users can't get to the data, either because they're not allowed to, because maybe it isn't secured enough to turn it over to them and let them drive with it, or they don't know the data's there, which goes back to collecting the basic metadata, the data about the data, to know it's there to leverage it. So a lot of times it's going back and looking at and leveraging what we have to build that solid foundation, so IT and operations can feel like they can hand that over in a template format so business users can get to the data and start acting off of that. I just lost your mic there. But Stephanie, I've got to ask you a question, just on a point of clarification. So you guys are supporting Kylo? Is that the relationship, or how does that work? So we're integrated with Kylo. So Kylo will ingest data into the lake, manage that data lake from a security perspective, giving folks permissions, and enable some wrangling on that data. And what Alation is receiving from Kylo is the technical metadata that's being created along that entire path. So you're certified with Kylo. So, I mean, how does that all work from a customer standpoint? It's very much an integration partnership that we've been working on together. All right, so from a customer standpoint, it's clean, and you then provide the benefits on the other side. Correct. Yeah, absolutely. And we've been working with data lake implementations for some time, since our founding really.
And I think this is an extension of our philosophy that data lakes are going to play an important role, that they're going to complement databases and analytics tools, business intelligence tools, in the analytics environment. And open source is part of the future of how folks are building these environments. So we're excited to support the Kylo initiative. We've had a long-standing relationship with Teradata as a partner, and so it's a great way to extend that. Thanks for coming on theCUBE. Really appreciate it, and thank you. What do you think of the show so far, you guys? What's the current vibe of the show? No, it's been good so far. I mean, it's only one day in, but a very good vibe so far. Different topics and different things. AI, machine learning; you couldn't be happier with that machine learning trend? Great to see machine learning taking a forefront, people really digging into the details around what it means and how they're going to apply it. Definitely. Thanks for coming on theCUBE, really appreciate it. More CUBE coverage after this short break. Live from Silicon Valley, I'm John Furrier with George Gilbert. We'll be right back after this short break.