Live from Midtown Manhattan, it's theCUBE, covering Big Data New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors.

Okay, welcome back everyone. We're here live in Manhattan for the Big Data NYC event. We've done five years in conjunction with Strata Data, which was formerly Strata + Hadoop World, which was formerly Strata Conference, formerly Hadoop World. We've been covering the big data space for going on 10 years now. This is theCUBE. I'm here with Aaron Kalb, who's the head of product and co-founder of Alation. Welcome to theCUBE.

Thank you so much for having me.

Great to have you on. So, co-founder and head of product. I love these conversations because, as a founder, it's your company and you've got a lot of equity interest in it, but as head of product you also have to have the 20-mile stare on what the future looks like while inventing it today and bringing it to market. You guys have an interesting take on the collaboration side of data. Talk about what that means, the motivation behind that positioning, and the core thesis around Alation.

Totally. So the thing we've observed is that a lot of people working in the data space are concerned with the data itself: how can we make it cheaper to store and faster to process? We're really concerned with the human side of it. Data is only valuable if it's used by people. How do we help people find the data, understand the data, and trust in the data? That involves a mix of algorithmic approaches and human collaboration, both human-to-human and human-to-computer, to get all of that organized.

You have a symbolic systems background from Stanford, you worked at Apple, you were involved in Siri, all this kind of futuristic stuff. You can't go a day without a debate about voice interfaces: Alexa, Siri. AI is obviously taking a really big part in this, even with all the hype right now, but what it means is that software is going to play a key role as an interface. And symbolic systems almost brings on this neural-network kind of vibe, where object data plays a critical role.

Oh, absolutely, yeah. In the early days when we were co-founding the company, we talked about: what is Siri for the enterprise? I was very excited to work on Siri, and it's really kind of a fun gimmick, and it's really useful when you're in the car or your hands are covered in cookie dough. But if you could answer questions like "what was revenue last quarter in the UK," get the right answer fast, and have that dialogue of "oh, do you mean fiscal quarter or calendar quarter? Do you mean the UK including Ireland?" or whatever it is, that would really enable better decisions and a better experience.

And I always worry that Siri might do something here. Hey, Siri... there it is. Okay, I have to be careful or it'll answer and take over my job. Automation will take away the jobs; maybe Siri will be doing the interviews.

Okay, let's take a step back. So you guys are doing well: you started out, you got some great funding, great investors. How are you doing on the product? Give us a quick highlight on where you are. Obviously this is Big Data NYC, and there's a lot going on. It's Manhattan, so you've got financial services, big industry here, and you've got the Strata Data event, the classic Hadoop industry that's morphed into data, which really overlaps with cloud, IoT, and application development, all coming together. How do you guys fit into that world?
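To make that clarifying-dialogue idea concrete, here is a minimal sketch in Python (a hypothetical toy, not Siri's or Alation's actual implementation) of detecting ambiguous terms in a question and generating the follow-up questions Kalb describes:

```python
# Hypothetical ambiguity table: terms mapped to their possible readings.
AMBIGUOUS = {
    "quarter": ["fiscal quarter", "calendar quarter"],
    "uk": ["the UK including Ireland", "the UK excluding Ireland"],
}

def clarify(question: str) -> list[str]:
    """Return one follow-up question per ambiguous term found.

    Naive substring matching; a real system would parse the query.
    """
    q = question.lower()
    return [
        f"Do you mean {' or '.join(options)}?"
        for term, options in AMBIGUOUS.items()
        if term in q
    ]

print(clarify("What was revenue last quarter in the UK?"))
# ['Do you mean fiscal quarter or calendar quarter?',
#  'Do you mean the UK including Ireland or the UK excluding Ireland?']
```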
Yeah, absolutely. So the idea of the data lake is kind of interesting. Psychologically, it's sort of a hoarder mentality, right? It's like, everything I ever had, I want to keep in the attic because I might need it one day. There are great opportunities: you have all these new streams of data with IoT and whatnot. But just because you can get to it physically doesn't mean it's easy to find the thing you want, the needle in all of that big haystack, and to distinguish, among all the different assets that are available, which is the one that's actually trustworthy for your need. So we find that all these trends make the need for a catalog, to organize that information and get what you want, all the more valuable.

This has come up a lot. I want to get into the integration piece and how you're dealing with your partnerships, but the data lake integration has been huge, and the catalog has been the buzz. Foundationally, people are saying catalogs are important. Why is it important to do the catalog work upfront with a lot of these data strategies?

It's a great question. We see data cataloging as sort of step zero. Before you can prep the data in a tool like Trifacta, Paxata, or Kylo, before you can visualize it in a tool like Tableau or MicroStrategy, before you can do some sort of cool prediction of what's going to happen in the future with a data science engine: before any of that, these are all garbage-in, garbage-out processes. So step zero is to find the relevant data, understand it so you can get it in the right format, trust that it's good, and then you can do whatever comes next.

And governance has become a key thing. You've heard of the regulations, GDPR outside the United States, but that's also going to have a long reach into the United States and impact it. There are these little decisions, and there's going to be another Equifax out there someday; another one's probably going to come around the corner. How does policy injection change the catalog equation? Because a lot of people are building machine learning algorithms on top of catalogs, and they're worried that they might have to rewrite everything. How do you balance the trade-off between good catalog design and flexibility on the algorithm side?

Totally, yes. It's a complicated thing with governance and consumption, right? There are people who are concerned with keeping the data safe and people concerned with turning that data into real value, and these can seem to be at odds. What we find is that a catalog is actually a foundation for both, and they're not as opposed as they seem. So what Alation fundamentally does is make a map of where the data is and who's using what data, when, and how. And that can actually be helpful if your goal is to say, let's follow in the footsteps of the best analysts to get more insights generated, or if you want to say, hey, this data is being used a lot, let's make sure it's being used correctly, so you don't get in trouble.

And by the right people.

And by the right people, exactly.

With Equifax, they were phishing in that pond for months, months before it actually happened. With good tools like this, they might have seen it coming, right? Am I getting that right?

That's exactly right. How can you observe what's going on to make sure that it's compliant, that the answers are correct, and that it's happening quickly and driving results?
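As a rough illustration of "step zero," here is a toy catalog sketch (hypothetical names and API, not Alation's product) that indexes asset metadata with a steward-set trust flag, so relevant, trustworthy data can be found before any prep or visualization happens:

```python
from dataclasses import dataclass, field

@dataclass
class CatalogEntry:
    name: str             # e.g. "finance.revenue_q3_uk"
    description: str      # human-written or inferred documentation
    tags: list = field(default_factory=list)
    trusted: bool = False  # set by a data steward, not an algorithm

class DataCatalog:
    """Toy 'step zero' catalog: find and trust data before using it."""

    def __init__(self):
        self.entries = []

    def register(self, entry: CatalogEntry):
        self.entries.append(entry)

    def search(self, term: str, trusted_only: bool = False):
        term = term.lower()
        return [
            e for e in self.entries
            if (term in e.name.lower()
                or term in e.description.lower()
                or any(term in t.lower() for t in e.tags))
            and (e.trusted or not trusted_only)
        ]

catalog = DataCatalog()
catalog.register(CatalogEntry(
    "finance.revenue_q3_uk",
    "Quarterly revenue, UK (fiscal quarters, excludes Ireland)",
    tags=["revenue", "uk"], trusted=True))

# Find trustworthy data before prepping or charting it.
print([e.name for e in catalog.search("revenue", trusted_only=True)])
```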
So in a way, you're taking the collective intelligence of the user behavior and feeding that into understanding what to do with the data modeling?

That's exactly right. We want to make each person in your organization basically as knowledgeable as all of their peers combined.

So the benefit for the customer would be: if you see something developing, you can double down on it, and if the users are using a lot of data, you can provision more technology, more software.

Absolutely, absolutely. You know, it's sort of like when I was going to Stanford, there was a place where the grass was all dead because people were riding their bikes diagonally across it, and then somebody smart said, we're going to put a real gravel path there. The infrastructure should follow the usage, instead of being something that you try to enforce on people.

It's like that classic design meme that goes around: the good design is the paved sidewalk, but the more effective design is the worn path.

Exactly.

So let's get into the integration. One of the hot topics here this year, obviously besides cloud and AI (cloud really being more the driver of the tailwind for growth, AI being more the futuristic kind of headroom), is integration. You guys have announced some integration partnerships. What are some of the key ones, and why are they important?

Absolutely. So, you know, there have been attempts in the past to centralize all the data in one place: have one warehouse or one lake, have one BI tool. Those generally failed, because for different reasons different teams pick different stacks that work for them. What we think is important is to have a single source of reference, one hub with spokes out to all those different points, that knows about everything. It's kind of like Google, right? It's one index of the whole web, even though the web is distributed all over the place. So to make that happen, it's very important that we have partnerships to get data in from various sources. We have partnerships with database vendors, with Cloudera and Hortonworks, and with different BI tools.

What's new that we've announced recently are a few things. One is with Cloudera Navigator. They have great technical metadata around security and lineage over HDFS, and that's a way to bolster our catalog, to go even deeper into what's happening in the files before things get surfaced in Hive, where we have a deeper offering today.

It's almost a connector to them, in a way. You get to share data.

That's exactly right. We have a lot of different connectors, and this is one new one. Another...

Go ahead, continue.

I was going to say, another place that's exciting is data prep tools. Trifacta and Paxata are both places where you can find and understand data in Alation and then begin to manipulate it in those tools. We announced with Paxata yesterday the ability to click to profile: if you want to actually see what's in some raw compressed Avro file, you can see that in one click.

It's interesting. Paxata has really been almost lapping Trifacta. Trifacta was the leader in my mind, but now you've got like a NASCAR race going on between the two firms, because data wrangling is a huge issue. Data prep is where everyone's stuck right now; they just want to do the data science. So it's interesting.

They're both amazing companies and we're happy to partner with both.
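A minimal sketch of that "infrastructure follows usage" idea, assuming a query log is available (the log contents and the table extraction here are illustrative only): count which tables people actually query, and let the traffic show where to pave the path.

```python
import re
from collections import Counter

# Hypothetical query log. In practice this would come from the
# warehouse's audit or query-history facility.
query_log = [
    "SELECT * FROM sales.orders o JOIN sales.customers c ON o.cid = c.id",
    "SELECT region, SUM(amount) FROM sales.orders GROUP BY region",
    "SELECT * FROM hr.reviews",
    "SELECT o.id FROM sales.orders o JOIN sales.customers c ON o.cid = c.id",
]

def tables_used(sql: str) -> list[str]:
    # Naive extraction: identifiers following FROM or JOIN.
    return re.findall(r"(?:FROM|JOIN)\s+([\w.]+)", sql, re.IGNORECASE)

usage = Counter(t for q in query_log for t in tables_used(q))

# The worn paths: tables people actually cross every day, and thus
# candidates for more compute, better SLAs, or steward attention.
for table, count in usage.most_common(3):
    print(f"{table}: {count} queries")
```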
And actually, Trifacta and Alation have a lot of joint customers we're starting to work with as well. I think what's interesting with data prep, and this is beginning to happen with analyst definitions of the field, is that it isn't just preparing the data to be used, getting it cleaned and shaped. It's also preparing the humans to use the data: giving them the confidence, the tools, and the knowledge to know how to manipulate it.

That's great progress. So the question I wanted to ask is this. The other big trend here, kind of a subtext in this show rather than front and center, but something we've been seeing emerge in the cloud world, is on-premise versus cloud. On-premise, a lot of people are bringing the DevOps model in and saying, okay, I may move to the cloud for bursting and some native applications, but at the end of the day there's a lot of work going on on-premise. A lot of companies are cleaning house, retooling, replatforming, resetting, whatever you want to call it. They're getting their house in order to do on-prem cloud ops, meaning a business model of cloud operations on-site. A lot of people are doing that. It'll impact the storage, and it's going to impact some of the server modeling. That's a hot trend. How do you guys deal with the on-premise/cloud dynamic?

Totally. So we want to just do what's right for the customer. We deploy both on-prem and in the cloud, and then from wherever the Alation server is, it will point to usually a mix of sources: some in the cloud, like Redshift or S3, often with Amazon today, and also sources that are on-prem. I do think I'm seeing a trend more and more toward the cloud. Migrating from HDFS to S3 is one thing we hear a lot about at Strata, with its Hadoop-oriented crowd. I think what's happening is that people are realizing, as each Equifax in turn happens, that this old Wild West model, where you surround your bank with people on horseback and it's physically in one place, doesn't work with data, and most people are saying, I'd rather have the A-plus security teams at Salesforce or Amazon or Google be responsible for my security than the people I could hire myself.

And the Paxata guys have to love the term data democracy, because that's really democratization: making the data free, but also having the governance piece.

Absolutely.

So talk about data lake governance, because I've never loved the term data lake; I think it's more of a data ocean, but I see data lake, data lake, data lake everywhere. Are they just silos of data lakes now, and people are trying to connect them? That's been a key trend here. How do you handle governance across multiple data lakes?

That's right. So the key is to have that single source of reference, so that regardless of which lake or warehouse or little siloed SQL Server something lives in, you can search in a single portal and find that thing no matter where it is.

Can you guys do that?

We can do that. I think the metaphor, for folks who haven't seen it, really is Google. If you think about it, you don't even know what physical server a web page is hosted on.

The data lake should just be invisible.

Exactly.

So you're interfacing with the multiple data lakes. That's the value proposition.

That's right. It could be on-prem, in the cloud, multi-cloud.

Can you share an example of a customer that uses that, and how it's laid out?

Oh, absolutely. One great example of an interesting data environment is eBay.
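Here is a rough hub-and-spoke sketch (hypothetical interfaces, not Alation's connector API) showing how one portal can search across heterogeneous sources once each source exposes its metadata behind a common interface:

```python
from abc import ABC, abstractmethod

class MetadataSource(ABC):
    """One spoke in the hub-and-spoke model: any lake, warehouse,
    or siloed SQL Server exposes its assets the same way."""

    @abstractmethod
    def list_assets(self) -> list[str]: ...

class HiveSource(MetadataSource):
    def list_assets(self):
        # A real connector would call the Hive metastore.
        return ["lake.events_raw", "lake.clickstream"]

class WarehouseSource(MetadataSource):
    def list_assets(self):
        # A real connector would query information_schema.
        return ["dw.revenue_by_region", "dw.customers"]

def search_everywhere(sources, term):
    """The single portal: one search across every registered source."""
    return [a for s in sources for a in s.list_assets() if term in a]

sources = [HiveSource(), WarehouseSource()]
print(search_everywhere(sources, "revenue"))  # ['dw.revenue_by_region']
```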
They have the biggest Teradata warehouse in the world. They also have, I believe, two huge data lakes, with Hive on top of them, and Presto is used to virtualize across a mixture of Teradata and Hive, along with direct Presto queries. So it gets very complicated. And they have a very data-driven organization, so they have people who are product owners, people in jobs where data isn't in their job title, who know how to look at Excel and look at numbers and make choices, but who aren't really data people. Alation provides the accessibility so they can understand it.

Yeah, we used to call Hadoop World the car show of the data world, where for a long time it was about the engine, what was doing what, and then it became: what's the car now? How does it drive? So you're seeing that same evolution, where, okay, all that stuff has to get done under the hood.

Exactly.

But there are still people who care about that, right? They're the mechanics, the plumbers, whatever you want to call them. But then the data science folks are really driving things, and now end users potentially, and even applications, bots, or whatnot. It seems to evolve. That's where we're seeing the show change a little bit, and that's where you see some of the AI things. So I want to get your thoughts on how you guys are using AI. How do you see AI, if it's AI at all? There's machine learning as a baby step toward AI; we all know what AI could be, but it's really not AI yet, it's just machine learning. How do you guys use, quote, AI, and how does it evolve?

It's a really insightful question, and a great metaphor that I love. Because if you think about it, it used to be about how you build a car, and now I can drive the car even though I couldn't build it or even fix it. And soon I won't even have to drive the car; the car will just drive me. All I have to know is where I want to go. That's the progression we see as well. There's a lot of talk about deep learning and all these different approaches, and that's super interesting and exciting, but I think more interesting than the algorithms are the applications. So for us, it's: today, how do we give you turn-by-turn directions, where we say turn left at the light if you want to get there, and then eventually, maybe the computer can just do it.

Do it for you, right. The thing that's also interesting is that to make these algorithms work, no matter how good your algorithm is, it's all based on the quality of your training data.

Which is historical data.

Historical data, in essence. The more historical data you have... you need that to train the model.

Exactly right. And we call this behavioral I/O: how do we look at all the prior human behavior to drive better behavior in the future? And I think the key for us is, we don't want to have a bunch of...

You could actually get that URL, behavioral.io.

Oh, we should do it before it's too late. We're going to have to find it right now. Don't register that, Patrick.

Yeah, so the goal is: we don't want to have a bunch of underpaid interns trying to manually tag things. That's error-prone and slow. I look at things like Luis von Ahn's work over at CMU. He did a thing where, as you're typing in a CAPTCHA to get an email account, you're also helping Google recognize a hard-to-read street address or a scanned book.
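A toy sketch of that "behavioral I/O" idea, with made-up data: derive endorsement scores for tables purely from observed usage, weighting the footsteps of expert analysts more heavily, with no manual tagging anywhere.

```python
from collections import defaultdict

# Hypothetical behavior log: (user, table) pairs harvested from query
# history. Nobody had to tag anything by hand.
events = [
    ("ana", "dw.revenue"), ("ana", "dw.revenue"), ("ben", "dw.revenue"),
    ("ana", "tmp.scratch_v2"), ("cal", "dw.customers"),
]

# "Expert" analysts, e.g. inferred from query volume or role.
experts = {"ana", "ben"}

scores = defaultdict(float)
for user, table in events:
    # Behavior as training signal: expert usage counts double.
    scores[table] += 2.0 if user in experts else 1.0

# Tables implicitly endorsed by the footsteps of the best analysts.
print(sorted(scores.items(), key=lambda kv: -kv[1]))
# [('dw.revenue', 6.0), ('tmp.scratch_v2', 2.0), ('dw.customers', 1.0)]
```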
I mean, if you shoot the arrow forward and take this out a few years, you almost think, okay, augmented reality is a pretext to what we might see for what you're talking about. And ultimately VR: you're seeing some of the use cases for virtual reality be very enterprise-oriented, or even consumer. I mean, Tom Brady, best quarterback of all time, uses virtual reality to run through the offense before every game. He's a power user. In pharma, you see people using virtual reality to do data models without being in the lab, saving lab time. So you're seeing augmentation.

Totally. Coming back to the turn-by-turn directions analogy, that's exactly right; I think it's the other half of it. We use AI techniques to get great data from people who don't have to do extra work, watching their behavior to learn what's right and to figure out those recommendations. But then you have to serve those recommendations. If it's Google Glass, it appears right there in your field of view. So we set out to figure out how to make sure that in the moment you're making a dashboard or making a choice, you have that information right at hand.

So, since you're a technical geek, and a lot of folks love to talk about this, I'll ask you a tough question, because it's something everyone's chasing as the holy grail. How do you get the right piece of data to the right place at the right time, given that you have all these legacy silos? Latency is a network issue as well. You've got a data warehouse, you have stuff in cold storage, and I've got an app and I'm doing something. Any point of data in the world could be needed within milliseconds, potentially, on my phone or my device, my internet-of-things wearable. How do you make that happen? Because that's the struggle, while at the same time keeping all the compliance and all the overhead involved. Is it more compute? Is it an architectural challenge? How do you view that? This is the big challenge of our time.

Yeah. Again, I actually think it's a human challenge more than a technology challenge. It is true that there's data all over the place, gathering dust, but think about Google: billions of web pages, and I only care about the one I'm about to use. So for us, it's really about being in that moment of writing a query or building a chart. How do we say, in that moment: hey, you're using an out-of-date definition of profit; or hey, the database you chose to use, the one thing you chose out of the millions, is actually broken and stale? We have interventions for that, through our partners and through our own first-party apps, that can actually change how decisions get made.

So to make that happen, if I'm imagining it right, you'd need access to the data, and then you'd write software that's contextually aware, to run compute in the context of the user interaction.

That's exactly right. Back to the turn-by-turn directions concept: you have to know both where you're trying to go and where you are. For us, that can take the form of: I'm writing a SQL statement, and after a JOIN we can suggest the table most commonly joined with that one, but also overlay onto that the fact that the most commonly joined table was deprecated by a data steward or data curator. And so that's a moment where we can change the behavior from bad to good.
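That join-suggestion moment can be sketched in a few lines (hypothetical data structures; a real system would mine the query log at scale): rank candidate join partners by how often they co-occur, then overlay the steward's deprecation flags.

```python
from collections import Counter

# Hypothetical inputs: join-partner co-occurrence counts mined from
# the query log, plus deprecation flags set by data stewards.
join_partners = {
    "sales.orders": Counter({"sales.customers_old": 40,
                             "sales.customers": 25,
                             "sales.products": 10}),
}
deprecated = {"sales.customers_old"}

def suggest_join(table: str, k: int = 2) -> list[str]:
    """Suggest the tables most commonly joined with `table`,
    skipping anything a steward has deprecated: the moment where
    bad behavior can be steered to good."""
    ranked = join_partners.get(table, Counter()).most_common()
    return [t for t, _ in ranked if t not in deprecated][:k]

# The popular-but-deprecated table is filtered out of the hint.
print(suggest_join("sales.orders"))  # ['sales.customers', 'sales.products']
```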
So, a chief data officer out there, and we've got to wrap up, so I want to ask one final question. A chief data officer might be empowered, or might just be a CFO assistant managing the compliance. Either way, someone in the organization is going to be empowered to drive data science and data value forward, because there's so much proof that data science works. From the military to sports, you're seeing examples where being data-driven has real benefits. So everyone's trying to get there. How do you explain the vision of Alation to that prospect? Because they have so much to select from, and there's so much noise. We call it the tool shed out there: there are a zillion tools, a zillion platforms, and some tools are trying to turn into something else, a hammer trying to be a lawn mower. So they've got to be careful about what they select. What's the vision of Alation for that chief data officer, or that person in charge of analytics, trying to scale operational analytics?

Absolutely. So we say to the CDO: we have a shared vision for this place where your company is making decisions based on data, instead of based on gut, or on expensive consultants months too late. And the way we get there, the reason Alation adds value, is that we're sort of the last tool you have to buy. Because with this lake mentality, it's like you've got your tool shed with all the tools, you've got your library with all the books, but they're just in a pile on the floor. If you had a tool where everything was organized, so you could just say, hey robot, I need a hammer for this size nail and this textbook on this subject, and it would come to you correct and quick, then you could actually get value out of all the expense you've already put into this infrastructure. And that's especially true with the lake.

And tools describe the way the work's done. So in that model, the tools can stay in the tool shed; no one needs to know what's in there. You guys can help scale that. Well, congratulations. Just how far along are you guys in terms of number of employees and customers, if you can share that? I don't know if that's confidential or not.

Absolutely. So we're small but growing very fast, on track to double in the next year. And in terms of customers, we have 85 customers, including some really big names. I mentioned eBay; also Pfizer, Safeway Albertsons, Tesco, Meijer, and so on.

And what are they saying to you guys? Why are they happy?

You know, they share that same vision of a more data-driven enterprise, where humans are empowered to find, understand, and trust data to make more informed choices for the business. And that's why they come, and come back.

And that's the product roadmap ethos for you guys, the guiding principle?

Yeah, yeah. The ultimate goal is to empower humans with information.

All right, Aaron, thanks for coming on. Aaron Kalb, co-founder and head of product at Alation. We're here in New York City for Big Data NYC and Strata Data. I'm John Furrier. Thanks for watching. We'll see you with more after this short break. Thanks.