Live from New York, it's theCUBE. Covering Big Data New York City 2016. Brought to you by headline sponsors, Cisco, IBM, NVIDIA, and our ecosystem sponsors. Now, here are your hosts, Dave Vellante and Peter Burris. Welcome back to Big Data NYC, everybody. This is theCUBE, the worldwide leader in live tech coverage. This is day four, we're winding up four days of wall-to-wall coverage here at Big Data NYC. Stephanie McReynolds is here, she's the vice president of marketing at Alation, and she's joined by Mark Shainman, who runs marketing at Teradata. Good to see you guys. Good to see you. Thanks. So what's going on this week? Lots, right? Lots, lots, busy week, for sure. So give us the high level. What's happened at the show, your partnership? Give us the bumper sticker. Yeah, absolutely. So, you know, we're super excited to be back at Strata + Hadoop World. It's always great to be part of this community year after year. We made a couple of announcements this week. Probably most exciting is some extended support. We announced something called Alation Connect, and that allows us to connect to Presto and Spark SQL, some of these compute engines that analysts are really beginning to use for interactive queries on Hadoop. And Alation connects to those to catalog the raw data that those analysts have available to them, but more interestingly, to really catalog the query logic itself that's running through those engines. So we use that to make recommendations to analysts, to get more analysts onboarded to Hadoop. And so that was our big announcement. We also announced, a couple of weeks ago with Teradata, some additional extended support for their unified data architecture. And we're doing some work with IBM, too. So we announced a relationship by which we integrate with IBM Watson DataWorks. So what's so cool about Presto? Explain why that's so... Well, Presto is a SQL-on-Hadoop engine, and more, actually. It was originally developed by Facebook.
They first developed Hive, and then after a while they realized Hive just didn't have the performance their users needed for interactive query capability against the big data sets they were analyzing. So they started from the ground up, built this engine called Presto, a fully in-memory, parallel SQL-on-Hadoop engine, and then they open sourced it. We at Teradata jumped on board about a year and a half ago, and we announced that we're going to be a major contributor to the open source code base, and we have been contributing to it, and are also offering enterprise support. So it's a great query engine because it's not only distribution agnostic, so it can run on, you know, Cloudera, it can run on Hortonworks, it can run on Amazon EMR, but it also can query not only data that exists in Hadoop, but also data that exists in MySQL, PostgreSQL, Cassandra, Kafka, MongoDB, et cetera. So it's a, you know, great engine for enabling, you know, business users to actually access all these data sources. It's like every week Facebook comes up with some new innovation and open sources it, and you guys have got to hop on it, right? So you have, like, a team of Navy SEALs that sort of spots the next big trend. We have some Navy SEALs. Yeah, we do, we do, we have some good guys. And one thing was that, when we were looking at Presto, it really just fit in well with the whole unified data architecture strategy that Teradata's going after, you know, really being able to create a data fabric that enables users to access data wherever it resides within their infrastructure. And that's where, you know, Presto just becomes a great enabler. So one of the things we're hearing this week, obviously Hadoop is not dead, right? No. But at the same time, this whole concept of the data lake, you know, emerged. When I first heard it, I was like, uh-oh, this is going to be ugly.
And of course, you know, it wasn't necessarily pretty, but it got the data in the place that we wanted it, right? It extended the enterprise data warehouse, and then you guys come in and try to help solve that problem of bringing sort of quality to the data lake. So where are we with this whole, you know, EDW extension, you know, Hadoop as this cheaper storage mechanism? What's happening out there? Is it a data swamp or a data lake? That's another question. Are you churning it? Is it Boston Harbor in the '70s? Yeah. That's right. Are we cleaning it up? That's right, exactly. Yeah, I mean, different organizations are at different phases of their adoption, and figuring out what are their data management, really their governance strategies now for the data lake. And so I think what's exciting is that data lakes have now become a standard investment, right? And it's clear that most organizations are going to have databases, and they're going to have Hadoop, and they're going to have this logical connection between those multiple sources to work with data. But what's challenging, as more people come into the lake and you're not accessing it only through command-line utilities with these expert coders, is that if you don't know what's in that lake and the quality of it, you can make some really poor business decisions. So, you know, from Alation's perspective, there's this notion of having a catalog that automatically interprets some of what's in the lake and gives guidance about what's there. I mean, schema on read is great if there's good data there. Schema on read can be really bad if there's really bad data and people don't realize it. So as we get more people actually running queries, accessing data in the lake, using data viz tools on top of Hadoop stores, we need something like a data catalog to let people find data easily, really understand at a deep level what's in there, and be able to trust that data for the specific use case they're using it for.
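The schema-on-read risk described here can be sketched in a few lines of Python (a toy illustration with invented records, not any vendor's product): the same raw records, read under an assumed schema, can silently produce a plausible-looking answer unless something profiles the data first.

```python
import json

# Raw records in the "lake": no schema is enforced at write time.
raw = [
    '{"user_id": 101, "revenue": 250.0}',
    '{"user_id": 102, "revenue": "n/a"}',  # bad data slipped in
    '{"user_id": 103}',                    # field missing entirely
]

def read_with_schema(lines):
    """Schema-on-read: the expected types are applied only at query time."""
    revenues = []
    for line in lines:
        rec = json.loads(line)
        try:
            revenues.append(float(rec.get("revenue", 0)))
        except (TypeError, ValueError):
            revenues.append(0.0)  # silently coerced -- the analyst never sees it
    return revenues

def profile(lines):
    """A catalog-style profile: what fraction of records actually fit the schema?"""
    good = sum(1 for line in lines
               if isinstance(json.loads(line).get("revenue"), (int, float)))
    return good / len(lines)

print(sum(read_with_schema(raw)))  # 250.0 -- looks fine, yet 2 of 3 records were bad
print(profile(raw))                # ~0.33 -- the profile exposes the quality problem
```

The query succeeds either way; only the profile reveals that two thirds of the records never matched the assumed schema, which is the kind of guidance a catalog can surface before a poor decision gets made.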
So connect this back to Teradata's unified data model. Unified data architecture. Unified data architecture, UDA. Yeah. And connect it back, so that... we have been at the point where we wanted to ensure that the technical assets underlying the data lake were under control, and that we knew how much they were going to cost, so that they didn't go out of control on us. Now we're starting to bring governance to the data in the data lake. How does the relationship that you have start to bring governance across data, whatever source it's in? How is that starting to work itself out? Well, it's interesting, because I think governance is as much about the people as it is about the data, right? I mean, you can evaluate what's the quality of this data. But then governance is also about how do you actually use and apply that data? Well, governance, at least the way I think about it, is if you have an asset and multiple claims to that asset, and those multiple claims could come from people or users, perhaps other systems, then you have to put in place a set of policies and rules for how you arbitrate those multiple claims. So it absolutely always involves people. But there are a lot of data lakes that are popping up, often in response to particular silos of work that's being done. You obviously have Teradata, you have the data warehouse, you still have a lot of operational systems, you have data that you're buying. How does the process of building a catalog... are people using it to then get greater control over more than just the data lake? Yeah, so the catalog becomes a single point of reference across all of these sources of data. One of our customers, eBay, I think has labeled it really well. Deb at eBay says that a data catalog is like an analyst GPS for data. Where do I go?
What's the right source to go to that not only has that data stored in it? Because now we're often storing data in multiple places. Raw data is in the data lake or in an HDFS file, aggregate data is in Teradata, and then I might have an extract that I take out of Teradata and put in Tableau so I can actually visualize the data. And so data is in multiple places, and so you need this global positioning system, this GPS, to tell the analyst: where exactly can you find the most accurate data for your need across all these systems? And make it super simple for that user, so they're not confused, they don't get hung up on where to go for the data. I mean, today, a large majority of analyst time is spent hunting and pecking and navigating these different systems. Can you be a chief data officer and not have a catalog in place? That's an interesting question. It's interesting, because we actually had a panel yesterday where the city of San Diego was presenting, and Maksim Pecherskiy is the chief data officer there. And he had started a data inventorying project, which started very simply. He had been promoted to chief data officer, and he was just trying to figure out, where are all the data sets that we have available to us in the city of San Diego? And he ended up getting so frustrated with trying to inventory all that data, and how long it was taking him, that they ended up selecting Alation to help automate that process. So I think if you look not only at our experience, but at what some of the other third-party independent analysts are reporting out, data catalogs are becoming critical to chief data officers, to be able to address not only concerns about data governance, which often roll into them, but also this idea of governance for insight, or promoting additional access to the entire organization so it can really become data- or insights-driven.
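The "analyst GPS" idea can be sketched as a toy lookup in Python, with the systems and trust scores entirely invented for illustration: the catalog knows every place a logical dataset physically lives and steers the analyst to the most trustworthy copy at the granularity they need.

```python
# A toy "analyst GPS": one logical dataset, several physical homes.
# All systems, granularities, and trust scores are invented for illustration.
catalog = {
    "customer_revenue": [
        {"system": "HDFS raw zone",   "granularity": "raw",       "trust": 0.4},
        {"system": "Teradata EDW",    "granularity": "aggregate", "trust": 0.9},
        {"system": "Tableau extract", "granularity": "aggregate", "trust": 0.6},
    ],
}

def best_source(dataset, granularity):
    """Route the analyst to the most trusted copy at the level of detail they need."""
    candidates = [s for s in catalog[dataset] if s["granularity"] == granularity]
    return max(candidates, key=lambda s: s["trust"])["system"]

print(best_source("customer_revenue", "aggregate"))  # Teradata EDW
print(best_source("customer_revenue", "raw"))        # HDFS raw zone
```

A real catalog derives those trust signals automatically, from query logs, lineage, and profiling, rather than hand-entered scores, but the routing decision it hands the analyst has this shape.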
You can't do that if people don't know where to find the data, and more importantly, if they maybe don't trust the data where it resides. Follow-on question: if I'm a CIO, can I use Alation and avoid a chief data officer? Well, I don't think I would recommend avoiding a chief data officer. I mean, I think chief data officers often represent the business and bring a business perspective to the table, where the CIO has traditionally been more concerned with speeds and feeds and honing the system. Which is wrong, which is not what they should be focusing on. So, you know, look. It's not wrong that that's what they're doing. It's what they are focusing on. It's just that's what they're doing. Yeah. No, no, no. You're right, you're right, you're right. They are doing that, but they should not just narrowly be focused on that. And in the process of any significant turbulence you have specialization, and it's clear that we need to think more about data. But eventually, both of these parties are going to end up focusing on data, at least at some level. And so I'm wondering if the question is not, you know, the CIO versus the CDO. Right. But if I'm a CIO... Sure. No, no, no, no. That could be exciting, though. I mean, you know. It's an interesting topic. If I'm a CIO, and I get access to Alation, a product that uses big data techniques on big data, can I surface information about my data at a rate that lets me deliver to the organization a level of insight regarding business value from data that the organization normally hires a CDO to pull together? Well, I think there is a realization in the market that we're moving towards more of a self-service world. You know, it started with self-service data viz. Then it went to self-service data prep. And now folks are telling us that they see data cataloging as a way to really cement self-service in the organization. You know, I think that's an important movement.
And it's something that we've been trying to do in the industry for a very long time. I started my career at Business Objects. And way back when, we were talking about self-service data access. Well, I'll tell you a story about Business Objects later. You probably know a bunch of stories about Business Objects, too. I'm sure there are a lot of stories out there about all the BI vendors and their genesis. But I think that this transition is a really healthy one. And it's going to take a while for organizations to figure out how to navigate through this self-service world. And I can't project for you what happens, like, 20 years down the road with CDOs and CIOs. But I think that the fact that we're having this conversation, and that we're putting a C-level title on the responsibility of making sure that everyone in the organization has access to data at appropriate levels, is really a great move forward in a lot of organizations. Mark, what's going on with Teradata in big data, right? So when Hadoop came, it was like, OK, great, we're going to suck all the value out of the Enterprise Data Warehouse, and we're going to lower cost. And then when you talk to customers and say, what's the most important part of your big data initiative, they say the Enterprise Data Warehouse. Exactly, exactly. But at the same time, you've made some moves. You've made some acquisitions. You've driven into this space. You've got the hardware, software. Yeah, exactly. So you have the cloud offering now. So give us the update on your strategy and the kind of progress you've made. Big data. I mean, Hadoop and big data are all synergistic with the data warehouse. I mean, in the beginning, when Hadoop came out, everybody said, oh, Hadoop's going to replace the data warehouse. And the reality is, everybody learned that that just didn't happen.
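One concrete form of that synergy is the federated querying Mark described earlier with Presto: one engine joining data that lives in different systems. As a stand-in for a real Presto cluster, here is a minimal pure-Python sketch of the idea, with two in-memory "connectors" playing the part of a Hive table in the lake and a MySQL table of reference data (all names and rows invented):

```python
# Two "connectors": in real Presto these would be the Hive and MySQL
# catalogs; here they are in-memory tables standing in for them.
hive_orders = [  # raw events in the data lake
    {"user_id": 1, "amount": 30.0},
    {"user_id": 2, "amount": 45.0},
    {"user_id": 1, "amount": 25.0},
]
mysql_users = [  # reference data in an operational database
    {"user_id": 1, "name": "Ana"},
    {"user_id": 2, "name": "Bo"},
]

def federated_join(orders, users):
    """Join across 'systems' the way a federated engine would:
    fetch rows from each connector, then combine them centrally."""
    names = {u["user_id"]: u["name"] for u in users}
    totals = {}
    for order in orders:
        name = names[order["user_id"]]
        totals[name] = totals.get(name, 0.0) + order["amount"]
    return totals

print(federated_join(hive_orders, mysql_users))  # {'Ana': 55.0, 'Bo': 45.0}
```

In actual Presto this whole function is a single SQL statement joining, say, a `hive` catalog table to a `mysql` catalog table; the point of the sketch is only the shape of the operation, not the engine's distributed execution.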
And what we found is that there's really a synergistic component between having your Enterprise Data Warehouse, and that high-value data within your Enterprise Data Warehouse, and then having your Hadoop environment and having your data lake. And part of what Teradata is really doing is that we realize that multiple platforms are going to exist within companies' infrastructures. And what we're trying to do is create the components that enable those users to easily access that data and tie them together. And Alation as a partner really comes down to that data cataloging and that governance component that's needed over that whole infrastructure. Because we've moved away from this siloed approach, where I have a siloed data warehouse and I have a siloed, specific data lake or Hadoop environment, to where you have users who are actually in your data warehouse directly querying and accessing data that exists in your Hadoop environment, or vice versa. Or ingestion of that data is all going on in real time. So you need not only that infrastructure that ties all those pieces together, but Alation brings that governance and cataloging component to the table. So that's where Teradata is really going with big data. It is a key component of the overall direction and strategy of the company. OK, and the partnership, how did it start? How is it evolving? Where do you want to see it go? Yeah, I mean, the genesis of the partnership is really interesting, because it started with Oliver Ratzesberger, who had been at eBay for a long time and joined Teradata. And at eBay, he had built something called the Data Hub, which was essentially a data catalog that eBay had built from the ground up with their engineering resources. It's a well-known use case. Yeah. Yeah, and so Oliver, when he saw Alation, he really understood what we were doing. And what he could have avoided. Exactly. By using software. Instead of manual work. And eBay was one of our very first customers.
So by the time that we were talking to Oliver, we had already had many hundreds of analysts using Alation. And so he got it right away: wow, this is a more collaborative approach to cataloging than the Data Hub was, and it really gets at the essence of how do we enable the users of the data to socially curate the data, and get them more engaged with an understanding of what's there, and sharing their knowledge with others. And so Oliver has this vision, at Teradata, of the sentient enterprise. And we mapped right into a stage in that journey to a sentient enterprise, which was all about collaborative ideation. How do you really get value from analytics? It's a very creative process. It takes a lot of hypothesis testing. It's an ideation, design process. And a very collaborative process. And a very collaborative process. It requires a lot of different perspectives. And so in the old days of very traditional data warehouse thinking, where it was not an agile model, it was a waterfall model, you had to conduct interviews with all of the business owners to determine, what is my one model for customer that I'm going to instantiate in this data warehouse? Now we've moved to this world where, hey, let all the definitions of customer flourish in this data lake. That's great. But we need to collaborate around when to use what definition, and how to be more sensitive about how we prepare that data so that it's tested and communicated. Yeah, exactly. All right, we're out of time. Mark, I'll give you the last word on the week. It's just been a great week, meeting with customers, meeting with our partners and interacting with them. And at Teradata, as a company, we've gone through a huge transformation. I just think there are positive things ahead; I think it's going to be great. Well, thanks for coming on theCUBE and sharing your story. Thank you. You're welcome. All right, keep it right there, everybody. We'll be back to wrap up Big Data NYC right after this short break.