Live from the San Jose Convention Center, extracting the signal from the noise, it's theCUBE, covering Hadoop Summit 2015, brought to you by headline sponsor Hortonworks, and by EMC, Pivotal, IBM, Pentaho, Teradata, Syncsort, and Attunity. Now your hosts, John Furrier and George Gilbert.

Okay, welcome back everyone. We are here live at Hadoop Summit 2015. This is theCUBE, SiliconANGLE's flagship program. We go out to the events and extract the signal from the noise. I'm John Furrier, with our new big data analyst, George Gilbert of Wikibon.com. We have two guests here: Justin Borgman, VP and GM at Teradata, formerly of Hadapt, which was acquired by Teradata, a CUBE alum who's been on many times, good to see you; and Jay Tang, lead of interactive analytics infrastructure at Facebook. And we all know Facebook, we all use it, half the world pretty much does at this point. Welcome to theCUBE, guys.

Glad to be here.

So Teradata and Facebook, I mean, I just don't put that together. When I think Facebook, I think DevOps, eating glass, spitting out nails, writing their own code, building their own MySQL clusters. Like, what's going on? I mean, Facebook's growing up. Mark Zuckerberg changed the slogan from "move fast, break stuff" to "move fast and make it reliable." So we know Facebook's evolved toward, you know, reliability. So tell us the story.

Sure. So really this came about as we were thinking, post-acquisition of Hadapt, about how to have the biggest impact on the ecosystem, on the landscape, to really deliver value for customers that are looking to run SQL queries in Hadoop and so forth. And in that sort of survey of the landscape, we took a really close look at Presto and were really impressed. I mean, it has, first of all, a very clean, modern code base.
So, you know, to your point, the culture's probably changed over time, and it's incredibly well written, and that allows us to feel like we can actually have an impact there as we contribute to that project. But also, it's extremely fast, which is really important for people today who want to do interactive SQL analytics. And finally, it allows you to query other data sources as well. So it's actually more than SQL on Hadoop. It allows you to query Cassandra, MySQL, Postgres, and other connectors are being written just about every day.

I mean, it's like smart data. It's data that interacts with other data sources, not siloed. That's what you're getting at, right?

Exactly. So we saw this as a really exciting opportunity for Teradata to really put its weight behind this project and become a stronger member of the open source community. And, you know, we're excited to do this with them.

I mean, open source for the big companies is not just a marketing thing anymore. It's so legitimized. First of all, it has been for many decades, but now more than ever, you see Red Hat as a tier one player. I mean, they were rogue. Those guys were like tier two. They're in the enterprise now. They've got 10-year support. Open source is standard. So we see EMC making some big moves and actually doing open source, not just mailing it in. So congratulations to Teradata on that. So Facebook, you guys have it all open source, and you guys have contributed a hell of a lot to open source.

Absolutely. I mean, Facebook has a very strong engineering culture focused on open source, and we share a lot of stuff. So when we started the Presto project a couple of years ago, we knew from the beginning that we would like to open source it, build a very strong community, and get external contributors to work with the team at Facebook to make Presto the best product.

Talk about the project. What's going on in the project? You said the code's great.
For the folks that might not know the project, explain it quickly and how they get involved. What's it do? What's the value proposition? What's the community like?

Okay. So we started to build a community around Presto when we open sourced it. And today you have companies like Netflix, Airbnb, and Dropbox, these classic Silicon Valley companies, adopting Presto, right? And you also have more traditional enterprises running Presto on top of Amazon Web Services.

What is Presto? For the folks that might not know what it is.

Sure. It is an open source distributed SQL engine that allows you to query data at a massive scale. We have been able to successfully roll out Presto to thousands of users, who use it as a tool to query data that sits in Hadoop today.

Yeah. I mean, you open sourced a big nugget of value. You guys are also involved in Open Compute, which we do here. So Facebook's very much cool on open source. You guys have got a great track record. What does that mean now for the folks out in the industry? Because now, bringing it to the enterprise, that's where the rubber hits the road. What's that all about?

Yeah. Well, that's where we hope we can come in and contribute. A lot of the gaps that remain in the roadmap are areas where we have strength. So for example, expanding SQL coverage or improving performance, adding enterprise features like security and an installer, ODBC drivers so you can connect BI tools. I mean, these are things that Teradata has been doing for decades, and that we also have a lot of experience with from when we were at Hadapt. So we're really trying to fill in those gaps, make it enterprise accessible, so it's not just the Silicon Valley elite taking advantage of this technology, but also...

Like doing YARN integration might be important for somebody, right?

Exactly.

So is that already done, or have you guys got that on the roadmap?

So yeah, we have a three-phase roadmap.
Phase one, which is actually available already, and you can find it at teradata.com/presto, is really about an installer and some basic management and monitoring tools so you can operate this cluster and experience it firsthand, as well as a bunch of documentation, again to make it easier to use. Phase two, which is coming at the end of this year, will include YARN integration, so you can run Presto on a Hadoop cluster that perhaps is being used for multiple activities, as well as integration with existing management and monitoring tools. So things like Ambari, for example, or Cloudera Manager: where you're already managing your cluster, you want to see Presto in there as well. And then phase three, which will be next year, is really about adding ODBC drivers and security, making it more usable via BI tools, which we think is really the holy grail.

Why did you guys announce this? I mean, obviously at Hadapt you've seen SQL on Hadoop, you guys know that. You know, Teradata is a big company. You guys saw this as a great opportunity to donate software and build a community around it. What's the end goal in terms of the use cases? Is it to take the goodness of the engine and retrofit it, or just figure it out with the community? I mean, is there a plan behind it? Was there a purpose behind the announcement?

Yeah, there is. So Teradata has this view of the Unified Data Architecture, which we talk a lot about, which is this idea of Teradata, Aster, Hadoop. Abstracted even further, it's this idea that you're going to have multiple different data platforms within your enterprise. And for us, Hadoop is an increasingly important part of that view. And so what we're doing here is we really want to have the biggest impact we can, and we think that open source is really playing a major role, particularly in the Hadoop part of that UDA. And that's the reasoning behind wanting to get behind it.

Jay, I've got to ask Facebook. You guys obviously do a lot of great stuff.
What are you most excited about with Presto, and what have you guys seen it do from an interactive standpoint? What's the key hot feature or capability?

Sure. We built Presto from the very beginning as an engine that allows you to do interactive analytics against the massive data sets that sit in your Hadoop warehouse, right? Scale to petabytes of data.

But how did you do that beyond what all the other MPP SQL engines were trying to do?

So obviously there's a lot of secret sauce, and we openly share it with the world. There are a number of technologies and techniques implemented.

So that's in the product, the secret sauce is in the product?

The secret sauce is in the product. Sure, you have to go through the code base.

Wow. I don't expect you to do that. Good luck with that. Yeah, but that code is open source.

Yes.

So the secret sauce is in there, it's open. What does it mean now for all the other MPP SQL engines that are running on Hadoop? And there are many, and that's sort of the core value prop for a lot of companies.

Yeah, so I would divide it first into maybe proprietary and open source. And I think on the proprietary side, the more traditional database systems, a lot of those are very mature, and there are certainly workloads that you can't do on Presto, where you really need a database system. We don't view this as Presto replacing Aster or the data warehousing appliance by any stretch of the imagination, if you need that kind of interactivity, that kind of performance or SQL maturity, that coverage.

Are you comparing... did you say Aster?

Right, Aster, Teradata, what have you, or any other database system out there for that matter. I think purely within the context of Hadoop and some of these emerging open source technologies, where the SQL engines as a whole are still less mature, far less mature than these sort of tried-and-true alternatives, that's really where we're trying to make contributions, for kind of a different segment of the market.
People that are thinking about building Hadoop-centric infrastructures, that's where we want to be able to add value.

So let's get back to the Facebook question on Presto. So describe how it evolved in Facebook and what you guys did with it.

So, you know, initially we built Hive. Hive is also an open source project that started at Facebook a couple of years ago. And what we see is that Hive is great for batch work. That means taking your raw data and refining it into something you can analyze. But with that set of data, it takes a very long time for Hive to provide the analytic insight that an analyst would need to know. So we built Presto to solve the interactive analytics problem. Presto gives the answer back much, much quicker. It frees up your time to do something more useful and constructive for the organization.

Did the Tez work, without naming any names, to make Hive more interactive, did it work really for smaller data sets, or did it not really make it fast enough? And is that why you had to rethink Presto from the ground up?

Well, first of all, when we first started the project two years ago, Tez was not quite there yet. And the best product depends on the use case at hand; different organizations evaluate their workloads and their special needs and adopt the appropriate tool set to solve their problem. But beyond just comparing Presto to Tez, right, Presto has a very unique connector-oriented architecture that allows you to query data in place. And I believe this is a huge value proposition for the enterprise. You no longer need to do your classic data warehousing trick of shipping your data from your many different data sources into a central location, normalizing the schema, and then querying the central warehouse. You are able to, one, point Presto at data that sits in the Hive warehouse, in MySQL, Cassandra, or even Kafka.
Two, Presto gives you the query federation capability, which allows you to pull data from different sources. That's a key feature. And join them together in place.

And why was that so difficult in the past, and why can you solve it now?

Okay, first of all, in a typical enterprise you have to spend a huge amount of effort and resources on your classic data integration work. We all know that can take a long time and a tremendous amount of effort. And it also slows your time to insight. Typically you have to wait until midnight for your standard daily batch to start, pull your data from MySQL and different sources, and consolidate it, whether in the Hadoop cluster or a central data warehouse. So by definition, you're already 24 hours late getting your insight.

Didn't others try to do federated query long before Presto, where they would say to all the database engines that had the data, give me just what I need and then I'll make sense out of it? But what you were describing sounded to me like, we're going to pipe all the data in via ETL, which was one way of doing it, but the Presto way was to talk to all the engines that have the data.

Yeah, so there are some unique use cases where you demand real-time insight into what's going on, and Presto at least gives you the option to do such things. You have to really look at your environment and use case to see whether Presto is the right tool for you, and we believe the architecture of Presto gives you the option to do these sorts of things that are not possible with some other tools.

So talk about where you guys see the project going. What are your hopes for this project? What would you share with the folks out there that are watching, or will watch this video? How do they get involved? What's going on? What's the vibe of the community? What are your hopes for the project? What do you see as the outcome?
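
[Editor's illustration, not part of the conversation. The "query in place and join across sources" idea described above can be sketched in a few lines of Python. This is only an analogy and does not use Presto: two separate in-memory SQLite databases stand in for a Hive warehouse and a MySQL database, and one SQL statement joins them without first copying either into a central store. In Presto itself, tables are addressed as catalog.schema.table, so the equivalent query would name something like hive.web.page_views and mysql.crm.users. All database, table, column, and function names here are hypothetical.]

```python
import sqlite3


def build_federated_connection():
    """Open one connection that can see two independent databases.

    Each ATTACH of ':memory:' creates a separate in-memory database, so
    'warehouse' and 'crm' are stand-ins for two different source systems.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS warehouse")  # stand-in for Hive
    conn.execute("ATTACH DATABASE ':memory:' AS crm")        # stand-in for MySQL

    conn.execute("CREATE TABLE warehouse.page_views (user_id INTEGER, url TEXT)")
    conn.execute("CREATE TABLE crm.users (user_id INTEGER PRIMARY KEY, country TEXT)")

    # Hypothetical sample rows, made up for the illustration.
    conn.executemany(
        "INSERT INTO warehouse.page_views VALUES (?, ?)",
        [(1, "/home"), (1, "/photos"), (2, "/home"), (3, "/events")],
    )
    conn.executemany(
        "INSERT INTO crm.users VALUES (?, ?)",
        [(1, "US"), (2, "DE"), (3, "US")],
    )
    return conn


def views_by_country(conn):
    """Join the two sources in a single query, with no ETL step in between."""
    return conn.execute(
        """
        SELECT u.country, COUNT(*) AS views
        FROM warehouse.page_views AS v
        JOIN crm.users AS u ON u.user_id = v.user_id
        GROUP BY u.country
        ORDER BY u.country
        """
    ).fetchall()


if __name__ == "__main__":
    for country, views in views_by_country(build_federated_connection()):
        print(country, views)
```

[The point of the sketch is the shape of the query: the join names both sources directly, rather than reading from a consolidated copy produced by a nightly batch.]
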
I'll let Jay talk about how to get involved, but certainly our hope for the project is to really make it the enterprise-grade SQL solution in the open source community for Hadoop and for these other data platforms as well. And we feel like we can get it there. We feel like we can fill in some of those gaps. Facebook's working very hard on filling in gaps as well, and the project just continues to evolve and mature. And one of the things that I think also struck us was the fact that Presto is truly distribution agnostic. So I think a lot of customers today have some confusion about each distribution platform and which SQL engine to run with each one, and each of the distribution vendors has sort of gone down a particular path. Presto runs on any of these. So if you're an application developer building a SQL-based application, you can now talk to one common interface and your code is portable.

On getting involved with the community: we put everything on GitHub, and we also have a Presto user group where you can ask questions and interact with the Presto team. If you want to contribute to Presto, it's very simple: open a pull request, submit a patch, and somebody from the Presto team will work with you to refine your solution and get it merged into Presto.

You know what I love about this is the fact that Facebook and these innovators in open source at large scale, they're big data companies. They're data-full, as we say. And in this industry of Hortonworks, it's all like, I sell tools and I sell picks and shovels, but the companies aren't living the data problem. They're not full of data. I mean, Facebook, you guys process so much data, whether it's my sponsored post campaign or figuring out how to do a great user experience. So now you see enterprises becoming full of data. We're hearing about the growth of data, so you guys are kind of like the pioneers of being full of data and dealing with it. So now you're open sourcing Presto.
So you guys were pioneering SQL on Hadoop at Hadapt. So you're combining that. Is this project for companies that are full of data? Some companies say, I don't have a data problem. I mean, I have data problems in terms of management, but there's no tsunami of data. So do you see a distinction between those kinds of companies?

I think you just brought up a great point, that we dogfood our own product. At Facebook, we do releases almost weekly, and every time we do a release we push it to our Hive warehouses. So we eat our own dog food; if there's a problem, we'll see it right away. So for a lot of community users of Presto...

You push into production?

Yes.

Yeah, it's not like you're pushing into a sandbox.

No.

You guys are...

We go straight to production. So you can rest assured that the code you are running has already been tested, stressed at Facebook scale. I think that's a great point. Two is that we look at the problems that Facebook users face when they use Presto, and we align the roadmap with those sorts of problems, so you can be sure that we get immediate user feedback. In a typical enterprise, you have product managers talking to customers and a much longer release cycle.

No one's jazzed, but sometimes the apps in the enterprise are quite boring. I mean, they're not like Facebook, where it's like, oh my God, some of my pictures are gone or the image has changed. You're always doing A/B testing, as Facebook does. So that's another data issue, right? So again, this is like the consumerization of IT, now digital transformation. So this is the enterprise future. That's what you're saying, right?

Yeah, absolutely.

Justin, I wanted to ask about the Hortonworks Analyst Day. They had a customer panel, and there's the usual confusion, maybe not confusion, but help us understand where the line blurs between the traditional data warehouse and the MPP SQL engine that's, you know, a good deal cheaper.
One customer said what's in the data warehouse doesn't really change: I know what I want that's in there. And what's in the data lake is always changing. Is that how customers should think about the split between Teradata and Aster and Presto?

Yeah, I mean, I think that's a good way to think about it. The data lake has some flexibility to it. It can be used for discovery and manipulation of data. And for that higher-value data that really has strict SLA requirements, that's where the Teradata data warehousing appliance can really deliver on SLAs that the open source community can't today. So I think that's a good way to think about it.

Guys, we've got to wrap. I want you guys to get the final word in. I have a unique final question. Justin, you're an entrepreneur, but now you're at the big company; and Facebook, you're pioneering DevOps, data-full. Share a personal view of what the enterprises should be thinking about as the mindshare of the culture changes, which is a big part of this new enterprise movement. It's not just the technology, it's the mindset of people. You mentioned, you know, pushing code to production, and it's like, whoa, we don't do that. So what's your advice to the enterprise market and enterprises in general around taking this kind of product to the next level? What Presto's all about, Hadoop and this ecosystem? How should they think? What's your advice? Justin, we'll start with you.

My advice would be, you know, because these technologies are so new and things are changing so often, it's hard to do this, but to the extent that you can, try to take a long-term view on what you think is actually going to stand the test of time. And as it pertains to SQL on Hadoop, for example, that's one of the reasons we're trying to get involved.
Certainly Teradata has spent a lot of time in this general area of building SQL engines, and we really want to invest in Presto to make it the enterprise-grade engine for Hadoop. So I think, as you think about these things, as you think about what tools to pick, what systems to use for different jobs, try to take a long-term view. Is this something that's going to stand the test of time, or is it just popular right now, today?

And I think, you know, we're really living in a very exciting time for open source and big data. If you look at the history of Hadoop, open source is one of the key reasons why Hadoop and its entire ecosystem are so successful and are seeing widespread enterprise adoption. And going forward, as you start to look at things like the consumerization of the enterprise and how they're adopting new technology, I think open source is going to be an indispensable consideration as you march forward to deal with the massive data problems that every enterprise shop faces, right? Being open to adopting the appropriate open source technology is going to be mission critical to your success. I think the Presto project, coming from an internet company with a big enterprise shop like Teradata standing behind it, gives you a unique blend of success factors for enterprises adopting this sort of product. And we're very excited about it.

That's awesome. Jay Tang from Facebook, Justin Borgman from Teradata, formerly Hadapt, on theCUBE. Great advice, you know, exciting time, open source and big data, imagine that. Again, the software is great and it's all open, all done in the open. We're out in the open. This is theCUBE, here live in Silicon Valley. We'll be right back after this short break.