Ladies and gentlemen, please welcome back Hortonworks president, Herb Cunitz.

Morning. Morning. Well, I will say, looking across this room today, it does not take big data to correlate the after-effects of a party. This is a slightly smaller group, and I know we're going to have people trickling in. That's usually what happens on day three: people trickle in as they're sleeping off the party and a good time last night. So, everyone have a good time last night? All right. San Pedro Square is a fun venue, and, if anything, I apologize if there were any lines for people to get through, because we're pretty much at capacity filling that square. But it's still a great opportunity to get everyone together. So thanks for joining us today.

We've had a lot of excitement over the last two days: 4,000 of you talking about Hadoop, walking through a lot of the presentations and the abstracts. We've had a lot of great comments on the different abstracts and things that are happening. And this is day three, which is actually my favorite day from a keynote perspective. We've got some really interesting discussions we're going to go through today. We'll go through the typical keynotes and the breaks and then back to all of the abstracts. But what makes today interesting is we spend a lot of time on the customer side. We've got a customer panel with five different companies who'll be out here walking through their view of Hadoop, where it fits in their enterprise, what they're doing, where they are on their journey, et cetera. It's always really insightful in terms of what's happening. And we will actually close the keynote part of the conference with Geoffrey Moore coming in and lending his wisdom on how he sees the market, how it compares to other markets he's seen, and his vision of how this plays out. So overall, a great discussion today. But first, what I really want to do is get started, so we're going to kick right off.

So today I'm going to, well, do you guys like my slide, first of all? I made that myself, made the icons, and it was pretty hard because I'm an engineer. So I'm going to tell you about Flurry and how we use Hadoop. Really, Flurry is at the crossroads of big data and mobile, and we have a lot of data, so today I'm going to tell you a little bit of a story. Again, mobile is just growing like wildfire, right? If you look on the left, you'll see what's happening: Facebook Messenger and the like are starting to become bigger than the telcos themselves. They sit on top of the telcos in the so-called OTT or over-the-top apps, and they're much, much larger than AT&T on the right. Now, in terms of the amount of data that we handle every day, it's really large. We've seen, over the course of time, 1.8 billion devices. Every day we collect hundreds of billions of events. We process a million events per second. And we see about 8 billion sessions a day, where a session is where you start an app and then you background it; that's one session. Now, we use HBase in a very sophisticated and extreme way, is what I would say. We track hundreds of billions of events per day, as I just mentioned. One of the tables that we have has 1.5 trillion rows. So I think it's fair to say we're abusing HBase.
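To make that HBase usage a little more concrete, here is a rough, hypothetical sketch of what writing one of those per-device events into a wide HBase table might look like. The table name, column family, and row-key layout (a salt bucket plus device id plus reversed timestamp, to spread writes across regions and keep a device's newest events first) are illustrative assumptions, not Flurry's actual schema; it also assumes the happybase client and a reachable HBase Thrift gateway.

```python
import hashlib
import time

import happybase  # assumes an HBase Thrift gateway is running

SALT_BUCKETS = 64  # hypothetical: salt the key prefix to avoid region hotspotting


def event_row_key(device_id: str, ts_millis: int) -> bytes:
    """Hypothetical row key: salt bucket | device id | reversed timestamp."""
    bucket = int(hashlib.md5(device_id.encode()).hexdigest(), 16) % SALT_BUCKETS
    reversed_ts = (1 << 63) - ts_millis  # newest events sort first within a device
    return f"{bucket:02d}|{device_id}|{reversed_ts:020d}".encode()


connection = happybase.Connection("hbase-thrift-host")  # placeholder host
table = connection.table("app_events")                  # hypothetical table name

now_millis = int(time.time() * 1000)
table.put(
    event_row_key("device-1234", now_millis),
    {
        b"e:app_id": b"com.example.somegame",  # "e" is an illustrative column family
        b"e:event": b"session_start",
        b"e:platform": b"ios",
    },
)
```

At hundreds of billions of events per day, the interesting part is less the put itself than the key design: keeping writes spread across regions while keeping a device's events contiguous is what makes trillion-row tables scannable at all.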
We've taken it to the limit, I think, and that's one of the points in my talk. We have about five petabytes in storage across multiple colos where we use HBase, and we have about 4,500 region servers in production. The story of HBase at Flurry goes back to 2008, well before my time at Flurry, where we had a fork in the road and were trying to decide whether we would use MySQL or HBase. Thankfully, we went with HBase; if we had sharded MySQL, that would have been a bit of a nightmare. We started with 0.18.1 on three nodes, as I mentioned on the previous slide. Now we're at 0.98.13 with some custom in-house patches that help us get to the scale I've been describing in this talk. We run HBase in multiple data centers, three clusters of HBase in each of the data centers, with bidirectional replication across both data centers. And most of our storage is actually in HBase, not on HDFS itself. We don't build large warehouses that just sit in HDFS, and we don't necessarily put Hive on it. We're almost purely an HBase shop at this point.

In terms of our physical architecture, this is how it lays out. Again, I talked about the fact that we have three clusters, 1,400 and 860 nodes. These are the specs on the machines we use: 128 gigs of RAM, 24 cores, 4-terabyte spindles, 10 GigE. Lots and lots of data. You can see the numbers here, 1.2 petabytes in the 1,400-node cluster. So lots and lots of scale. But as I've been hinting, we're hitting a bit of a limit with HBase. Flurry's in no danger at all of falling over or anything like that, so don't worry about that. But we feel like we've taken it to the limit.

What Druid provides for us here is a more flexible analytics architecture. Druid lets us ask pretty much any question we want on the fly from the analytics UI. Across the top in this image, what app publishers can do is manage a portfolio of apps. So instead of just getting reporting for one of their apps at a time, you can go across the entire portfolio. For example, with Angry Birds Rio, I might mix that with Angry Birds Star Wars, see those side by side, and then remove the Android portion and just look at the iOS piece. All of that processing is done on the fly from the UI speaking to Druid. The way you do that is you basically build very wide metrics, dump those into Druid, and then aggregate aggregates inside of Druid, all on the fly. We've also added Spark to our portfolio; that's what powers what we call Explorer. I'll show you guys what that looks like in just a second.

So a little bit more about Druid. There was a talk on Tuesday by two people on my team, Eric Tschetter and Lee Rhodes, where they talked about how to use sketches with Druid. But if you guys missed that talk, Druid is a real-time, OLAP-oriented distributed query engine. It provides real-time analytics on really, really large data sets, and it filters and aggregates data on denormalized fact tables. That's how we build our data models in Druid: we just denormalize everything and then dump it into Druid. So no joins at all. It indexes the data and stores it as columnar compressed data. We see sub-second latency on really, really large tables; some of our tables have 5 billion, 10 billion rows, and again, I mentioned that we can do 200 billion records inside of Druid as well. It was originally built by Metamarkets. Yahoo helped push that into the open source community under Apache 2.0 recently, probably within the last two months, I'd say. And again, if you want to see how we use Druid at a lower level, you can go back and see the video from the talk by Eric Tschetter and Lee Rhodes on Tuesday.

So this is the Flurry Explorer UI. This actually runs Spark underneath the hood. Explorer is slightly different from the Flurry UI in the sense that you can ask any question on the fly: you declare your metrics, you declare your dimensions, you declare your filters, and it runs real-time queries against the data, slightly different from Druid. But this is really where people come to do their data workbench work and ad hoc analysis, and kind of probe around and play with the raw fact data.
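To give a flavor of that "denormalize everything, then filter and aggregate on the fly" model, here is a minimal sketch of a native Druid JSON query posted to a broker over HTTP. The broker host and port, the datasource name, and the column names are illustrative assumptions, not Flurry's actual deployment.

```python
import requests

# Hypothetical broker endpoint and datasource; adjust for your own cluster.
BROKER_URL = "http://druid-broker.example.com:8082/druid/v2/"

query = {
    "queryType": "timeseries",
    "dataSource": "app_sessions",            # a denormalized fact table
    "granularity": "day",
    "intervals": ["2015-06-01/2015-06-08"],
    "filter": {                               # e.g. keep only the iOS slice
        "type": "selector",
        "dimension": "platform",
        "value": "iOS",
    },
    "aggregations": [
        {"type": "longSum", "name": "sessions", "fieldName": "session_count"},
        {"type": "longSum", "name": "events", "fieldName": "event_count"},
    ],
}

response = requests.post(BROKER_URL, json=query, timeout=30)
response.raise_for_status()
for bucket in response.json():
    print(bucket["timestamp"], bucket["result"])
```

Because the facts are pre-denormalized and stored columnar, the filter and the sums are computed at query time; nothing about this particular question had to be pre-aggregated ahead of time, which is the flexibility being described above.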
But one thing I do want to point out to you guys is that I still feel like we're missing what I would call the big data holy grail. What I want to be able to do, and I've kind of wanted this since I was a little kid, is take all the data that I have and dump it into some black box. I don't want to have to cook the data or pre-compute the data, and I want to be able to ask any question that I want using SQL. And we're still not quite there. I think Druid comes pretty, pretty close to this, but we're not quite there. There are similar tools such as Impala, but they can't really handle the scale that we deal with at Yahoo. So I think this is an important piece that one of you guys in the crowd should develop, and we will absolutely use it. So the takeaways that I want you guys to understand: I still feel like the Hadoop architecture is evolving, and we've evolved Flurry over time to match that evolution. It's still, to me, a data toolbox. What you should do is take the tools out of the toolbox to assemble the architecture that you want. We've been able to take those tools and reduce the latency and increase the capabilities within Flurry over time by taking this approach. The other thing is I'd really like to thank the Flurry engineering team, especially Dave Leith, Friedman, and Rahul Gudwani, because these are the guys that helped shape the architecture of Flurry. And I'd be remiss not to mention the fact that we're hiring. So please email me, timtiyahoo-inc.com. I hope you guys have an enjoyable rest of the conference, and thank you very much.

Thank you, Tim. Thanks. Thank you very much. So it's interesting to hear how Yahoo thinks of Hadoop, because they're well into their journey with Hadoop, right? Well into that version of how they can leverage it across the enterprise, what the opportunities are, and how it helps them in their business. So now I have the pleasure of introducing Ron Bodkin. Ron is founder and CEO of Think Big, a systems integrator and consulting company, right? It's specialized. Ron has done a lot of work; we've done a lot of projects together. And now it's part of Teradata, right in their division. He's here to share his insights from both the systems integration perspective as well as from Teradata on the journey of Hadoop. Here you go, Ron.

Thank you. Well, it's great to be here. I'm very excited to be in front of the audience here today. I started working with Hadoop over eight years ago at a startup called Quantcast that does both audience measurement and now advertising, building large-scale lookalikes. Out of my experience using Hadoop and leading teams that were doing the engineering and data science with the technology and integrating it into an ecosystem of tools, I saw a huge opportunity to start a pure-play services firm, a consulting company focused exclusively on Hadoop and helping enterprises succeed. Five years later, we're still going strong. We were acquired by Teradata nine months ago and are very excited by the investment and support we're getting from that. And, you know, I want to share with you what we're seeing as enterprises get serious about that journey to put Hadoop in production, to put it at scale, to make it a kind of critical asset.
And more broadly, I want to talk to you about all the exciting things Teradata is doing and the portfolio of products to really help enterprises use Hadoop. On our journey, we'll talk a little bit about the roadmap and how you get started. We'll talk about building great data lakes so you can manage and trust the data that's in Hadoop. We'll talk about the analytics: once you've got your data there, how can you look at it, and what are the right patterns and practices? And we'll talk a little bit about some examples.

So with that, let me talk about an important announcement that Teradata made earlier this week around Presto. For those who don't know Presto, Presto is a 100% open source SQL engine on Hadoop. It was created at Facebook, by the same people who created Hive, because they recognized there was a need for a modern architecture, a Java-based system for querying data in Hadoop that was well architected. You can plug in Java UDFs, and it's running at massive scale already at Facebook, and has been for years. There are thousands of users processing petabytes of data every day at Facebook on a multi-hundred-petabyte cluster. So that's pretty important scalability. I think there's an interesting parallel in the path of Presto as a user-developed open source project that's now getting traction in other at-scale companies: Netflix and Airbnb have been making contributions, and you've got Dropbox and Groupon as important users. So it's working well at massive scale, just like Kafka came out of LinkedIn at massive scale, started to get adoption among web-scale companies, and is now becoming a pervasive technology in the community. So we think it's a foundation for the future of SQL on Hadoop: it's open, it performs well, and it has the ability to work across a variety of distributions of Hadoop. And with Teradata's announcement of enterprise support, it means you can now get a distribution of Presto, and you've got a team of experts, about sixteen people, that Teradata is dedicating to doing open source development on Presto to make it work well in the enterprise. You can see the roadmap here: immediately, with the announcement, a distribution that's now available, with some fit and finish around installation, et cetera. By the end of the year, you'll have Ambari and YARN integration so it fits well in the ecosystem, and we'll start to see meaningful progress on good SQL coverage in Presto. And it is a big deal, because what we've seen at Think Big in working with customers is that when they've tried out first-generation SQL-on-Hadoop engines, they're prototype quality. You can throw some queries at them and they work. You throw some queries at them and they blow up. You throw some queries at them and they don't return. That's not the recipe for a useful analytics capability. So we think Presto is the foundation of what's going to ultimately be successful in the enterprise. And in phase three next year, the team is going to work on robust ODBC and BI integration, security, and connectors. But we think it's a major advancement to have an engine for SQL in Hadoop from a company, Teradata, that has so much experience in building enterprise-grade SQL engines that you can rely on. (A minimal client-side query sketch follows below.)

Now with that, let's turn to the concept of the data lake. What is a data lake? To us, fundamentally, a data lake is a place where you can put raw data, where you can process it, refine it, and provision it for downstream use.
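Before the data lake discussion continues, here is the Presto sketch referenced above: a minimal example of running interactive SQL against data in Hadoop through a Presto coordinator. It assumes the presto-python-client package (prestodb) and a hypothetical coordinator host, Hive catalog, and table name; any DB-API-style Presto client would look much the same.

```python
import prestodb  # presto-python-client; any DB-API Presto client is similar

# Hypothetical coordinator and catalog/schema; adjust for your environment.
conn = prestodb.dbapi.connect(
    host="presto-coordinator.example.com",
    port=8080,
    user="analyst",
    catalog="hive",
    schema="weblogs",
)

cur = conn.cursor()
cur.execute(
    """
    SELECT platform, count(*) AS sessions
    FROM app_sessions            -- illustrative table name
    WHERE session_date >= DATE '2015-06-01'
    GROUP BY platform
    ORDER BY sessions DESC
    """
)
for platform, sessions in cur.fetchall():
    print(platform, sessions)
```

The point being made in the talk is less about the client syntax than about reliability: the same query pattern should keep working as tables grow, rather than blowing up or never returning.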
So, coming back to the data lake: that ability to work with a variety of data is really critical. We see people wanting to do this, fundamentally, so they can manage all their data and take advantage of all the innovation in the Hadoop community and beyond. There are tools like Presto and Spark and Storm, and many more announcements this week, available for your analysts and for you to work with the data. There's the ability to remix your ETL so you can optimize and process data in your analytic database where it's appropriate and in Hadoop where it's appropriate, and have lots of flexibility. So fundamentally, a well-managed, rich set of data that really gives you a foundation for creating value for the business. Often we see the initial driver for adopting Hadoop is big data sets that companies weren't able to put in their traditional databases and can finally get insight into, and we'll get back to that theme in a minute. What we're seeing is that people tend to build their data lake incrementally, as they should. They don't put all the data in, they don't have a mega project. Instead, they start small and they add data sources, applications, business units, and users. And that's a good thing; you want to be incremental. As you do grow it, though, you're going to run into questions: how do you access this? How do you govern it and make sure you have the lineage and understand the data? How do you secure it? How do you integrate it into an ecosystem of assets in the enterprise?

So with that, what we've seen is that the first wave of data lake deployments kind of grew up in a world of having one use case, some very specific things people wanted to do in Hadoop, maybe to prove the concept. You had more limited users, more limited data sources and processes. What's happened, though, is that these systems have grown. We're now in a new wave where this kind of tribal knowledge, this tacit knowledge, isn't scaling. As companies are putting in thousands of tables, dozens of sources, many applications, and many users, there are lots of problems with incrementally evolving your system. We see this all the time when we're consulting with companies that started working with Hadoop even two years ago: they started off with a few Sqoop scripts running in Oozie and all was good, and now they've got a nightmare of 100 jobs that they have to monitor and run and worry about. So they need better scale. They need to know where the data came from. They need to know when things change. They need governance. And we've put a lot of work into how you do that, how you govern and scale your data lakes so that it all works well.

We've had successes at places like high-tech manufacturers. One of our customers has data about their manufacturing process across multiple plants around the globe, and they want to be able to trace the parts and their integration throughout that process to improve the yield, which means faster time to market and reduced waste from scrap that doesn't work. Putting a Hadoop data lake into their manufacturing ecosystem has been a major accomplishment. They're using a number of technologies in the Hadoop ecosystem: Hortonworks, who of course has been a strategic partner of Teradata's for a long time, over three years, and some of the technologies there.
We're working with technologies like Spark and GraphX to do graph analysis, some Think Big technologies, notably a buffer server, which I'll talk about in a little bit, for ingesting larger binary files into the system, as well as integrating into Teradata. So that's one example of a large data lake that's really changing a manufacturer from a reactive approach to understanding problems to proactively understanding root causes and being more effective in getting products to market. Another example, in the telecommunications sector, is a data lake with more of an emphasis on allowing large numbers of analysts and data scientists to access a variety of data. By having tools to govern the data, to be able to track it and trace it and work with data in more of a lab environment, it allows for more productivity and better collaboration. That's using Hortonworks, Hive, Spark, and Kafka as basic engines, and then Loom, a product Teradata also announced a release of this week, to provide the analyst tooling necessary to support that collaboration.

This diagram shows what we think of as a mature data lake in its context. Underlying it, of course, you've got security, the ability to secure data; you've got regulatory compliance; and you've got the ability to archive data, so if you have deep history, you can store data that's not used as often in an efficient but still active way. We'll talk more about that; RainStor is a technology that supports it, and Teradata also released a new version of it this week. Then there's a metadata repository, a place to store all the metadata so you can index and find things, and Loom is the product Teradata announced for that. On top of those foundational elements, you've got the ability to govern ingestion. Now, one of the things I think is important to understand is that while there's a lot of interest around streaming data, and there are great use cases for it, from new APIs like Kafka to the more traditional message queuing we often see in enterprises (and Spark is a great way of processing that), there are also a lot of customers of ours with large amounts of data in relational systems that they want to replicate into Hadoop, so they can work with their big data and have the right context. If you have a clickstream, you also want to have a customer profile so you know something about who's clicking and what they're doing. So we see companies needing to go beyond point-to-point, a-script-at-a-time integration into change data capture, scalable bulk replication of data into Hadoop, as a more efficient way, and we've got a product available there to capture data at scale in Hadoop. And then we also see, a lot of times, that the dirty little secret about integration in the enterprise is that FTP, or some secure version thereof, is dominant, right? There are lots of file transfers that happen, files get cut. So being able to efficiently manage file integration into the cluster is important too, and that's why we have a technology called the buffer server. But whatever the means of ingestion, you need to govern it. You need to be able to trace the data as it's flowing into the lake and know where it came from and what version. We have standards around how you encode metadata inline in files as well as indexing it in the repository so you can work well (a rough sketch of that idea follows below). And that sets you up to then have the other two zones in your data lake. One is your repository, your trusted production data.
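As a rough illustration of that ingestion-governance idea (not a Think Big or Teradata product, just a minimal standard-library sketch under assumed directory layout and field names), the pattern is: land each file in a raw zone and write a sidecar lineage manifest recording source, version, timestamp, and checksum, which can then be indexed in a metadata catalog.

```python
import hashlib
import json
import shutil
import time
from pathlib import Path


def ingest_file(src: str, raw_zone: str, source_system: str, version: str) -> dict:
    """Copy a file into the raw zone and write a sidecar lineage manifest."""
    src_path = Path(src)
    dest_dir = Path(raw_zone) / source_system / time.strftime("%Y-%m-%d")
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src_path.name
    shutil.copy2(src_path, dest)

    manifest = {
        "source_system": source_system,     # e.g. "crm_ftp_drop"
        "source_path": str(src_path),
        "landed_path": str(dest),
        "version": version,
        "ingested_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "sha256": hashlib.sha256(dest.read_bytes()).hexdigest(),
    }
    sidecar = dest.with_suffix(dest.suffix + ".lineage.json")
    sidecar.write_text(json.dumps(manifest, indent=2))
    return manifest
```

In a real lake the manifest would also be registered in a catalog such as Loom so analysts can search lineage, but the sidecar-plus-index pattern is the core of what "govern it, whatever the means of ingestion" looks like in practice.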
In that trusted repository, you want to have raw data, but you also want a variety of processing tools, you know, MapReduce, Hive, Spark, et cetera, that produce derived views, a consumable layer. One of the myths: people always talk about schema on read and assume all you need is raw data in Hadoop. But in fact, what you want to do is agile modeling, so you derive views over time as you see value, doing things like sessionization or denormalizing data, and you end up with a derived view that's better to consume, right? That's the trusted data in the cluster that's governed and that you can use with confidence to feed downstream analytics systems, as in the top left. But then you also want to have a data lab, a place where data scientists and analysts can play with new data sets, can load them in, can join them and combine them with some of the data that's in the repository, and play with new tools. So you need to have both, and with the power of YARN, you can now feasibly run them both in the same cluster, which is a big deal. Presto is sitting on top, because the power of a SQL engine in Hadoop, for us, is being able to access all this data in a raw form for advanced queries and exploration. But you also want to have, as in the top left, a system that gives you scalable access for analysts, so our dashboard engine, which I'm going to talk about in a minute, is a big deal for that. We also see, of course, a need for real-time processing as you want to go from simple analytics to driving interaction with users: personalization, predictive failure, alerts, et cetera, so being able to drive real-time response. And then a discovery zone, the ability to have machine learning tools, statistics, graph analysis, et cetera, so you can do advanced modeling and so forth on your big data. All of this is important, and this is what we see as the context of a successful data lake, which then enables rich analytics.

With that, another example of a customer of ours is an emergency response provider where we're putting together a real-time dashboard. As events are coming in, as crises or issues are occurring for police and fire and so forth, they can direct resources to where the needs are the greatest. This is a real-time need, so using Kafka with Storm and Spark on top of Hortonworks will allow them to have a much more scalable system that lets emergency responders act efficiently. So another example of a customer of ours.

With that, it's worth noting that at Think Big, one of the things we see is that many companies are still trying to get their footing on the data lake. So we offer services around a roadmap, getting it planned out and incrementally building capabilities. We have a starter offering: how do you build it in a scalable way that can be governed, right? There are best practices and patterns, so you don't have to incrementally work your way into a data swamp; you can start off on a clean footing. We have optimization services to take existing deployments and improve the governance and scalability, as well as managed services for execution and operation of the data lake environment. Associated with that, we talked a little bit about this: how do you capture metadata, and how do you let your analysts be productive? Teradata announced Loom as a product this week that gives you visual tools for wrangling data in the lake, looking at the lineage, and capturing metadata, and RainStor as a system of record and archive.
So you can store highly compressed data, you know, 10 to 40 times compression, in a way that can even be regulatory compliant, right? The ability to have those assets in the data lake can be really compelling. Another important thing we see is our dashboard engine. This chart, I think, is something people often think about: hey, if I want to give hundreds of users access to the data in Hadoop, maybe I can use an in-memory technology of some kind, right? But as this chart suggests, when you look at the rate of data increase in big data, it's exceeding the rate at which memory density is increasing. Fundamentally, that chart is showing how much mobile traffic is projected to grow over the next few years versus how much RAM you're going to get for a buck or ten cents. And you can see that even if today you have a small enough amount of data that an in-memory solution works well for your users, that's not a good bet for the future. Never mind the fact that with big data we're instrumenting more, and we're wanting to look deeper and have better analysis, so the demands for analytics keep going up. That's why we think it's so important to have a system that lets the masses do analytics, not just the data scientists, but lets the business have access to well-processed data. So Think Big has a dashboard engine, a product that lets you aggregate data with Spark and then store it inside of HBase, managed through configuration, with APIs to popular BI tools like Tableau and MicroStrategy as well as JavaScript dashboards. So natively in Hadoop, you can scale and analyze data to power the business.

Overall, what I'd say is that at Think Big we've got a range of services to help customers adopt big data, whether it be roadmaps and architecture; implementing, architecting, and engineering the data lake; using analytics and data science to build high-value solutions on top of Hadoop; as well as training and managed services. So we're excited today. A couple of takeaways. One, we're hiring; we're growing quickly and we've expanded internationally. I think that Presto is a big deal, that we've got a very exciting new offering from Teradata around analytics and SQL on Hadoop. And the second-wave data lake is really foundational to unlocking value in Hadoop. Having the right patterns around security, organizing the metadata, governing the data, and processing it, so you can scale your data sources and users, is absolutely necessary to go from early wins to making Hadoop an enterprise asset. And we really believe that between Think Big's services, our five-plus years of large-scale deployments, and Teradata's meaningful investments in the Hadoop space, we've got a unique offering to help you succeed with your big data deployments. With that, thank you very much, and I'll turn it over to Geoffrey Moore. Excuse me.

Thank you, Ron, for sharing Teradata's vision and how they look at it with Think Big. Think Big is someone we've worked with for years, because they've done a lot of great work, and for some of you in this room, they've probably done many of your projects. They've got a great core of expertise around Hadoop and what they're able to do. So now we come to what I think, for many people who come to the show, is one of the favorite parts, which is to hear specifically from some of the customers how they think about Hadoop and what they're doing with it.
So what I thought we'd do is bring up five different companies, and I'll introduce them as they come up. First, Sam from Home Depot. Sam, welcome. Chris from Rogers Communications. Anil from Schlumberger. David from Symantec. And Rob from Verizon. So you've got a good crowd up here; a big hand for all these guys for spending some time up here. Have a seat, everybody. So what we thought we'd do first, as we go around, is please just introduce yourself and your role in the company. We talk a lot here about Hadoop being transformational, what that means, how it's used, how companies create different business models. But in many cases, what we find is that as companies get started with Hadoop, it's really about taking your existing processes and saying, how do I apply more data against them, can I make them better, can I get deeper insights? So maybe we just start there and give some thoughts about how you think about that in each of your businesses. And Rob, why don't we start with you?

Yes. So, Rob Smith, Verizon Wireless Executive Director. I oversee big data across the wireless segment as well as CRM and business intelligence initiatives. So a broad brush, but big data has made a significant improvement in our visibility into our customers. And I'll give you a good example. We had a mature data warehouse, and our analytics practice was very, very mature. Our modelers and statisticians did a great job of modeling things that are important to us, like churn, right? With a wireless company as big as we are, over 100 million customers, any small incremental change in churn percentage means significant differences in revenue, right? So what we had before in a standardized data warehouse was the data that was there, fairly static information. But as we evolved into Hadoop, it allows us to extend the level of data, the level of detail that we have, and allowed us to increase the model scoring. So a couple of examples: clickstream analysis, chats that are occurring, even something as simple as online searches. All of those types of data points have a direct correlation to a customer's intent and behavior. And because of that, we were able to associate that type of information through the big data platform, ingest it into our modeling, and actually increase the scoring of our churn modeling by about a 20% increase in predictability. And it was already high. So a significant business impact just right there.

Thank you, Rob. David, ETL, if we could just talk about that. So Symantec, and there are probably many Symantec customers out here today, as you think about your journey and the type of data you got started with. So my name is David Lin; I go by ETL for disambiguation purposes. And Symantec really, at its heart, as a security company, is about detection, it's about remediation, it's about protection. And when it comes to big data technologies, or even small data technologies, any data technologies, Symantec's been doing this for quite some time. So there were technologies, transactional technologies, that we used; that was what was available circa 2005 to 2010 as we grew. But very recently, we started down a transformation.
And the transformation really is making sure that we can take the data that comes off of the hundreds of millions of endpoints, all the security metadata that's coming off of the backbones and other areas where we have sensors, and crunch that data. And the point of crunching all that data is really about time to protection. For us, time to protection means that from the moment an attacker is trying to do something not good, there are signs that something's about to happen, and so it's our ability to take the information that we have, process it, do accurate decision making, and then run our enforcement engines. That really is the life cycle of Symantec protection. So we started down this journey using Hadoop and friends, working closely with Hortonworks to build our initial security data lake, and there's a talk this afternoon about that. But for us, this technology, this pace, the open mindset that comes with the ecosystem, has been transformative in terms of how we can protect better, detect faster, protect faster.

So it's the ability to capture more data, more signals, to be able to find more patterns, and then just the speed of processing, the raw time to protection you can get there. Great, thank you, David. Okay, now Anil, Schlumberger, right? Oil and gas, as you think about that and how you think of Hadoop.

Hi, good morning. My name is Anil Verma, and I'm the Vice President for Data and Analytics at Schlumberger, which is a $50 billion oil and gas services firm. The oil and gas business, as you can imagine, is very, very global, very technically intensive, and very distributed. So what we've been looking at is, as we operate all around the world with very expensive machinery and very sophisticated operations, there's a lot of data that's created around the company. What we're trying to figure out is how we get to scale as we aspire to be much more operationally efficient than we have been in the past. Some of that is the company's own ambition to scale in terms of efficiency, and the second part is, as you're all aware, the economic environment and the oil price are a forcing function for companies to become more efficient. So our intersection with the data is that we're trying to take all our business data and all our operational data and bring it together so you get a consistent view of how we operate anywhere in the world, in any kind of geography, with any kind of customer, on any kind of operation. That's our long-term vision, and it's still early days. But the point I would make is that what we like about Hadoop is it's an integrated ecosystem. For how we want to function as a business, where a lot of the data belongs together, you get a consistent view no matter who queries it, and then you can make decisions on top of it.

So maybe just one more question as you think about it, because in many ways, companies such as yours in oil and gas have been using the term big data for years; you've been processing large volumes of data for decades. What's different in terms of Hadoop, given that oil and gas was in some sense the original big data? I think the way I think about it is, you talked about looking at processes, and traditionally you look at processes the same way.
So across the top in this image basically what app publishers can do is manage a portfolio of apps. So instead of just getting reporting for one of their apps at a time you can just cross their entire portfolio. So for example, Angry Birds Rio is mixing that with Angry Birds Star Wars. I might want to see those side by side and then remove the Android portion of that and just look at the iOS piece. And all of that processing would be done on the fly from the UI speaking to Druid. And the way you do that is you basically build very wide metrics and then dump those into Druid and then aggregate aggregates inside of Druid all on the fly. That's what sparked our portfolio. That's what powers what we call Explorer. I'll show you guys what that looks like in just a second. So a little bit more about Druid. There was a talk on Tuesday by two people in my team, Eric Cheddar and Lee Rhodes where they talked about how to use sketches with Druid. But if you guys missed that talk, Druid is a real-time, all app oriented distributed query engine. It provides real-time analytics on really, really large data sets, filters and aggregates data from different tables. So that's how we build our data models in Druid. We just denormalize everything and then dump it into Druid. So no joins at all. It indexes the data, stores it as column there, compressed data. We see sub-second latency on really, really large tables. So some of our tables have 5 billion, 10 billion rows. Again, I mentioned the fact that we can do 200 billion records inside of Druid as well. It was originally built by MenderMarkets. Yahoo helped push that out recently, probably within the last two months I'd say. And again, if you want to see how we use Druid at a lower level, you can go back and see the video from a talk done by Eric Cheddar and Lee Rhodes on Tuesday. So this is the Flurry Explorer UI. This actually runs Spark underneath the hood. Explorer is slightly different from the Flurry UI in the sense that you can ask any question on the fly. You declare your metrics, you declare your dimensions, you declare your filters. It runs real-time queries against the data. Slightly different from Druid. But this is really where people come to do their data workbench work and do ad hoc analysis and kind of probe around and play with the raw fact data. But one thing I do want to point out to you guys is I still feel like we're missing what I would call the big data holy grail. And to me, what I want to be able to do and I've kind of jumped to this since I was a little kid is I want to take all the data that I have and dump it into some black box. And I don't want to have to cook the data or pre-compute the data and I want to be able to ask any question that I want using SQL. And we're still not quite there. I think Druid comes pretty, pretty close to this. But we're not quite there. There are similar tools such as Impala but it can't really handle the scale that we deal with at Yahoo. So I think that this is an important piece that one of you guys in their crowd should develop and we'll absolutely use it. So I think that, you know, the takeaways that I want you guys to understand is I still feel like the Hadoop architecture is evolving. We've evolved Flurry Over Time to match that evolution of Flurry. It's still to me a data toolbox. What you should do is be taking the tools out of the toolbox to assemble the architecture that you want. 
We've been able to take those tools and reduce the latency and increase the capabilities within Flurry Over Time by taking this approach. And the other thing is I'd really like to thank the Flurry Engineering team, especially Dave Leith and me and Friedman and Rahul Gudwani because these are the guys that help shape the architecture of Flurry and I'd be remiss to not mention the fact that we're hiring. So please email me timtiyahoodsink.com I hope you guys have an enjoyable rest of the conference and thank you very much. Thank you, Tim. Thanks. Thank you very much. So it's interesting to hear how Yahoo thinks of Hadoop and as they think about it because they're well into their journey with Hadoop, right? Well into that version of how can they leverage it across the enterprise? What are the opportunities of what they can do and how does it help them in their business? So now I have the pleasure of introducing Ron Bodkin. So Ron, new Ron as founder and CEO of Think Big, a systems integrator and consulting company, right? It's specialized. Ron had done a lot of work. We've done a lot of projects together and now it's part of Teradata, right? Right in their division. He's here to share his insights from both systems integration perspective as well as from Teradata on the journey of Hadoop. Here you go, Ron. Thank you. Thanks, Herb. Well, it's great to be here. I'm very excited to be in front of the audience here today. I started working with Hadoop over eight years ago at a startup called Quantcast that does both audience measurement and now advertising building large scale lookalikes. So out of my experience using Hadoop and leading teams that were doing the engineering and data science with the technology and integrating it into an ecosystem of tools, I saw a huge opportunity to start a pure play services firm consulting company focused exclusively on Hadoop and helping enterprises succeed. Five years later we're still going strong. We were acquired by Teradata nine months ago and very excited by the investment support we're getting from that. And I want to share with you what we're seeing as we see enterprises getting serious about that journey to put Hadoop in production, to put it at scale, to make it a kind of a critical asset and more broadly I want to talk to you about all the exciting things Teradata is doing and the portfolio of products to really help enterprises use Hadoop. On our journey we'll talk a little bit about the roadmap, how do you get started we'll talk about building great data lakes so you can manage and trust the data that's in Hadoop. We'll talk about the analytics once you've got your data there, how can you look at the data, what are the right patterns and practices and we'll talk a little bit about some examples. So with that let me talk about an important announcement that Teradata made earlier this week around Presto. For those who don't know Presto, Presto is a 100% open source SQL engine on Hadoop. It was created at Facebook, the same people who created Hive because they recognized there was a need for a modern architecture a Java based system for querying data in Hadoop that was well architected, you could plug in Java UDFs and it's running at massive scale already at Facebook and has been for years. There are thousands of users processing petabytes of data every day in Facebook on a multi-hundred cluster. So that's pretty important scalability. 
I think there's an interesting parallel in the path of Presto as a user developed open source project that's now getting traction in other scale companies Netflix, Airbnb have been making contributions, they've got Dropbox and Groupon as important users. So it's working well in these massive scale just like Kafka came out of LinkedIn at massive scale and started to get adoption on web scale and is now becoming a pervasive technology in the community. So we think it's a foundation for the future of SQL in Hadoop that it's open it performs well and that it has the ability to work across a variety of distributions of Hadoop. So we think that it's a fantastic foundation and with Territid's announcement of enterprise support it means you can now get a distribution of Presto and you've got a team of experts about 16 people that Territid is dedicating to doing open source development on Presto to make it work well in the enterprise. You can see the roadmap here immediately with the announcement of a distribution that's now available. You've got some fit and finish installation etc. By the end of the year you'll have a Bari and yarn integration so it fits well in the ecosystem and we'll start to see meaningful progress on good SQL coverage in Presto and it is a big deal because what we've seen it think big and working with customers is when they've tried out first generation SQL in Hadoop their prototype quality you can throw some queries at them and they work. You so throw some queries at them and they blow up you so throw some queries at them and they don't return. That's not the recipe for a useful analytics capability. So we think Presto is the foundation of what's going to ultimately be successful in the enterprise and into phase three next year the team is going to work on robust ODBC and BI integration security and connectors but we think it's a major advancement to have a engine for SQL in Hadoop from a company Teradata that has so much experience in building enterprise-grade SQL engines that you can rely on. Now with that let's turn to the concept of the data lake. What is a data lake? To us fundamentally a data lake is a place where you can put raw data where you can process it, refine it and provision it for downstream use. So that ability to work with a variety of data is really critical. We see people wanting to do this fundamentally so they can manage all their data so that they can take advantage of all the innovation in the Hadoop community and beyond so there's tools like Presto and Spark and Storm and how many more announcements this week are available for your analysts and for you to work with data. The ability to remix your ETL so you can optimize and process data in your analytic database where it's appropriate and in Hadoop where it's appropriate and have lots of flexibility. So fundamentally a well-managed rich set of data that really lets you have a foundation for creating value for the business. So often we see the initial driver for adopting Hadoop is big data data sets that companies weren't able to put in their traditional databases and they can finally have insight and we'll get back to that theme in a minute. What we're seeing is people tend to build their data lake incrementally as they should right they don't put all the data in they don't have a mega project instead they start small and they add data sources, applications, business units and users and that's a good thing you want to be incremental. 
As you do grow it though you're going to run into how do you access this how do you govern it and make sure you have the lineage and understand the data how do you secure it and how do you integrate it into an ecosystem of assets in the enterprise. So with that what we've seen is the first wave of data lake deployments kind of grew up in a world of more having one use case some very specific things people wanted to do in Hadoop maybe prove the concept then you had more limited users you had more limited data sources and processes and what's happened though is that these systems have grown right we're now in a new wave where having this kind of tribal knowledge is tacit knowledge isn't scaling and as companies are putting in thousands of tables you know dozens of sources many applications many users there's lots of problems with incrementally evolving your system we see this all the time when we're consulting with companies that started working with Hadoop even two years ago that they they started off with a few scoop scripts running in Uzi and all was good and now they've got a nightmare of a hundred jobs that they have to monitor and run and worry about right so they need to have better scale they need to know where the data came from they need to know when things change right they need governance and you know we've put a lot of work into how do you do that how do you govern and scale your data lakes so that it works well you know we've had successes at places like high-tech manufacturers one of our customers has data about their manufacturing process of multiple plants around the globe and they want to be able to trace the parts and their integration throughout that process to improve the yield which means faster time to market and reduced waste of scrap from things that don't work putting a Hadoop data lake into their manufacturing ecosystem has been a major accomplishment they're using a number of technologies in the Hadoop ecosystem Hortonworks who of course is a strategic partner of Teradata's for a long time over three years and some of the technologies there you know we're working with technologies like Spark and GraphX to do graph analysis some think big technologies notably a buffer server which I'll talk about in a little bit for ingesting larger binary files into the system as well as integrating into Teradata so that's one example of a large data lake that's really changing from a reactive approach to understanding problems in manufacturing to proactive understanding root causes and being more effective in getting products to market another example in the telecommunication sector is a data lake with more of an emphasis on allowing large numbers of analysts and data scientists to access a variety of data and by having tools to govern the data to be able to track it and trace it and work with data in more of a lab environment it allows for more productivity and better collaboration so that's using Hortonworks Hive, Spark and Kafka as basic engines and then Lume which is a product Teradata also announced a release of this week to really provide the analyst tooling necessary to support that collaboration this diagram shows what we think of as a mature data lake in its context so you've got underlying it of course the ability to have security to be able to secure data you've got regulatory compliance the ability to archive data so if you have deep history the ability to store data in an efficient way that's not used as often but in an active way we'll talk more about that rain 
stores the technology supports that Teradata also released a new version of this week metadata repository a place to store all the metadata to be able to index and find and so Lume is a product that Teradata announced and then on top of those foundational elements you've got the ability to govern ingestion now one of the things that I think is important understand is while there's great approaches there's a lot of interest around streaming data and there's great use cases for it you know from new APIs like Kafka or often we see enterprises with more traditional message queuing and Spark is a great way of doing that there's also a lot of customers of ours that have large amounts of data and relational systems that they want to replicate into Hadoop to be able to work with their big data and have the right context right so if you have a click stream you also want to have a customer profile just clicking and what they're doing right so we see companies needing to go beyond you know point to point script at a time integration into a change data capture you know scalable bulk replication of data into Hadoop as a more efficient way so we've got a product available there to be able to capture data at scale in Hadoop and then we also see a lot of times that the dirty little secret about integration the enterprise is FTP is dominant or some secure version thereof right that there's lots of file transfers that happen files get cut and so being able to efficiently manage file integration into the cluster is important too and that's why we have a technology called the buffer server but whatever the means of ingestion you need to govern it right you need to be able to trace as the data is flowing into the lake and know where it came from and what version you know we have standards around how do you encode metadata inline and files as well as indexing it in the repo so you can work well and that sets you up to then have the other two zones in your data lake one is your repository your trusted production data you want to have raw data but you also want to have a variety of processing tools you know MapReduce, Hive, Spark, etc. 
that produce derived views a consumable layer you know one of the myths people always talk about schema on read and they assume all you need is raw data in Hadoop but in fact what you want to do is agile modeling so you derive you as you over time as you see value and doing things like producing sessionization or denormalizing data you end up with a derived view that's better to assume right so having that that's the trusted data in the cluster that's governed and you can use with confidence to feed downstream analytic systems as in the top left but then you also want to have a data lab a place where data scientists and analysts can play with new data sets can load them in can join them and combine them with some of the data that's in the repository and play with new tools so you need to have both and with the power of yarn you can now feasibly run them both in the same cluster which is a big deal Presto sitting on top because the power of a SQL engine in Hadoop for us is being able to access all this data in a raw form for advanced queries and exploration but you also want to have as in the top left a system to let you get scalable access for analysts so our dashboard engine I'm going to talk about in a minute is a big deal for that we also see of course the need for real-time processing as you see as you want to go from simple analytics to starting to drive into interaction with users personalization predictive failure alerts etc to be able to drive real-time response and then a discovery zone the ability to have machine learning tools statistics graph analysis etc right so the ability to do advanced modeling and so forth on your big data so all of this is important and this is what we see as the context of a successful data lake which then enables rich analytics so with that another example of a customer of ours is an emergency response provider where we're putting together a real-time dashboard so as events are coming in as crises or issues are occurring for police and fire and so forth they can direct resources to where the needs are the most and so for this this is a real-time need so using Kafka with storm and spark on top of Hortonworks is really allowing them will allow them to have a much more scalable system to and now emergency responders to act efficiently so another example of a customer of ours so with that it's worth noting at Think Big one of the things we see as many companies are still trying to get their footing on the data lake so we offer services around a roadmap getting planned out and incrementally building capabilities we have a starter how do you build it in a scalable way that can be governed right there are best practices and patterns so you don't have to to incrementally work your way into a data swamp you can start off on a clean footing we have optimization services to take existing deployments and improve the governance and scalability as well as manage services for execution and operation of the data lake environment you know associated with that we talked a little bit about this how do you capture metadata how do you let your analysts be productive tear data announced loom as a product this week that really gives you visual tools for wrangling data in the lake looking at the lineage capturing metadata and rain store as a system of record and archive so you can store highly compressed you know 10 to 40 times compression data in a way that can even be regulatory compliant right so the ability to have those assets in the data lake can be really compelling we 
also see another important thing is our dashboard engine so this chart I think it's something that people often think about hey if I want to give hundreds of users access to the data and Hadoop maybe I can use an in-memory technology of some kind right but as this chart suggests when you look at the the rate of data increase in big data it's exceeding the rate at which memory density is increasing fundamentally that chart is showing how much mobile traffic there is projected to go over the next few years versus how much the cost of Ram you know how much Ram you're going to get for a buck or 10 cents is and you can see that you know even if today you have a small enough amount of data that an in-memory solution is going to work well for your users it's that's not a good bet for the future never mind the fact that with big data we're instrumenting more and we're wanting to look deeper and have better analysis so the demands for analytics keep going up that's why we think it's so important to have a system that lets you do analytics that let the masses have access not just the data scientists but let the business have access to well processed data so ThinkBake has a dashboard engine that's a product that lets you aggregate data and store it inside of HBase managed through a configuration with APIs to popular BI tools like Tableau and MicroStrategy as well as JavaScript dashboards so natively in Hadoop you can scale and analyze data to power the business you know so overall what I'd say is at ThinkBake we've got a range of services to help customers adopt big data whether it be road maps and architecture implementing architecting and engineering the data lake using analytics and data science to build high value solutions on top of Hadoop as well as training and manage services so you know we're excited today we think that a couple of takeaways one we're hiring we're growing quickly we've expanded internationally I think that Cepresto is a big deal that we've got a very exciting new offering from Teradata around analytics SQL in Hadoop and that the second wave data lake is really foundational in Hadoop to have the right patterns around security and organizing the metadata governing the data processing it scale it to let you scale your data sources and users is absolutely necessary to go from early wins to making the Hadoop an enterprise asset and we really believe that between ThinkBake services are five plus years of large scale deployments and Teradata's meaningful investments in the Hadoop space that we've got a unique offering to help you succeed with your big data deployments with that thank you very much and I'll turn it over to Jeffrey Moore excuse me thank you Ron for sharing his vision of Teradata and as they look at it ThinkBake ThinkBake is something we've worked with for years because they've done a lot of great work and probably for some of you in this room with many of the projects and some of the work that they've done they've got a great core of expertise around Hadoop and what they're able to do so that comes for I think for many people who come to this show one of the favorite parts which is to hear specifically from some of the customers how they think about Hadoop and what they're doing with it so I thought we'd do is we've got five different companies so introduce them as they come up first Sam from Home Depot, Sam welcome, Chris from Rogers Communications, Chris Anil from Schlumberger David from Symantec and Rob from Verizon so you got a good crowd here, first hand for all 
these guys for spending some time up here. Have a seat, everybody, come on. So what we thought we'd do is, first, as we go around, please introduce yourself and your role in the company. We talk a lot here about Hadoop being transformational — what that means, how it's used, how companies create different business models — but in many cases what we find is that as companies get started with Hadoop, it's really about taking your existing processes and saying, can I apply more data against them, can I make them better, can I get deeper insights? So maybe we just start there: give some thoughts about how you think about that in each of your businesses. Rob, why don't we start with you? Yes, so Rob Smith, Verizon Wireless, Executive Director. I oversee big data across the wireless segment as well as CRM and business intelligence initiatives, so a broad brush. Big data has made a significant improvement in our visibility to our customers. In our data warehouse, our analytics practice is very, very mature — our modelers and statisticians did a great job of modeling the things that are important to us, like churn. With a wireless company as big as we are, over 100 million customers, any small incremental change in churn percentage means significant differences in revenue. What we had before in a standardized data warehouse was the presence of data that was fairly static, static-type information, but as we evolved into Hadoop, it allowed us to extend the level of data, the level of detail that we have, and allowed us to increase the model scoring. A couple of examples: clickstream analysis, chats that are occurring, even something as simple as doing searches online — all of those types of data points have a direct correlation to a customer's intent and behavior. Because of that, we were able to bring that type of information into the big data platform, feed it into our modeling, and actually increase the scoring of our churn modeling by about a 20% increase in predictability — and it was already high — so a significant business impact right there.
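To make that concrete, here is a minimal sketch of the kind of enrichment Rob is describing: rolling raw behavioral events (clickstream, chat, search) up into per-customer signals and feeding them into a churn model alongside the static warehouse attributes. The table names, event types, columns, and label below are assumptions for illustration only, not Verizon's actual pipeline.

```python
# Hypothetical PySpark sketch: enrich a churn model with behavioral signals.
# All table/column names ("crm.customer_profile", "lake.digital_events",
# "churned", etc.) are made up for illustration.
from pyspark.sql import SparkSession, functions as F
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("churn-enrichment-sketch").getOrCreate()

# Static, warehouse-style attributes per customer (tenure, spend, 0/1 label).
base = spark.table("crm.customer_profile")

# Raw behavioral events landed in the data lake: one row per event.
events = spark.table("lake.digital_events")  # customer_id, event_type, event_ts

# Roll the raw events up into per-customer intent signals.
signals = (events.groupBy("customer_id").agg(
    F.sum(F.when(F.col("event_type") == "support_search", 1).otherwise(0)).alias("support_searches"),
    F.sum(F.when(F.col("event_type") == "chat_session", 1).otherwise(0)).alias("chat_sessions"),
    F.sum(F.when(F.col("event_type") == "cancel_page_view", 1).otherwise(0)).alias("cancel_page_views")))

# Join the behavioral signals onto the static profile; customers with no
# events get zeros instead of nulls.
training = base.join(signals, "customer_id", "left").na.fill(0)

features = ["tenure_months", "monthly_spend",
            "support_searches", "chat_sessions", "cancel_page_views"]
assembled = VectorAssembler(inputCols=features, outputCol="features").transform(training)

# "churned" is a historical 0/1 label; comparing AUC with and without the
# behavioral columns shows whether the extra signals lift the model scoring.
model = LogisticRegression(featuresCol="features", labelCol="churned").fit(assembled)
print("AUC with behavioral signals:", model.summary.areaUnderROC)
```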
Thank you, Rob. David — ETL — if we could just talk about that. Similar question: Symantec — and many of you are probably Symantec customers today — as you think about your journey and the type of data you got started with? So my name is David Lin; I go by ETL for disambiguation purposes. Symantec, really, at its heart, is a security company: it's about detection, it's about remediation, it's about protection. And when it comes to big data technologies — or even small data technologies, any data technologies — Symantec's been doing this for quite some time. There were transactional technologies that we used, which was what was available circa 2005, 2010, as we grew. But very recently we started down a transformation, and the transformation really is making sure that we can take the data that comes off of the hundreds of millions of endpoints, all the security metadata that's coming off of the backbones and other areas where we have sensors, and crunch that data. The point of crunching all that data is really about time to protection. For us, time to protection means that from the moment an attacker is trying to do something not good, there are signs that something's about to happen, and so it's about our ability to take the information that we have, process it into accurate decision making, and then run our enforcement engines — that really is the life cycle of Symantec protection. So for us, we started down this journey using Hadoop and the Hadoop-and-friends ecosystem, working closely with Hortonworks to get our initial security data lake up — and there's a talk this afternoon about that — but for us this technology, this pace, the open mindset that comes with the ecosystem, has been transformative in terms of how we can protect better, detect faster, protect faster. So it sounds like it's the ability to capture more data, more signals, to be able to find more patterns, and then just the speed of processing — the raw time to protection you can get there. Great, thank you, David. Okay, Anil, at Schlumberger — oil and gas — as you think about that, how do you think of Hadoop? Sure. Hi, good morning, my name is Anil Verma, and I'm the vice president for data analytics at Schlumberger, which is a $50 billion oil and gas services firm. The oil and gas business, as you can imagine, is very, very global, very technically intensive, and very distributed. So what we've been looking at is: as we operate all around the world with very expensive machinery and very sophisticated operations, there's a lot of data that's created around the company, and what we are trying to figure out is how we get to scale as we aspire to be much more operationally efficient than we have been in the past. Some of that is the company's own ambition to scale in terms of efficiency, and the second part, as you are all aware, is that the economic environment and the oil price are a forcing function for companies to become more efficient. So I think our intersection with the data and the Hadoop side is that we're trying to take all our business data, all our operational data, and bring it together so you get a consistent view of how we operate anywhere in the world, in any kind of geography, with any kind of customer, on any kind of operation. That's a long-term vision — still early days on that — but the point I would make is that what we like about Hadoop is that it's an integrated ecosystem, so it's a good match for how we want to function as a business, where a lot of the data belongs together, you get a consistent view no matter who queries it, and then you can make decisions on top of it. So maybe just one more question as you think about it, because in many ways, for companies such as yours in oil and gas, big data is a term you've been using for years — seismic data; you've been processing large volumes of data for decades. What's different in terms of Hadoop, in terms of what else you can do in addition to the types of things you've done with your original big data? Think about it this way: you talked about looking at processes, and traditionally you look at processes the same way that you're organized, by business lines. If you're drilling you care about drilling processes; if you're doing exploration you care about that. Once you put a data platform in place, you can start to connect the activities of the company horizontally: you start the exploration process and get data on that, you do some drilling and look for oil, and when you find it you extract it. As you get that, your notion of a process expands beyond business lines, and that's where the real value is generated, because there are so many optimizations you can do as long as you have that insight — and traditionally the industry is still early on that journey of getting those insights. So in many ways it's moving from what I'll call divisional or departmental processes to enterprise-level process visibility. What I like
about Hadoop is that it is horizontal for us, and slowly more and more people understand that if they have information upstream of them and downstream of them, it gives them better context in which to optimize their own operation. Chris from Rogers, as you think of that? Yeah, so Chris Dingle from Rogers. Rogers is the leading telecom company in Canada, about 12 billion dollars, and covers cable, wireless, publishing, and has the Leafs in the NHL, the Raptors in basketball, and also the Blue Jays in baseball. Our journey with Hadoop and Hortonworks was pretty interesting in the sense that we started off in a single business unit — actually the two people are in the audience today, Victor and Paolo — and when we were here two summits ago, and it's probably a similar experience for a lot of people in the audience, we just got started in that one division and quickly launched a product within six months, which then became the fastest-growing digital product in the media business. The interesting thing that we're hearing across the board from all of the companies is that you also find that when you use that type of platform, Hortonworks and Hadoop, you're on a platform that scales across organizations. So during the reorganization a year ago, we found ourselves in the customer experience team and Victor and Paolo found themselves in the IT team — we had all started off closely together — but the platform allowed us to scale into customer experience and into a number of other different agendas for the company without having to change anything, and that was for us fundamentally very awesome, because we were just going with it. We now sort of think about it as a partnership platform — we talk about the enterprise platform, but it's really a partnership platform — not only with the partners that we had, like Adobe and Google, where we were able to bring in huge amounts of data, but also with the partners that we wanted to use in customer experience. Rob and I were talking about ClickFox and, you know, Amdocs and some of the other ones: there was a time when these partners would want to bring in their own systems and integrate them into your company as a siloed solution, and we've all said, as telecommunication companies or retailers or banks, no, we actually want you to leverage our Hortonworks or Hadoop platform directly and use that as the conversation, our business language of communicating together. The really neat thing is to see that in 2015 that's finally happening across the board. In terms of partnerships internal to your company, what we're finding is that the conversations across divisions fade away when you begin to create a 360-degree view of your customer, and as you bring in each of the different groups, they become strong very quickly and everyone benefits from that partnership ecosystem. So it's really interesting to see how it starts with just a few people and then becomes a platform for the entire company. Okay, thank you, Chris. Sam at Home Depot, same question as you think of that? I'm Sam with Home Depot. We're a customer service organization, and we kind of started internally with Hadoop: we monitor our cloud, we're looking at how it's performing — the logs, the errors — and that's where we started. Then we moved on to the customer. We really want to get the right product at the right price to the right place on time and make our customers happy — that's what it's all about — so we're using Hadoop to help us do that, and it really gives us a lot of insight there.
Okay, so for all of you, many times it starts with taking your core processes, applying more signals against them, more data: how can I improve that, what can I do? Would anyone like to speak to some of the volumes and some of the things you're seeing — I know Yahoo talked through some of their volumes — just where you are in terms of transactions or orders or however you think of it inside of your business? I'll start. Our first use case was really driven out of a need for growth: expansion of data usage on your cell phones, for example, drives a tremendous amount of traffic for us, and we had a scalability issue and a cost issue as well in growing that platform the way it sat. So in partnership with Hortonworks, bringing a big data solution to it — and we did this in a matter of about five months — we are now ingesting over 250 billion records a day, at scale, across all of our platforms. We have one cluster for the network side, and we also have a discovery cluster that allows for integration, exactly like Chris was talking about — cross-pollination of all of our various channels and inputs; we can talk about that a little bit — but between those two clusters alone we have about a thousand nodes in play today driving value. So 250 billion transaction records a day — that is large volume. It's large volume. David, how about for Symantec? We were playing kind of in the game of terabytes, hundreds of terabytes, and when we transformed our primary data lake into a Hadoop-based cluster, horizontal scaling now means incremental orders at the petabyte, 10-petabyte scale, and these systems are designed to scale horizontally in that way, so we have confidence that we're not going to top out: it's a horizontal scaling problem, and we can continue to build out and build out as needed. One example of the kind of speed that we get: in terms of base technologies we use Kafka, we use Storm in addition to the Hadoop-and-friends ecosystem, and for the security use cases — Symantec processes security logs globally for just about everybody — one of the key metrics that we have is basically the mean time to process a log that enters the system. In some cases, when we talk about large-scale attacks and lots of different data coming in from many places, the security system itself used to get backed up, and there would be cases where, on average, the time to detection was on the order of hours. Now, clearly that's not acceptable, so for us, as we moved over to the new technologies, the new infrastructure on the Hadoop-and-friends ecosystem, we were able to reduce that latency down to seconds, and what this translates to for our customers is that instead of waiting that extra couple of hours for protection, protection comes in seconds. That is an enormous order-of-magnitude change, going from multiple hours to a couple of seconds, and we're confident, because the systems scale horizontally, that as we load on more customers and take on more data streams and more feeds, we'll scale it, process more data, and still do it in seconds.
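As a rough illustration of the metric David mentions — the mean time to process a log — here is a minimal sketch that tails a log topic and compares each record's ingest timestamp with the moment it is handled. The topic name, broker address, and reporting interval are assumptions for illustration, not Symantec's actual setup.

```python
# Hypothetical latency probe: measure mean time-to-process for records on a
# Kafka log topic. Topic/broker/group names are made up for illustration.
import time
from kafka import KafkaConsumer  # kafka-python

consumer = KafkaConsumer(
    "security-logs",                  # assumed topic name
    bootstrap_servers="broker:9092",  # assumed broker address
    group_id="latency-probe",
)

count, total_lag_ms = 0, 0.0
for msg in consumer:
    # msg.timestamp is the time the record was stamped at ingest (epoch ms);
    # the difference to "now" approximates end-to-end processing latency.
    total_lag_ms += time.time() * 1000 - msg.timestamp
    count += 1
    if count == 10_000:  # report on every batch of 10k records
        print(f"mean time to process: {total_lag_ms / count / 1000:.2f}s")
        count, total_lag_ms = 0, 0.0
```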
Okay, anyone else on the volume side? Yeah, I think on the industrial side it's a little bit split. If you look at all our business data, which you can think of as human-generated data — what people are entering, what jobs you're performing on behalf of customers — that's probably under a petabyte, so the challenge there really is the data integration: it's so fragmented that having the same semantics, so you can bring the data together, is the big challenge there. Then the second one we talked about, exploration — that is true big data, where you're just generating massive amounts of seismic information that traditionally needs to be analyzed by humans, and you're trying to get away from that, so that's truly in the big data sphere. And I think the third piece is real-time operations: when you're actually performing drilling or operating machinery on behalf of customers, that's when there is a desire to analyze the data in real time, and if there is some signal you need to act on, you're able to do it right away so you prevent any kind of downstream damage. So we have, I think, one of each. I guess as we look across the landscape of oil and gas, just to know where you're going: as you think of Hadoop as an enterprise platform across the industry, it's how to ingest volumes of data in batch, how you can ask questions — queries, in terms of querying what's happening — but also, I think, where you're going is how you can then stream data into the platform from various devices, sensors, well heads, drills, other things: I want to stream that data in, be able to correlate it, and take action in real time. Yes, and there is tremendous opportunity there, because a lot of this data has not been brought together as comprehensively as this — I think it's been optimized within its silo — so that's what we're looking to improve on a little bit. Go ahead — sorry, we'll do the sweep again, I guess. So for Rogers, we're at the petabyte scale in terms of data, but to what Anil was saying in terms of being able to go across organizations: it's always interesting, if you're in a company and you talk to a new group, they say, well, we have a 360-degree view of the customer, and you're thinking in your head, politely, well, you really have a 40- or maybe a 50-degree view of the customer, because everything's not really brought together. The benefit — Rob and I were talking about this earlier — is that when you can actually start to go across channels, understand retail, call center, online around your customers, and put together that 360 degrees, it's actually something that's never been done before in a lot of companies, and this is the enabling technology to allow that. And then the last part is the scale of time: bringing things from where people were earlier excited about a monthly or weekly modeling cycle, being able to cut that down to intraday, and then down to real-time streaming, which is the next opportunity for us — that is really exciting, the business value that unlocks. So you were saying, in some way — well, go ahead. I was going to elaborate on exactly what Chris and I were talking about earlier too, the significance of bringing the data together, because what we found was that we had a lot of really good operational data sitting across all the different silos, whether it was the retail channel or the customer service channel. We did really well at knowing and understanding the traversal through our IVR, for example, and knowing which ones were dropping into service, but what we realized in consolidating that information into a big data platform was that now we had that holistic view: now we can see a customer behavior on a website that drives a call into our call center — and all of you that run call centers know it's costly, right, especially when we actually were successful in getting the customer to a self-serve
channel and then forced them into a call unbeknownst to us. We find out there's a process issue, a policy issue, maybe even a system issue that we can improve, and we've actually seen specific use cases where we look at that journey mapping cross-channel, identify source issues in process and in policy, make those modifications in near real time, and actually see the effect of that. We do iconic launches, as you know, and we're actually seeing streaming data coming in from sentiment outside of Verizon; we noticed that there was confusion going on — we saw it literally in real time — we were able to change the communication in real time and actually watch the sentiment score go back down and our call lines go back down. That's real time, that's real business impact. In the end we can talk about technology and scale and everything else — that's great for us technologists — but remember, we're serving the business here, and first and foremost, what are we doing to increase the top line and bottom line on the business side? That's truly where we're headed and why we do big data. Sam? Our cluster is not quite as big as Verizon's, but we've been able to serve the same business needs and some added things. We can move batch compute off more expensive platforms and run it faster, and we can also do cheaper storage: we can take the data that we've got and keep it for a long time, and it gives us a lot of power to dig into that data. So we're not just doing data analytics up front; we're storing more data, we're keeping more data, and we're able to really help the customer more with that data. Just the same thing you're doing — it's very cool: not only are you saving money, you're working faster and you're helping the business. How much better does it get? It doesn't get much better. So I want to switch to a different topic and talk a little bit about the enterprise side. One thing I think everybody's always asking about, a hot topic lately, is security. As you think of security — because as you're starting to centralize, as many of you said, and take lots of data and put it into one place, are you actually making it more secure or less secure by putting it together? How do you think about security in that context? David, you're probably the best to talk about security, coming from Symantec, so I'll start with you. I want to split this conversation into at least two parts. There are things that Symantec does as its core business — taking security metadata, processing the data, detecting, protecting — that's one part, the application of big data. The other side is that in order to deliver these services we actually need our own private cloud, and so we've built an internal cloud that is based on Hadoop and also based on OpenStack to provide these capabilities out to all of the analysts, all of the users, all of the engineers who are creating next-generation security. So speaking on the first part, which is how we use the tools to create more advanced security: one of the biggest things about Symantec, about the data, about what we're aiming to do, is that we need to take these capabilities and give them to as many developers and as many analysts as possible, so that they can actually go and be creative — they can go and dream up a new algorithm and think, oh wait, hey, now we've got this data and this data and this data and
this data we've never seen before. And by the way, if it's all in the same cloud, the same system, with the same tools, suddenly our analysts and our devs are free to create, and this is where the real transformation is coming. This is where we are seeing incredible discoveries — just like yesterday with David Epstein's talk, which was inspiring in many ways — because our devs and our analysts are looking and finding things they've never seen before using these tools. And to the point about more data coming together: when you get additional data coming together in unexpected ways, suddenly there are patterns that emerge and that machines can discover, which could never happen if the data was sitting in silos. So for security, one side is that we are creating next-generation security using these tools. Now the other side of security is: okay, we've got this data from some of the largest financials, largest industrials, Symantec ourselves — a lot of the companies that are here are relying on Symantec for protection — so what do we do to make sure that our data lake and our cloud and our customers' data are fully protected? This is where Symantec is really embracing the open nature of Hadoop, of the open systems including OpenStack. There are places in Kafka, there are places in Storm, there are places where multi-tenancy may not quite work right, and we may need to plug in some of our own encryption algorithms, but the fact that it's an open ecosystem means that we're able to partner up with Hortonworks and with other open companies, provide solutions, code them up, and secure our own cloud platform better — and because it's pushed out into the ecosystem, everybody benefits. So security is very near and dear to my heart; it's a big topic, but I see a lot of progress in terms of people really thinking about security, and the really important thing is that people have to really think about data governance, because until you have that, you can't really do what you're trying to do. I think, to extend what you said, it's almost a good problem to have, because the drive for security is coming out of a desire to provide more access to the data than you ever had before, and that's what drives it — access and governance, in my mind, are the two important things for us. The theft of data and all of those things are important on the finance and detail side, but less so for us in terms of internal operations, so for us it's much more about how we are able to segment and deliver the data so that whoever needs it can get to it: they should be able to see what they should see and not have access to what they should not see. That's kind of the first thing — we are actually partnering with you on that initiative, and it's because of that. And I think the security problem will get addressed, because there is so much focus and common desire to get to a state where we can trust these data stores as they become more and more valuable and as more data migrates to them, so I think the next two, three years will be really important in that way. And you're both hitting on the point that as you think about governance and security, they're really interrelated — what you can do and how you can run things like Atlas and Ranger and Knox and everything and start to integrate a governance and security platform. The tools that are required to do this are emerging, and organizations are starting to hook them together in the ways that we really need to hook them together. But I think one thing that I took
away from the TrueCar talk is that it's easy to say, oh, it's insecure, so let's just stay away — but in fact there are always ways to secure it. When it comes to hard perimeter-based security, people may say, hey, perimeter security is dead and all that kind of stuff, but if you have really, really sensitive data you can stick it in a cluster, wall the thing off completely, and do biometric scanning before people can go in and do something. At the other extreme, you can actually use the multi-tenancy capabilities, and as the multi-tenancy capabilities emerge and role-based access control type controls emerge, you can start tuning and taking down some of those hard walls and using some of the more flexible mechanisms. But it's a journey, and it's always solvable — it's just a question of what kind of trade-offs you need to make and whether they fit your business. It's funny, at Rogers, with the experience that we've had — we partnered early on with Hortonworks and SAS around their high-performance analytics, and we also use machine learning and Spark as well — what's really interesting is that data scientists in the past would kind of use their packages, bring their data to their desktop, do their analysis, and then operationalize that. With Hadoop and the Hortonworks stack, with this whole ecosystem, the data sets first of all are becoming reusable, being brought into a centralized location and then used by the different divisions. But we actually set up virtual machines inside of the architecture, right on the hardware — we run 80-gigabit-per-second connections — and those data scientists are now migrating and wanting to run all of their jobs in that environment. The state of that is: yes, they'll be using their laptop, but they're going to be remote-desktoping over to a VM, running the applications at lightning speed, and the security for the company suddenly becomes that much better, because that laptop is not going to get exposed somewhere — everything is running inside the perimeter, as you were describing, David — so the security has massively improved. And you made a comment on data scientists — I know sometimes people think, hey, if it's Hadoop, how many of those people can I go find, how many can I open the platform up to, maybe I can only have two or three people on the platform — and you shared the other day how many people you have on the platform today who are devs or who are accessing it and can go do work on it. One of the key things for us: Symantec had acquired a team of data scientists from Boeing, and one of the first things we needed to do was onboard them quickly, and so within very short order we went from 20, 30 users to now 200, 250 data scientists who are actively on the system making security discoveries. So for us it's about getting this tech into the hands of as many technical creatives as possible so they can find the algorithms — it's awesome when it happens, and it's amazing what gets created when it happens. I was saying that in the early versions we absolutely had to do the fortress thing — take care of your data and protect it; you took external measures, and the internal measures are coming, which is awesome. The power that we're seeing is protecting the data but letting a lot more people get into the data: the marketing creatives that you wouldn't necessarily think are geeks — they may not even be able to spell statistics three out of five times —
but they have an insight into the data that's just boggling; I don't know if everyone got that one. It's like they start doing things where I'm looking at it going, why did you do that, and they come out with some really cool insights, and it makes a difference. It's really exciting to me when the business people are really able to get in safely and look at data in really odd and creative ways. To the extent that there's a lot of thought around how you get the right number of data scientists: when I personally look at a company like Schlumberger, a large engineering company, where we've had success is in identifying the hidden factory of analytics. There are a lot of people embedded in the business lines who are really performing analytics, but in an isolated way, or it's one of many different functions they've been handed. I much prefer having them in a central pool where they can still serve their customers but are in a community that talks analytics and talks about the technologies they care about. So one way of increasing the team has been to take existing engineers and existing people with a quant bent of mind, give them just enough additional training, and then deploy them. In terms of internal data scientists, or democratizing the analytics across the company: at Rogers we have learning and enablement involved, we have strategy involved — the technologies are all there, right, so it's now all about the people, so that individuals in the company can show up like rock stars and really take advantage of these technologies. So we find the organizational part is really the part to think through at this point, not necessarily the technology, because that's been provided by Hortonworks and the partners. That's interesting, because what you described is that you've got existing users coming through SAS and others on the same platform, accessing the same data, and now newer tools like Spark running on top of Hadoop, and you give them a different access pattern. Absolutely. So you can leverage those across the different ecosystems, all on the same platform, which is phenomenal — all accessing the same data in Hadoop, in the same place. Absolutely. We had a philosophical approach at Verizon that was an inverted triangle, where at the bottom of the stack we have our computer scientists working on Hadoop, but as we go up the layers we have data scientists, and what we're trying to do is enable more and more people at the top of the pyramid who have business functions — exactly the same point. Marketing folks are really smart in their subject areas, we have supply chain folks who are very, very smart in their discipline, and each of those groups within our organization has their own data scientists who look at their subject areas. What we've done is create a governance model that allows them to come in and share their data into a centralized data lake and then foster insights within their channel. But what we're finding now is that we're cross-pollinating some of those subject matter experts, where all of a sudden we say, wait a second, now we've got this channel and this channel talking, and then somebody comes in, like fraud, and says, wait a second, if I have visibility across these other channels plus mine, now I have better visibility; marketing goes, wait a second, now
I have better visibility to all the different channels, I can market better. We've seen marketing effectiveness increase tremendously because we're able to look at the customer experience, anticipate what their needs are better through the data itself, and then, when we market to them, be more specific and more personalized to their needs. So one thing you're saying, just in terms of your journey — because this has been a journey — is that you probably had a lot of data centralized many years ago, you had data marts, you had things separated by divisions, by units, replicating the data, copying the data; now you've eliminated all that, brought it all back together, and you just give people the access they need. Exactly. There are two very practical things that we discovered when it came to equipping a larger and larger cross-section of Symantec employees with these tools. One, if you're going to be working with something, you've got to play with it, you've got to understand it, and part of playing with it is being in a safe place where, hey, you can do something — drop some tables or run crazy queries — without touching production jobs. And most people think of just the metal, the performance: hey, Hadoop's got to run on metal, it's got to be close to the storage, the storage has to be laid out in this way. But one of the things that we found was that, in order to enable a lot of these cases, we were using OpenStack extensively to provide VMs in close proximity to the data ponds, the data puddles, the data lakes, whatever it is, and we invested heavily in self-service analytics — specifically, that means the ability, at the push of a button, to lay out an HDP cluster or a certain zone for people to go in and play. So creating that environment and encouraging people to get in there and try — the water's fine, jump in — is really key, and what we're seeing is that more and more people that you may not expect — they're certainly not the PhD data science folks, but they're people who are highly intelligent, highly technical, who have the ability to think of new queries and new applications — can use these easily accessible tools to learn and play and build, and it's really exciting to see this kind of transformation of an entire company. On education — we're talking a lot about internal and external education — we have a THUG group, the Home Depot users group; the point of it, well, I really wanted the branding, but it's cool. We get together once a month, and we have the engineers showing frameworks and how to use them, maybe simple Hive queries; we have business people, we have analysts, we have people who aren't even in big data come to the THUG group and learn, and it's been really awesome. And the sharing — you never know where it's going to go. When you have an open discussion, that's always a little scary, but I encourage it; that sharing is critical, and you can build the talent you need with a few smart people. The really driven people out there will find the knowledge, and, to steal your term, it works really well. And what we're seeing is that a lot of times we used to start with a question and everybody would go searching for the answer; now, with all the insights coming from the big data platform, we're actually finding answers to questions that we weren't even asking ourselves. It's tremendously insightful to see — wow, we didn't know what we didn't know — and now, with this data across the company, we can actually start
seeing things that we never saw before. So let me ask — I know we're coming to the tail end of our session here — there are a lot of people here who are obviously very experienced with Hadoop, but some who are also newer on the journey. If you could go back earlier in your journey and say, if I could do something differently, do something faster, change something as I was just starting off, what would you do? What insights can you give the audience? I'll start with you, Rob. Very good. So Ashton Vassalli — he's the lead on big data in my organization — would say we need more people, right, and it's always about that, because what happened was we were successful in our initial implementation, so much so that there was a huge demand for it, and we had to actually create a governance model where we do co-resourcing with organizations to bring them in, because we couldn't grow fast enough. So I think, number one, don't underestimate the demand if you're successful. And to be successful, make sure that your first use case is tied directly to business initiatives that drive value to the front line — if you do that, you'll see this plethora of people coming in. So I would say being prepared for the onslaught of demand is definitely a big lesson learned. Okay, David? I think, for me, there was a lot of uncertainty when we started this journey — there are the it'll-never-work people and the it's-insecure people — and going back I would say: kill the fear. Just kill the fear. When it comes to figuring it out — smart people, cool tech — figure it out; haters to the left, kill the fear, just go for it. That's what I would say: just get started, you'll learn along the way, get it started and go. Okay. I think for me it would really be to frame what the journey is, and there are many ways to talk about it: we're going on an analytics journey, we're going on a technology journey, we're replacing expensive IT stuff. But the way I would think about it is that, organizationally, data has become a strong foundation of how any modern company needs to operate, and I don't think our organizational structures have caught up to that yet — we don't know where this data thing should live, how you staff it, what kind of executive-level sponsorship it should have. So frame it as: if we do this right, it's going to enable almost every known and some unknown processes within the company. If you frame it that way, then I think it sets the right context for the journey, and then there will be many consumers: analytics will be one, BI will be one, engineering will be another. So I think that's what I would do next time. Okay, Chris? Yeah, I think the people part is really interesting — by the way, we have a THUG group up in Toronto as well, the Toronto Hadoop users group; I don't know what you were saying, but instead of killing the fear you think of thugs. But really, I think the interesting piece — and Paolo and Victor, who are in the audience, can talk to that — is really just get started. We met with the Hortonworks folks and you think, okay, well, I'm going to be dealing with the salesperson, it'll be a while — we were up and running on our first Hortonworks Hadoop cluster in five business days, and everything kind of evolves from there in terms of encouraging those conversations. And absolutely, to Rob's point, we started off with the business case; it happened to be the most successful business case in that division to date, in its history, and
so that was a good thing, but that really is what brings the investment of people's energy and time and passion behind it, and then everything follows from that — so just get started. I think, as a framework, I'm going to be a little less fuzzy. I would start with Kerberos on day one — Kerberos early; it's hard to switch to later. Then frameworks. Frameworks is a big word with us, but we've wanted the frameworks for ETL so that we have the lineage of the files — and the tooling is solving a lot of that now for those of you who are just starting — to know where it came from, how it got there, whether it's in the same shape it was when it started. Frameworks for user control: knowing where they're laying down the data, that they're not exceeding their limits, and if they are exceeding the limits, giving them tools — a lot of guard-rail-type frameworks is where we're going, where we have gone, and we're going even more so. If we had started that earlier, I'd have a lot fewer gray hairs. Well, good. First of all, I want to thank you all — I appreciate your time today. First, thank you for your partnership as customers, but even more importantly, thank you for sharing your insights with the audience on what you've seen on the journey. I apologize in advance if anyone accosts you here to share more information one on one, but I appreciate you all taking the time and sharing that with the audience. Thank you, everybody — just exit that way. All right, so now, coming up to our final session of the day. That's always a great panel to go through, hearing all their insights; now what I want to do is pull it all together. I'd like to have Geoffrey Moore come out — I think many of you have probably read his books over the years, or spent time with him, or maybe even heard him present in other contexts — but I think what will be really interesting is to go through, as you think about Hadoop and this journey and everything we just talked about with all the different customers and users here, how this all plays out. So with that, let me bring up Geoffrey. First of all, thank you very much. Good to be here. I was here three years ago, at the summit of 2012, and in 2012 it was much more a gathering of the faithful — kind of an "all we are saying is give Hadoop a chance" crowd. It was an early adopter group, it was very, very exciting — and new technologies are very exciting at that stage. We're clearly in a different place now. As you were just hearing from the panel before this, we're now at the use case stage of the new technology. That means we're across the chasm. We're not yet, perhaps, in pervasive tornado land, where everybody has to have Hadoop tomorrow morning at 9 o'clock, but it's pretty clear right now that there are compelling use cases that are driving the use and adoption of Hadoop going forward, and that means that everyone in this audience has become a little bit of an ambassador. So you're kind of an amphibian: part of you — the fact that you're in this room, the fact that you could listen to Tim from Flurry talk about his technology — means you speak some version of the technology language, but at the same time, as every member of the panel was making the point just a minute ago, we need to bridge this to the business use case. The use case is this really interesting thing in technology where it's a meld of disruptive technology and a business process or practice — how do they come together, how do they make it work? And in order to get organizations to be
able to take advantage of that opportunity, we need to have a vocabulary for talking to the line-of-business side. So I want to talk to you about two frameworks — this is how business people who frankly wouldn't know a Spark from a Hive would be able to say, okay, but I do understand these things. I think there are two things that business people get, and I want to use them as a way of framing these use cases so that when you go back you can position them successfully with your colleagues and accelerate investment in what is obviously a really, really exciting set of capabilities. Three waves of investment — even if you had been in a cave for the last 30 years, it would be hard to miss these. The first one actually began with the PC movement in the early 90s; we added to that client-server ERP, we added to that the internet — all of that happened in the 90s, and it created this amazing set of global systems of record that went around the world. It let us completely re-engineer the global supply chain, it brought China into the world economy, it brought India into the world economy — it has completely reconfigured the planet. So that was huge, and those systems of record are still very much in place right now, although for the last 10 to 15 years they've been less engaged with than the next set of systems. We spent an enormous amount of money and went through an enormous amount of transformation installing these things in the first place, and now people are kind of going, well, they're there — and they've actually been extracting money from systems-of-record investments probably ever since about 2001. I mean, the biggest technology of the first decade of the century in systems of record was VMware: you could virtualize computing, you could just make it cheaper, because it was like, look, we just need to run this thing now, we're no longer building these things, we want to run them — kind of like the interstate highway system. This decade — it started in the consumer world in the last decade, but it's now hitting the business world this decade — there is a second set of technologies, and it starts at the edge with smartphones instead of PCs, which has hit so fast and been so transformative. When you add smartphones to software as a service in the cloud, and you add public cloud computing, you get a thing that we're calling systems of engagement. Systems of engagement are very, very different from systems of record. Every business person understands systems of engagement — they get it: they watch their kids doing it, then they start thinking about their employees doing it, then they start thinking about interacting with customers. You heard Chris and Rob, from Rogers and from Verizon, talking about the customer experience; Sam from Home Depot, same thing — it's all about customer experience in a digital world, and these things have radically re-engineered the capability to serve people. Right now it's predominantly — well, it started in media, started in entertainment, it went to social media, it then clearly went into retail, it's gone into consumer — but this is just the beginning. This is why I said 2010 to 2020 — it really probably should be 2025. This is going to change healthcare, this is going to change education, this will change civic services, any kind of social services. There are homeless people in San Francisco who have smartphones — okay, so if homeless people have smartphones, think about how much better you could do social services. The world is really changing, and the truth is, there's all this data that the panelists
were talking about that says, look, we can get deep insight into what's going on in these service relationships — into the state of the customer's mind, into the state of the customer's body, the location of the customer's body. Think about what Uber does: it's an amazing system of engagement. So there's lots and lots of stuff going on. I think this is where the action is among the three things I'm going to show you — I think this is the hottest ticket in town, and will be for the rest of this decade. But we're seeing a third one — I want to put it up, and the business people in certain industries are hearing about a third one — and that begins with the idea of smart sensors. Smart sensors are just now starting to proliferate at scale. Smart sensors have been around forever, but these have to be smart sensors for free — they basically have to have almost zero cost and they have to be ubiquitous — and we're not there yet, but you can certainly see these smart sensors beginning to proliferate. You put them together with analytics and machine learning, you put a private cloud behind that, and now you have something called systems of intelligence. When David from Symantec was talking about a bunch of their business and their trend-detection kind of work, that's still largely working with systems of engagement, but you can see that it's going to go up to systems of intelligence too — securing the energy networks, securing the internet itself; there are a lot of folks at work on that. I've been on the board of a company called Akamai that's very much involved in systems of intelligence. So it's coming, but I don't think it's the hot thing yet — I think we just ought to keep it in mind. I put these three things up because I think these are three trends that any business executive who has responsibility for allocating resources understands, and I think you could have a very clear conversation with your colleagues around which of these waves is the most important wave for us and our company to engage with going forward. So customer satisfaction was clearly a critical wave for Sam from Home Depot and for Chris from Rogers and for Rob from Verizon, but it wasn't for Anil from Schlumberger — they're not about clicks on a clickstream; for them it's some version of systems of record, probably combined with systems of intelligence going forward. So different companies are going to have different interactions — that's thought number one. The other thing I want to give you is a really simple framework, but it's just so helpful for clarifying: why are we spending money on this technology again, what was the point? Okay, so whether we're investing in systems of record or systems of engagement or systems of intelligence, we can invest at three different levels. The first level is: look, I just want to invest at the infrastructure level. I'm going to take the new technology and I'm going to do something very old-fashioned, but I'm going to do it much cheaper, much faster, much better. Sam from Home Depot made a couple of comments like that: he said, you know, we can just store a lot more data, it's cheaper, we can compute, we can do this — now, we're not changing, we haven't committed yet to change our operations or our business, we can just do our stuff better. Those investments are sponsored by whoever is in charge of the technology infrastructure — we'll say the CIO — and there are a couple of things to understand about doing business at this level. One is that the budget is already
there: the CIO's budget is typically 1-2% of the total enterprise budget, so it's not a ton of money compared to all the money that gets spent in the enterprise, but it's preallocated — this is the CIO's money. And as a vendor or a service provider, if you can come to that CIO and say, look, I can do a lot more of what you want to do today, cheaper — I can help you with your ETL problems or whatever — that's great, and the good news is you don't have to talk to the line-of-business people; as far as they're concerned this should just be a transparent event: I'm just going to help you pull more money out of your budget, because that's what your CFO has been pounding on you to do for the last however many years. So that's one place to absorb the new technology — relatively conservative; bring it in, we can do that. The next level up is saying, now hang on, that's not what we want to do — what we want to do is change our business operations; we want to be more effective, whether that's effective at mining assets or oil wells, effective at responding to customer complaints, effective at managing our workforce, whatever the heck it is, we want to get more effective. Now we're involving the operational side of the house, and now it's the CXOs — whether it's the head of sales or the head of engineering or the head of manufacturing or the head of customer service or marketing or supply chain — these are people who are saying, I'm under pressure to re-engineer my function because my company in this new world is underperforming. Rob from Verizon was talking about this: they did a launch, and in real time they saw the tweeting about what was going on. I don't know if you saw it, but there was a wonderful viral video about five years ago called United Breaks Guitars — for those of you who are imprisoned by United Airlines, as I am, it was an amazing video — but nobody at United apparently watched it for days, right, so the operating model really got out of control. In this new world there's a lot of pressure on the customer-facing functions in particular, particularly around systems of engagement, to change the operating model. Now, what's interesting is, if you're part of the ecosystem that wants to serve this problem, the good news is this is not one to two percent of the enterprise's budget — this is the other 98%, the operating budget, the marketing budget or the engineering budget or whatever. If you can change the productivity of the marketing or the engineering, or the effectiveness of that organization, then you can ask for a share of that pie, and that creates much, much more impactful returns on both sides. But — there's always a but with these things — you have to negotiate the conversation with the line-of-business side, and that means you have to speak both languages. That's why it's so important that you're here today: you have to be able to speak business. Sam was really surprised that the marketing person who couldn't spell statistics three out of five times could still see an insight, but that's the point — we are not all bilingual. So this ability to frame, to understand the impact of what it means to say schema on read instead of schema on write — that's a profound idea; it doesn't mean anything to anybody outside of this room, but if you can essentially say, well, what it means is that what you don't know today we can figure out tomorrow — well, that's pretty cool, but now relate it to my marketing problem or my sales problem or whatever.
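A minimal sketch of that schema-on-read idea, assuming made-up file paths, field names, and a nested attribute: the raw events are landed as-is, and structure is imposed only when a question is asked, so tomorrow's question can reach into fields nobody modeled when the data arrived.

```python
# Schema-on-read sketch (PySpark). Paths and field names are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("schema-on-read-sketch").getOrCreate()

# Raw JSON written as-is at ingest time: no upfront modeling, no ETL rejects.
raw = spark.read.json("hdfs:///data/raw/app_events/2015/06/")

# Today's question: sessions per device per day.
(raw.groupBy("device_id", F.to_date(F.col("event_time")).alias("day"))
    .count()
    .show())

# Tomorrow's question reaches into a nested field nobody modeled when the data
# landed -- something a schema-on-write warehouse would likely have dropped.
(raw.select("device_id", "payload.network.carrier")
    .where(F.col("payload.network.carrier").isNotNull())
    .show())
```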
And then the final thing you can do with this model is to say, you know what, this is all very well, but these people are still essentially paving goat paths — once you get this new technology, you should just completely rethink the planet: you're in the wrong businesses, the way we do business is wrong. The most obvious example for those of us in this room is Uber. Once you actually use Uber, the entire transportation industry is organized wrong — it just needs to be Uber; in fact, maybe my whole life needs to be Ubered. So in other words, when you're doing business model innovation, not only can you get the operating budget, you can get people to put vast amounts of money into underperforming businesses — Uber is worth, I don't know, 40 billion dollars or whatever it is, and that's a lot of money — and now, by the way, it's the CEO and the board and the investors who are involved in what's going on. So what's important about this framework is that, as you think about the projects and opportunities you're looking at, I'm going to take you through a frame that contextualizes them in two dimensions. I want to contextualize them around: is it an infrastructure play, is it an operating play, is it a business play — that's going up the triangle, those are the columns. Then the rows: are we going to interact with our systems of record, which are largely well established and which we're going to repurpose; are we going to interact with our systems of engagement, which are still emerging, so it's all going to be a bit of a mix for a while; or are we going to interact with our systems of intelligence, which are really, I think, still somewhat futuristic? Just to give you a sense of how this works — because I would love for you to be able to use this simple spreadsheet as a way of saying, this is how we're going to contextualize our discussions about what resource allocation we should make in 2015 to a technology like the Hadoop stack and all the various technologies you've been seeing, and you're going to see more today. I can remember that in 2012 there was a keynote about using Hadoop for ETL for data warehouses. Today that feels a little bit like, you know, I have an electric broom, I can sweep — it doesn't feel very exciting, but it's classic at the beginning, and yep, it worked. And by the way, there was a guy from Sears who had a batch window he couldn't meet — he was literally, really up against it — so it was actually a very big deal; today it just feels like a sensible way to make some money — it's not going to change anybody's life, but not bad, right? The systems of record start getting really interesting when you get to the yield optimization case Ron was showing in that case study about the manufacturer, where I'm looking at my data — I've always had the data from my SAP system or my manufacturing execution system, factory-floor stuff — but now I can get data from a whole other set of sources: downstream customer service cases, data from outside the corporation coming back in. Yield optimization becomes a really, really interesting idea. So if you have a yield optimization problem in your world, if you're saying, look, I'm literally in a process-oriented business model and I need to improve my yield, the thing that Hadoop changes dramatically —
and remember, Anil was saying this, and so was Sam — is that you can go horizontally across all the data sets. You know how enterprises have great data in silos, for sure; we thought SAP would solve that, and it didn't, and neither did Oracle — it's always silos. Hadoop is this great leveling thing, and yield optimization is your ability to make these correlations across those data sets — a really, really powerful idea. So I think there's a lot of money in that square. And then there are the business model changes. What's happening in this new world, as we have more and more instrumented relationships with our customers, is that customers — particularly in societies that enjoy the benefit of the rule of law, which we take totally for granted in our country, but if you go around the world it's not that common — in a safe society organized by rule of law, increasingly the customer is saying: I trust you enough that I don't want to buy your product, I want to buy the outcome of your product. I don't want to buy a drill, I want to buy holes in metal, and I want you to be responsible for that outcome. And more and more companies are moving to this: they don't just sell aircraft engines anymore, they sell air miles — I'll sell you so many air miles. When you make that move, what you're doing is taking a product model and converting it to a service model: you're moving execution risk from the customer to the vendor, which is a huge, huge change in the value proposition, but you've got to change your systems to be able to do that. Now all of a sudden you're playing a service-level-agreement business model and you need a whole new set of data sources to populate that model — (a) to stay on top of what you've committed to, and (b) to understand and control the penalties you may have just imposed on yourself — and increasingly, for the kind of modern yield optimization going forward, you're going to need that as well. So systems of record are a rich source of Hadoop-style data opportunities, particularly when they've been siloed across the enterprise. If we move up to systems of engagement: when systems of engagement came into the enterprise, the first thing that happened — well, the BlackBerry was the first example, and you said, okay, we'll support BlackBerrys, and that was it, none of that other stuff. And then what happened was, the iPhone did not penetrate the enterprise — the CIO said, no, I know about Apple, don't do that, gave it the Heisman — but then the iPad came out, and the board of directors, and then the CEO, and then the head of sales, and all of a sudden everybody had to have an iPad, and that broke open the floodgates. That's when the enterprise went from saying the standard unit of mobile interaction is a laptop to saying it's a smartphone, and that is where we are now. So the first thing you could say is, well, I need to manage this new infrastructure: it's got a bunch of log data, streaming kinds of things, I've got security issues I've never even imagined before, I have anxieties about data exfiltration, I've got a whole lot of stuff, I need better bring-your-own-device management — and that's inside the IT budget again; you don't have to talk to the operating side of the house. I have to get control, I've got to get my arms around a very difficult problem, I need better analytics, I need to see what's going on, I need to
have streaming data. I didn't have a source for analyzing that stuff; now I need one. Having said that, the next place over is probably the hottest spot on this entire graph, and that is systems of engagement meeting the operating model in the middle. Whether it's real-time rewards, or churn and defection detection, or mobile check-in at the airport, or Uber, or Airbnb, any of these things, it's all about what's happening now. The amazing thing about events that happen on the internet is that they always leave footprints. As increasing numbers of politicians are learning, there is no such thing as a digital step without a digital footprint. That means, for the first time, you can actually see what was historically invisible to all prior generations on this planet, which is literally what happened when you were in the room. It doesn't matter what room it happened in; if it's digital, there's a log file somewhere, and you can see it. Just the immensity of that idea, really, to improve interactions, engagements, the effectiveness of services, is staggering. No matter how much you milk that cow, it's not going to run out of milk any time in my lifetime, so it's a huge, huge opportunity. And then the issue is that it's so big you have to figure out where to start: if I just try to mine all the systems of engagement, I'm going to die, I'm going to ingest myself to death. So the conversation you want with your colleagues is: in our business, if engagement is important, whether it's employee engagement, customer engagement, consumer engagement, patient engagement, student engagement, whatever the heck it is, what are the moments of engagement that really make the difference? What are our moments of truth? If you take the idea of systems of engagement and moments of truth, and call it a moment of engagement, what are the moments of engagement that, if we could re-engineer them, would really change the impact of our company on our customers? That's the key. That's the operating model meeting systems of engagement: how can we change our operating model going forward? And again, without the Hadoop infrastructure, without this ability to detect the pattern, you can't act on the pattern. You can't act on it until you detect it; once you detect it, then you want to be alerted to it, have it monitored, and all those kinds of things. All of that is the operating model. Now take it to the third one: well, if we can do that, how do we change the business of health? How do we change the business of education? All of a sudden you have an opportunity to create wholly new structures. You don't go to the hospital; maybe you just turn on your PC, or turn on something, and do telemedicine. Maybe you have a little something here that talks to a little something there, that talks to a little something in the cloud, that talks to somebody at Kaiser. The whole notion that we can re-engineer services, personal services, civic services, through systems of engagement is huge, but again it's impossible to do without big data management; any moment of engagement is going to require big data management. And then the third idea, coming to a theater near us soon, is this world of systems of intelligence. Whereas systems of engagement are about customer engagement, systems of intelligence are about system improvement; it's about making systems more productive, not about making the customer
experience better. So the middle row is about making the human experience better; the top row is about actually making the systems themselves work better at scale. I get massive amounts of log data: how do I even ingest it? That whole thing Tim was showing around all these ingestion tools, data lakes, and then you're going to have data oceans, and then data universes, all that stuff: there's a big infrastructure problem at the top. But again, if you're just storing the data, that's the IT person's problem, that's the one percent of the budget. When we convert that into business outcomes, particularly with systems of intelligence, we're extremely interested in things like predictive maintenance and catching issues early. For those of us near the Bay Bridge east span with its broken rods, we apparently have gone from predictive to predictable, because apparently those things are going to be a problem for the rest of time. But predictive maintenance: can we intervene in time? And then the notion: can we do contracts and say, look, we'll change the business model, smart buildings, we'll be the smart guys, we'll take responsibility for getting the energy yield, we'll put in place the systems that will modulate all that capability, starting with tiny little things like the Google guys with the Nest and now the Honeywell guys with the Lyric, these intelligent home thermostats. Then you start seeing the home security people wanting to get together with the cable people, and I'm sure Rogers in Canada is going to have all these businesses at some point; they're very entrepreneurial. All of that would be in the upper right-hand cell, around changing the business model by leveraging systems of intelligence. So the purpose of this slide is to ask: what are we trying to do, and who do I have to get to come to the table in order to get it done? There's no point in trying to drag the operating side of the house into the first column; it's an IT problem, just deal with the IT situation, fine. But if you're going to the middle column, that's when this use case business is so important. The use cases you've been hearing yesterday, the day before, and today are largely all about this middle column. The ones on the right are, frankly, pretty dramatic. The way I would position this going forward is to go back to this model and say: the top of this model is where the visionaries kind of hang out, the people who go ahead of the herd, and if your company is one of these, this is a huge deal, and by the way it trumps the other two, because if you change your business model you're going to radically change the operating model and you're going to radically change the infrastructure model. So if you're starting at the top of this triangle, it's the whole triangle. I think most people, the pragmatists in this world, are use-case driven; they're selectively modifying the operating model around moments of engagement or yield optimization opportunities at the operating-unit level. And then for folks with the conservative point of view, it's like, well, you know what, I don't have the most enlightened management team in the world, I wish I did, but I don't; but I do see the value of the technology even inside my own pea patch, and I'm going to use it inside my own pea patch going forward. So in terms of risk and reward, it's just a gradient from
highest risk, highest reward down to lowest risk, lowest reward. Get a sense of your culture, get a sense of the dialogue in your company, because you're going to be part of that dialogue; your job is to catalyze this conversation. And as you do that, I think the way to think about it is to ask: is my company all about being the disruptor? Because if that's what we're up to, then we have to take on the entire triangle, and we have to have a plan for it now, even if some of this stuff doesn't happen for two or three years. If we are the disruptee, as many of us are in industries today, I think it's hugely important to say that a disruptee has to modernize their operating model immediately. That's how you survive a technology disruption: you don't try to emulate the disruptor's business model, you respond, you modernize your own. For example, the taxi industry in San Francisco is not going to use the Uber driver model, but they have released a mobile app for people to call taxis. That's an attempt at modernizing the operating model, not changing the business model. Or, if you're simply undisrupted, then just take advantage of the stuff, and that should work. So that was my desire: just to say, look, ask this question of yourself, ask this question of your colleagues, and then use these frameworks to help you channel a successful investment in Hadoop and Hortonworks and whatnot. With that, I want to say thank you very much; I certainly enjoyed having the chance to talk to you. Thanks a lot, appreciate it. Thank you, Jeffrey, thank you very much for sharing those insights; it's always great to see and hear that perspective on what we're seeing in Hadoop. So we're coming to the end, and I want to leave you with three things as we close out a great show. Obviously there's more left in terms of the abstracts and the other sessions, but three things really summarize what I think we saw through the different keynotes. First, Hadoop is transformative in terms of what we're seeing inside of businesses and industries, and many of the customers, and Jeffrey, just hit on this theme: as you think of Hadoop and what it means to go from a physical supply chain to a digital supply chain, how you start to look at that, how you start to manage the data in that digital supply chain and the insights you can get across it. You heard a lot of that, whether it's from GE talking about Big Iron meets Big Data, from Schlumberger and what they were talking about, and from others: this whole concept of, I need to manage, automate, and get the signals from the digital supply chain, and then what can I do with that. Second, you heard a lot, especially from the panel today, about enterprise grade and what it means to be enterprise grade as you think of security, governance, or deployment into a hybrid cloud, and how you go do that. And I loved the comment from David Lin, which was: just kill the fear and get started. Because of what's there today, you can just get started; there's enough there, the transformative capabilities are there, the enterprise-class security and operations and governance are all there, so just get started. The third is that there is a rich ecosystem of providers, partners, and companies all built around what's happening here. The sponsors of this show are the people and partners that enable this and make it possible for all of you to be here, so I want to give a big round of applause to all of our sponsors and
everyone, and a special thank you to our diamond sponsors, the ones that participated in all the keynotes. Thank you; they gave you a good view of how this ecosystem is coming together, why we as Hortonworks spend so much time with many of these companies, and why they are so focused on what's happening with Hadoop and how they go build integrated solutions for all of you. So with that, I want to close down the keynotes and say thank you to all of you for joining us. We'll see you back here next year in San Jose. Thanks, everybody, bye bye.