That drives batch, interactive, and real-time applications simultaneously on this centralized data set, and that was transformative in really driving the adoption and the use cases that Hadoop can apply to and, more importantly, the value it can drive for both the ecosystem and our enterprise customers. And it's through your efforts that that's happened. You've also really driven the enterprise services. Look at what you've accomplished with the operational capabilities of Hadoop. Think about what you've been able to drive with security, and how far you've advanced the ability to manage data securely on a highly distributed, scale-out basis. And look at what you've accomplished even in the last 12 to 18 months in the area of data governance. So this conference needs to start with congratulations and a thank you to everybody in this room and everybody who wanted to be here. So thank you.

The opportunity that sits in front of us is simply staggering to me. What we've really had the ability to do, for the first time in the last 15 or 20 years, is to have a next-generation data architecture evolve and emerge because of Hadoop. And with that we'll have a modern data architecture that unlocks tremendous value and opens up use cases and business models that have never been achievable before. A couple of weeks ago, Tom Davenport wrote a great article in the Wall Street Journal; I'd encourage you to go take a look at it. Essentially, he summed it up by saying that the shift to the new data architecture is playing out in real time, and it will bring the power of open source tools and big data to the enterprise in a way where none of the existing capabilities go away. And I think that was always what was so powerful when we started looking at the Hadoop opportunity and the Hortonworks business model. 
We saw Hadoop as a way to really advance the data architecture, but it didn't have to be a zero-sum game for Hadoop to win. In other words, another data platform didn't have to lose for Hadoop to win. And I think the reason behind that is that the opportunity that really fueled Hadoop for all of us is being driven by the explosion of data that started three or four years ago with the new-paradigm data sources. As those new-paradigm data sources have grown and increased in number, volume, and velocity, they've certainly continued to put pressure on the traditional data architecture of the transaction systems, which have become very siloed and very constrained. But with these new-paradigm data sources, the early adopters especially have learned there's tremendous value in bringing all this data together under management, especially in a centralized architecture. Number one, it can open up tremendous cost optimizations from their existing platforms. Number two, by having a centralized view of all the data sets, you can have insights about your customer and supply chain that were never achievable before. And then, most importantly, what this has done is open up a transformation in business models. In today's traditional data architecture, you can only get value from data principally post-transaction. With Hadoop, you can transform your business model to be interactive with your customer and supply chain before transactions ever happen. We've never been able to do that before, and it gives the enterprise the ability to create value for their customer before their customer ever transacts anything with them. That's what represents a massive opportunity for us going forward. And it's because, again, of the work you've done to make Hadoop an enterprise-viable data platform. Clearly the ecosystem has embraced it. 
There are thousands of ecosystem partners who have created reference architectures, brought Hadoop into their customer base, and increased the value that they've driven, many of whom are here today. I really encourage you to learn what they're doing with their solutions. Again, to all our sponsors who've made this platform and this summit available: thank you very much for that. But the underpinning component that's truly enabled Hadoop to function as a market is the commitment to open source. If we weren't all committed, purely and truly, to doing all of the work of advancing and innovating Hadoop in open source, the model would fracture, it would completely break down, and we would never have the opportunity to get to the enterprise-level maturity that we've gotten to with Hadoop in general. And certainly the ecosystem would not have been able to aggregate to a common platform. So it's imperative going forward that we stay committed and focused on driving Hadoop purely through the Apache governance process, through the Apache Software Foundation.

Okay, so let's talk a little bit about where the market is. Gartner did a great job a few weeks back publishing a report. I thought it was terrific; it had some very, very good data points in it. And actually it was met with a lot of controversy, which I thought was good. There were some who interpreted it as negative, and there were some who thought it was very accurate and positive. That was the camp that I, and Hortonworks in general, fell into. But I think it's also important to put that data into context. What we've done is overlay the data that I'll go through in just a moment, on how many enterprises are at each stage of their Hadoop adoption curve, onto the classic technology adoption life cycle curve that Geoffrey Moore presented so many years ago, to let you put it in context. 
So the survey said, so to speak, that 26% of the enterprises surveyed are either deploying, piloting, or experimenting with Hadoop. Within 12 months, another 11% plan to invest, and within 24 months, another 7% plan to invest. So you can think of this as: within four or five years, 44 or 45% of the market is already engaging with the platform or planning to engage with it. And I think the thing to recognize that's so powerful about Hadoop is that once it moves in, it's kind of like putting electricity into the house for the first time: so much becomes possible and so much value gets created. So as Hadoop moves into the enterprise and continues to progress, it becomes prolific very, very fast because of the value it generates. I think we're just starting into an industry that's going to be transformative. And if you draw a parallel to where the relational database was five years into its formation, which at this point was 25 or 30 years ago, I think arguably Hadoop is on a much faster adoption and ramp curve.

So I think it's important to bring in another entity that's done some tremendous research, holistically, on what CIOs' goals and initiatives are, where they're going to spend money, and what their strategic plans are for IT spend and IT patterns. Thomas DelVecchio is the founder of ETR, and I'm going to ask him to come join me on stage. Welcome. Thank you. Thank you very much for having me. I'm going to let Thomas describe in a moment a bit about his firm and its model, but I think they've done a tremendous job of taking a very broad and deep view of what the core strategic initiatives are across multiple spaces and what the priorities are. So with that, Thomas, if you don't mind, tell our colleagues a little bit about, first of all, the ETR model. Sure. And a little bit about how you've approached the initial studies that you've engaged in. Absolutely. 
So Enterprise Technology Research, ETR: we're a survey-based, forward-looking market research firm. We have 3,100 CIOs globally who take standardized surveys. What we really do is perform quantitative analyses on what is primarily qualitative data, so that we can provide those analytics and the visualization tools back to the survey respondents as well as to institutional investors, who are our subscribers. So the institutional investor is your client? That's right. The CIO is involved for free, but the investor is not. Excellent. So I think that obviously shows a great level of quantifiable objectivity. So if you don't mind, can you give us a glimpse of what the quantitative analysis in your surveys says about the traction and where Hadoop is in its life cycle? Yeah. I mean, we're looking at 2015 as likely the year we'll look back on as when open source and Hadoop really took off. It's no longer just a conversation; the data indicates it's going to be this year. If you look at the 274 vendors across the 24 sectors that we track, there is no better spending priority and forward-looking spend than for Hadoop. That's impressive, because one year ago, in April 2014, it was 75% lower than where it is now. So it's gone to this pole position almost overnight. Well, that's great news for everybody in the room, no doubt. So as the broader market begins to play out, what does this tell us? I mean, when you look at big data — analytics, business intelligence, if you will — it's an amazing group. It's a powerful group that's really accelerating, but no one stands out further than those focused on Hadoop and open source. Excellent. Well, I'm excited to hear about that. And obviously, I love this slide. It's a little self-serving, as you can imagine. But tell us what this chart tells us. Yeah, I mean, I can't blame you for loving the slide — you should love the slide. 
So we've been doing this for six years now: 18 standardized surveys, with 3,100 respondents from large enterprises participating over those six years. And the best forward-looking spending intention scores over six years from large-organization respondents — the top two spots — are Hortonworks and Hortonworks. If you look at this slide right here, you're looking at over 80% of all respondents planning to adopt or significantly increase spend on Hortonworks in 2015. And when you look at this in context, it's important, because it's actually better now than the data was years ago for Salesforce, Amazon AWS, and Workday, and we all know what success stories they are. Well, excellent. That's very powerful. Anything else you'd like to share with the audience before we go? No, I think people should really take the opportunity over the next few days to soak it up with both the vendors and the technologies, and to really think about why you're doing it and who you're doing it with. Instead of just saying, hey, I'm doing Hadoop — who are you doing it with? What are you trying to accomplish with it? That to me is the key. Excellent. Well, this is the place for them to come find that out. Absolutely. Thanks so much. Thank you very much. Have a great day. Thank you.

I think another very important third party that you should also go engage with is Mike Gualtieri — go look at his report on why Hadoop is a must-have for large enterprises. He'll be up later this morning with a keynote entitled "Adoption Is the Only Option." There's a lot of great value that you'll see, so please stay tuned for his keynote. I think it really makes the point that the Hadoop market has formed, and the realization is that the value can be very transformative. 
I think the powerful thing that we all take away from this is that we're not just limited to value in one application category or one industry; it's much broader and much more powerful than that, and Hadoop is actually incredibly transformative to virtually every industry. We're going to bring a couple of industry leaders up in the next few minutes to talk very specifically about the value propositions they're driving and some of the use cases. But just think about retail: a few years ago it was one-size-fits-all, no pun intended — one message, one product, taken to the masses, hoping you get the right hit rates. Versus today, with Hadoop, we're able to be very prescriptive and take very specific products to very specific demographics, at exactly the right point in time in their buying cycle, through exactly the right channel they want to buy through, when they want to buy. Or look at the financial services industry around fraud — just one very small use case, but a massive value driver in terms of trillions of dollars. They were only able to manage their risk across their whole portfolio on, at best, a daily basis: was there fraud, were there problems in the pricing of a particular security, was it being traded inappropriately? Today, virtually every trade that happens, institutionally or retail, is able to be surveilled in real time, so that it can be detected if fraud is happening or a pattern of fraud is emerging, and the ability to stop and disintermediate that is saving hundreds of billions of dollars. 
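The real-time trade surveillance idea described above — checking each trade as it streams in and flagging an emerging pattern of abuse — can be sketched in a deliberately toy form. This is a minimal illustration, not how any production surveillance system works: the class name, the rolling-statistics rule, and the thresholds are all invented assumptions.

```python
from collections import defaultdict, deque
from statistics import mean, stdev

class TradeSurveillance:
    """Toy real-time trade check: flag trades that deviate sharply
    from an account's recent history.

    Keeps a rolling window of recent trade sizes per account and
    flags any trade more than `threshold` standard deviations from
    the window mean. (Hypothetical rule, for illustration only.)
    """

    def __init__(self, window=20, threshold=4.0):
        self.threshold = threshold
        self.history = defaultdict(lambda: deque(maxlen=window))

    def check(self, account, amount):
        past = self.history[account]
        suspicious = False
        if len(past) >= 5:  # need some history before judging
            mu, sigma = mean(past), stdev(past)
            if sigma > 0 and abs(amount - mu) > self.threshold * sigma:
                suspicious = True
        past.append(amount)  # record the trade either way
        return suspicious
```

A stream of ordinary-sized trades passes quietly; a sudden outsized trade on the same account trips the check, which is the "pattern emerging in real time" point being made in the keynote.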
We're going to talk about healthcare in a little while, but healthcare has turned and transitioned, in real time, from mass treatment — one treatment fits all — to being able to be very predictive and very prescriptive, down to an individual at the symptom level, in real time, as conditions emerge and evolve. We're going to talk about that shortly, when John Wilson from United joins us. Another industry that's become very transformative is manufacturing in general. It used to be that things were built, we waited for them to run as long as they could, and when they broke we dealt with all the downstream issues that came with getting them back online: finding parts, trying to recover from the service outage that happened, the work that didn't get done, the work we couldn't build — all the different attributes. In today's world, we can monitor in real time, down to the component level, what's happening with a piece of equipment — not only where it is, but whether, at the component level, there's a malfunction that's about to happen — so that we can proactively take it offline on a rational, scheduled basis, not interrupt service, and fix that component. A couple of specific examples: one of the largest fleets in the world was able to save $10 million in its first year just by monitoring and predicting when a battery would go out on its trucks, and replacing the battery when it had 15% of its life left. It ensured that the truck didn't go out of service and the work was always able to be performed — $10 million in the first year, just on battery monitoring and replacement. We're going to spend some time talking with our partners at GE shortly about how they're leveraging the internet of things to monitor and put sensors on many industrial internet components. We're going to move on, actually, and let's get into this. 
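The fleet battery policy described above — replace proactively once a battery is down to 15% of its estimated life, during planned downtime rather than after a roadside failure — reduces to a simple threshold rule. A minimal sketch, with all function names and the capacity-ratio life estimate assumed for illustration:

```python
def remaining_life_fraction(capacity_now_ah, rated_capacity_ah):
    """Estimate remaining battery life as the fraction of rated
    capacity still measurable (clamped to [0, 1]). A simplification:
    real estimators would use more telemetry than one reading."""
    return max(0.0, min(1.0, capacity_now_ah / rated_capacity_ah))

def needs_replacement(capacity_now_ah, rated_capacity_ah, threshold=0.15):
    """Flag a battery for scheduled replacement at <= 15% remaining life."""
    return remaining_life_fraction(capacity_now_ah, rated_capacity_ah) <= threshold

def schedule_replacements(fleet_readings, threshold=0.15):
    """fleet_readings: iterable of (truck_id, capacity_now_ah, rated_capacity_ah).
    Returns the trucks whose batteries should be swapped at the next
    planned service window."""
    return [truck for truck, now, rated in fleet_readings
            if needs_replacement(now, rated, threshold)]
```

For example, `schedule_replacements([("T1", 80, 100), ("T2", 12, 100)])` would flag only `"T2"`, since it is at 12% remaining life. The value in the story comes less from the rule itself than from having fleet-wide telemetry centralized so the rule can be applied to every truck continuously.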
And you know, what we've talked about so far is that everyone in this room has really contributed to Hadoop becoming an enterprise-viable data platform. You've advanced all the enterprise services in the areas of security, governance, and operations. You've transformed the core architecture of Hadoop; it's now ready to be the next-generation data platform and enable the modern data architecture. Our opportunity has never been bigger than in probably any space we've played in in our careers, and we may never have an opportunity this big again. But the most important thing is what we're doing to drive value back to our companies and our partners and our customers. And I tell you, I think there's not a higher or better cause on planet earth — maybe one other — than what John Wilson and his team at Optum, UnitedHealthcare, are embarking on, and I'd like John to come join us and share with you a little bit about what he and his team are doing to really transform healthcare. Good morning. Good morning, how are you doing? Very well, how are you? Excellent, excellent, thank you. So John, we've got a great crew out here, and it has been an absolute pleasure working with you and your team. Thank you. John has one of the best visions of how healthcare is transformed, and of how what they're doing with Hadoop transforms our health and our families' health — and what better calling on planet earth is there than that, the bettering of our and our families' lives? So I could not be more appreciative of what you and your team are doing. But if you don't mind, share with our colleagues here a bit about United and Optum, what your mission is, and then from there let's talk a little bit about what you're doing with Hadoop to transform patient care. So thank you for the invitation. Most of you will have heard of UnitedHealthcare. UnitedHealth Group is a Fortune 14 company; we 
have about 170,000 employees, with revenues last year of about $130 billion. I work for a company called Optum. UnitedHealth Group has two platforms: UnitedHealthcare, which is our insurance business, and Optum, which is our services business. Optum serves about 64 million people; we touch four out of five hospitals, about 67,000 pharmacies, and about 88,000 physician groups — we touch a variety of people in the healthcare space. The use cases which I'm really interested in and get excited about when it comes to Hadoop are not so much the classic ETL offload, or how we're going to do better payment integrity or fraud analytics. That's all good stuff, and stuff we have to do and stuff we do. The thing that I personally get most excited about is when we build tools that help our patients. That's the thing that gets me out of bed in the morning, and that's the story I want to share with you today, around some of those analytics. Perfect. So let's take a view of one of our populations. Some of you may know that one of the big healthcare problems in the United States is diabetes. The American Diabetes Association thinks diabetes this year will cost the US economy about $250 billion. Right: $250 billion, in diabetes alone. To give you a perspective of our world from my position: I have to manage about 700,000 diabetics. How many people are in this room — about 4,000? A little over 4,000, okay. So is that 18 times the size of this room? Something like that, 17, 18 times — somebody with better math can figure it out. A lot of people, anyway. Now, it's important, because one of the problems in diabetes — which may seem somewhat bizarre for a condition that's so prevalent — is that people just don't take their medications. I'm sure most of you have probably got some pill packets at home and you 
haven't finished all the pills. It's a very common problem in medicine, and a very, very important problem in diabetes. The challenge we have is that if a patient doesn't take their medication, then I have no chance of improving the quality of care for that member. And the challenge is that people don't take their medications because, frankly, diabetes doesn't hurt. It's not something you wake up with in the morning, in the early stages of the condition, and say, oh, my blood sugar really hurts this morning — people just don't say that. So the thing we wanted to do was predict who was likely to take their medication and who wasn't. Now, people can say, well, you can build a regular regression model for that. The challenge we have in healthcare systems is that we have all these different types of data: we've got structured data out of our claims systems, and we've got unstructured data out of our charts. We want to bring all that data together, and what we've been able to do is bring this data together, in both raw form as well as some structured forms, and start to find patterns in it. When we started to build these models, trying to anticipate who was likely not to take the drugs, what surprised us were the things we weren't anticipating. We saw insights in that data — because we had it together now — in a way that we just didn't expect. And what was, frankly, awesome about that is that by taking those predictive models, those insights, we can better anticipate and predict who's likely not to take their medication, so we can intervene ahead of time. From my perspective — and I'm sure there's going to be a variety of use cases over the next few days that you all are going to describe across different industries — the thing I get excited about is that if I have the ability to touch someone in a way that no one's ever touched them before, with some information, some 
better analytics, I'm going to make a material impact in that person's life. John, would you say that that's an incredibly transformative use, because you can intervene in the progression of a disease and change the course and outcome of not only that person's condition but probably the span of their life? So this was a specific use case, and you've been very successful with it — with what you've learned with Hadoop and bringing insights holistically to data. Would you say that, on a scale of one to ten, you're just getting started at one, or at ten, where you've reached the full potential and have to go find other areas to create value? I think — we started this journey about two or three years ago. We started it very small; we were doing the classic things. We did some POCs, we got it going, we got a few successes under our belt. But I would say this entire paradigm is still in its infancy, whether for the healthcare industry or the broader industry. This is still the early stages, and I think that's what makes it most exciting, because it's not just about the problems which we know about today. What I'm interested in is anticipating the things which we don't know about, and being responsive and agile enough to solve those problems in the future. That's what I think is exciting about this technology stack; that's what gets us out of bed. So how does this translate — we've talked about how it's incredibly impactful not only to the industry but to patient care — what about to the business? What's the impact this delivers, ultimately, back to the business? So I'd say a few things on that. And indeed, for any of you who are taking medication: please take your medication as your physician prescribed it, and if you don't, go and speak to your physician. Again, for my diabetic population it's very simple. If a patient with diabetes has a complication — and you know, the easiest way to mitigate that 
complication is to take your drugs and keep your blood sugars within the right parameters. The moment a patient goes outside those parameters and starts getting complications — complications are bad for the patient, and they're expensive. A diabetic with a series of complications is about 37 percent more expensive than one without. That's not just the United figure; a bunch of published studies have proven that. So good health — keeping patients healthy — is good for the patient and good for the system. And I think it's not just around the cost. The other paradigm that's helping in healthcare is the movement towards the bonus payments which we get, as a company, for managing our population in a healthy fashion. So it makes economic sense, both from the quality payments we get from some of our clients and for the population in its own right. But to me, at the end of the day, it's around: can we keep the patient healthy? Well, I would like to congratulate you and your team on your vision and what you're doing to transform our and our families' lives through technology. Thank you very much, appreciate it. Have a great day. Thanks for joining us.

I think that's the exciting thing: as John pointed out, we're just scratching the surface of what's possible. And I think one of the things that really becomes the driver of this — and this is, you know, a chicken-or-the-egg question — is the role going forward that IoT, the internet of things, will play. The question that we contemplate is: has Hadoop enabled IoT to flourish, or has IoT enabled Hadoop to flourish? I'm not sure — just thankful that it has. Some interesting data points for you: by 2020, less than five years away, there will be 24 billion connected devices. That's going to generate and create a lot of data, and with that data comes a lot of opportunity to create value. There's a number of use cases that are in their infancy right now: 
smart factories, smart grids, smart cars — we just talked about healthcare — smart buildings, and how they can evolve to become more efficient and create better environmental situations and conditions for all of us. I think the opportunities are boundless. An interesting data point from a study that Cisco did: they estimate there is $14.4 trillion in net profit potential by 2022 from leveraging the internet of things and creating the value that can be created from it. Okay. There's no better person, I believe, on planet earth to come and talk about that than Vince Campisi — he's the CIO of GE Software — and I'd like to ask Vince to come join us on stage. Hey, good morning. How are you? Great, thanks for joining our summit. Thanks for having me, folks. Vince and his team have done some world-class work around the internet of things. I'm going to let him talk about that, but he's truly one of the trailblazers, one of the visionaries, and I believe he will become the icon, really setting the bar for what he and his team are doing around the internet of things. So Vince, if you don't mind — sure — you know, we hear a lot about IoT, the internet of things, the industrial internet. Can you tell us a little bit, from your perspective, from GE's perspective, what that opportunity looks like and some of the things you're doing in that area? Sure, and I think we have a backdrop page to touch upon this. Our point of view is that we see a pretty significant convergence happening — a convergence between the physical world and the digital world. As we affectionately like to say, it's big iron meets big data. And big iron for us is really aircraft engines, locomotives, MRI machines, wind turbines — those sorts of assets — really coming together with the learnings from the consumer internet technologies that have evolved over the last 10 to 15 years. And so we see it as 50 billion machines 
getting connected in roughly the next five to ten years, and as that happens, we're going to see the same level of transformation and business model innovation as we saw when a billion people got connected through what I would offer was the consumer internet. So that's our point of view: there's a new internet forming around machines getting connected, and we call that the industrial internet. Okay, a new internet forming — wow. Think about what the first one did; what do you think the next one does? What's the role of Hadoop in that? Yeah, so as we see it, there are really three things that need to work together successfully, at scale, to realize the potential of the industrial internet. One is, as we just talked about, machines getting more connected and more data being available. With that comes the need for industrial data management at scale, and I'll talk a little bit about how we think about Hadoop in relation to that. And then lastly, it's secure cloud capability and mobility, so that you can take these insights and get them into the hands of the right person in any sort of operating condition — when you think about where some of these assets are in the world and what people need to do to service them. And so what we found is there's a bit of a gap in the market as it relates to serving the needs of the industrial internet, and over the last couple of years we've been engineering, for our own need to deliver services, a platform for ourselves which we call Predix. Predix is really designed to try to solve all three of those things, and to do it well we believe you need an open-standards approach, and we are excited about the role that Hadoop plays in that as part of the ecosystem. And so for Hadoop in particular, we're trying to solve three things in that space. One is: how do you really disrupt the cost to store massive amounts of information — more information than we've ever had to deal with before? 
And we see it as maybe a two-times growth rate over what we've seen on the consumer internet. The second part, where Hadoop really fits into this, is breaking down those data silos. We've seen data warehousing technologies evolve over the last 10 to 15 years, and they weren't really designed or suited to support 50 billion machines getting connected. So how do you enable people to reason over much larger datasets than they've had to before, and enable data scientists to stitch together new relationships and new insights that we didn't know mattered? That's really where Hadoop fits into how we think about the industrial internet. Excellent. What are some of the benefits that the industry in general, and specifically your customers, get from the initiatives that you're putting forward? Yeah, as you rightly said, this is all about outcomes, and how you really transform how industries operate and work — and for us that's energy, and as John mentioned healthcare, aviation, oil and gas, et cetera. There are a couple of different ways we deliver these outcomes. I'd say the most straightforward way is around asset performance management. At the end of the day, we're providing this big iron, this technology, for our customers as part of our install base, and our goal is to help those customers pump more iron — how do they get more reps out of it, more reps being more reliability, more efficiency, more performance? And if we can enable no unscheduled downtime and no unplanned outages, then, as you rightly said, it's a $15 trillion opportunity over roughly the next 10 to 15 years — which essentially says we can collectively create a new digital economy that's the size of the current U.S. 
economy, which is tremendously exciting — and that's not to mention the societal impact and the other sustainability attributes that come out of doing this successfully at scale. So it's never been a more exciting time to be in this space, and we believe there's this next level of internet forming that's going to unlock some of these possibilities. That's incredible. We can't thank you enough for your vision, your leadership, and the execution by you and your team to make this a reality. Thanks, have a great day. Thanks a lot, take care. Thank you.

I need to wrap up — I'm in trouble, I'm way over my time, I apologize. But most importantly, thank you for coming to Summit, thank you for investing the time, thank you for making Hadoop the enterprise data platform and helping us focus on driving the value we can to our customers, our partners, our supply chain partners, and ultimately our families. Have a great summit. Herb's going to come back, and we've got two or three more great keynotes. So, Herb — thank you. Very good, thank you.

So that was great, right — going through what's happening, the transformation. Just a few things to take away from that. One, you saw one of the slides that talked about the adoption cycle of Hadoop, talking through it and comparing it to some of what the industry analysts are saying. Many of you are probably Geoffrey Moore aficionados, seeing, okay, that's crossing the chasm — so where's Hadoop? We didn't say this up front, but on Thursday, Geoffrey Moore will actually be one of the keynotes here, and he'll be talking about how he looks at this adoption cycle of Hadoop, how he compares it to other technologies and other industries, what that means, and how he sees it in terms of the opportunity and the potential from a true crossing-the-chasm perspective. So I encourage you: on Thursday he'll be taking us through that in the keynotes. So, a couple of things I took away from it. One, you 
know with John Wilson and what he talked about going through the opportunity to transform healthcare and what that means and that's something that's not just a bunch of servers in a back closet that's actually has an impact on every single person in this room myself included in terms of the opportunity to transform healthcare and have a personal impact to all of us and then thinking of Vince of what he talked about my love is term big iron to big data we're the opportunity to say as you look at the business and say you can re monetize from being a manufacturing company and what you do there but also your manufacturing is now around the data and the insights you can take from that data and provide back to your customers from the sensors and all the information on the big iron so great opportunities in transforming industries and transforming businesses and what that means with Hadoop so now what I'd like to do is I'd like to bring up Ranga from Microsoft and what Ranga is going to take us through you know is his view and vision of where Microsoft fits and what they do in the supply chain but I also want to thank Ranga because we've been working together for about four years now from an engineering and to go to market partnership and usually think of things like Stinger and Hive and what comes out in the work that was done there around bringing Hadoop to Windows right and now running on Windows and then Microsoft Azure and having Hadoop as a service at HD Insights so appreciate your partnership and everything you're about to share with the audience thank you thank you Ranga thank you very much all right good morning it's wonderful to be here it's a great time it's a great time of possibilities I want to talk to you about data dreams it feels like an incredible time it feels like the dream time across the industry we see so many examples of data being used for incredible benefit I say the world is drunk on data the data dividends are just incredible case after case 
This is driven by three factors. First, a huge amount of data is just pouring out of every place. It's as if every interaction of humans with a digital system is getting tracked and recorded for benefit: every log, every tweet, every system, every transaction can be harvested. Throw that in with the incredible hardware that's available from the cloud on a minute's notice: you can get commodity hardware from the cloud, thousands of cores, huge numbers of terabytes, very quickly, in minutes. Combine that with software like Hadoop, scale-out software that can process all of it in real-time, batch, and interactive modes, and that makes it all possible. It is as if we've gotten the new fracking of data. Data that we used to throw away before because it was useless (logs, chatter between people, gossip) has now become very valuable, because it can be economically transformed into something valuable. So the world is really drunk on data.

We have customers spanning these industries doing things with data that are just amazing for their worlds. You're familiar with the fraud detection case; we all benefit from it. We all use credit cards, we use them safely, we know that somebody's watching out, and we get calls and texts protecting us from abnormal behavior. Now there's a new class of fraudsters doing bitcoin mining, the latest thing; the level of mischief, the different types of things they're doing, keeps on coming, and there's a continuous data dividend on fraud protection.

The industrial internet, we talked about that: predictive maintenance. We have a customer, ThyssenKrupp. They have every elevator in the world instrumented, and all this data is available to them. They harvest the data, they understand it, they can look at leading indicators and predict well before an elevator is going to malfunction, and correct it without anyone even noticing. The elevator just keeps on running, magically. And on and on across the entire industrial structure we can apply the same mechanism, so that is just an enormous data dividend.

Better health: we have a customer, Aerocrine, in Sweden. They make machines that help asthma patients, patients who are very dependent on them, and they look at it as an IoT scenario. They look at the metrics coming from the devices, they look at the health of the devices, and they can tell well before some device is going to malfunction. They call up their customers and say: we would like to do some proactive maintenance on your device, to make sure you're comfortable and confident that the device will be with you when you need it. And wow, what a difference that makes for health. So across industries we have these data dividends. It's almost as if it's a time for possibilities, a time for dreaming.

Let me share with you some other dreams of our customers. Real Madrid is the world's greatest sports franchise. They started around the turn of the century in Spain: the big stadiums, the very impressive team, all sorts of incredible records, all sorts of world cups and everything. They have the largest stadium, 90,000 seats, 90,000 raving, passionate fans. And you know, if you have a century of incredible games, you end up with a few avid fans, shall we say. They have the largest fan base: 450 million fans. They have more fans in Indonesia than in Spain. For a small club that started out just playing good football, suddenly they find themselves with this fan base that's around the world. So every game is watched on the field by 90,000 raving, passionate fans, and 450 million minus 90,000, if you do the numbers, are all wishing they were at the game. So their data dream is to make every game accessible to every one of their 450 million fans. They want those fans to have the game experience as appropriate. What is the appropriate game experience for somebody who's asleep in Indonesia while the game is getting played? What is the game experience for somebody who's in the middle of a meeting but wishes they could be at the game? Think through how many different ways they can have the fans experience the game.

So they have a very ambitious program of digital transformation. They start with fan empathy, and they have social listening. This is a complicated architecture chart, but fundamentally their dream is to bring the game to every one of their fans around the world. And this is suddenly transformative: the joy that you have in going to a game, experiencing the ups and downs, rooting for your team, watching human nature, is available to every fan anywhere in the world, appropriately, on their devices, in the mobile-first, cloud-first world. This is just an incredible journey that they're on. They use Hadoop, of course.

Another one: AccuWeather. AccuWeather is in Pennsylvania, and they have this problem of space and time. They can predict the weather. They divide the world into coordinates, nice rectangles, and they can predict the weather on a particular rectangle, for a certain period of time, at a certain granularity. So this is the general problem. They've been doing that, putting it in USA Today and other publications; every day it comes out and people discuss it and so on. But this is largely one-size-fits-all, an even distribution over the world. And the problem is such that if you have more computing resources, you can make the granularity smaller and smaller: smaller squares, smaller rectangles, smaller units of time. So suddenly this world of big data and cloud has completely changed the way they look at this. In a mobile-first, cloud-first world, they get billions of requests for weather forecasts every day, and they have the APIs running in the cloud.
There are the automatic requests (you go to a web page and a weather forecast comes up), and then people specifically search for the weather in a particular area, where they are, and these requests add up to billions. So what they have done is invest in digital technologies to take a very Pareto kind of approach. That is, they say: I need to know the weather where it matters. They can look at the people around the world who are asking for weather information in various ways, look at where the action is, and automatically, for that place and that time, provide the smaller rectangle, the finer granularity of space, and minute-by-minute projections of weather; they call it MinuteCast. Just look at how valuable this is with every weather incident that happens around the world. No need to watch the Bay Area; it's always boring, perfect weather. Just occasionally it goes up 10 degrees and everybody complains, and occasionally there's a bit of rain and all kinds of cars skid, but it's boring weather. Same thing in the Sahara: boring, clear skies, hot. But there are places where it matters, and where humans matter.

They have fantastic visions. Their data dream is to have sensors, slip sensors in the brakes of cars, that they're going to harvest to understand: the car ahead of me on the freeway is sliding, slow down. Just imagine that. This is not some crazy, wacko idea. This data is available; it is just a matter of deciding to do it and making it economical to do it, and the systems are available to do it. We already have systems plumbed in Azure to get data from around the world in IoT scenarios: collect it anywhere in the world, across lots of data centers, keep the data where it is, and be able to do queries across all of it. This is fantastic, and this is available. So well beyond the industrial use cases, like the mining case shown here, there are use cases that are possible, and they dream about them.

Ziosk is an entirely different story. They are in Dallas, and they're concentrating on the dining experience. People come to the restaurant, they see on their table what looks like a checkout counter, and they say: oh my god, they're doing it again, now I need to check out by myself at the restaurant too, I have more work to do. But then you see this device. It shows the menu. You look at the menu, you order things off the menu, you talk to your friends around the table, pass it around, consult: which one should we order, should we share this? And then you order. At any time during the experience you can ask for the attention of the waiter or waitress. How many times have we wanted to get somebody's attention and nobody's paying attention? Here you can just discreetly ask. And then at the end, when you have decided on the tip, there's a convenient survey: hey, how did you like your waiter or waitress? This is nice; I'm in the mood to give you the survey. They're seeing tremendous results from just that feedback. And of course the transactional system runs on tried-and-true SQL Server and other well-known technologies, so this works.

And you can do the BI, Power BI analysis, at multiple levels. At the store level, you can ask which waiter or waitress was the most empathetic with the customers and provided the best service. They can look at that information, harvest it, and use it profitably to coach waiters and waitresses. Then at the level of the chain, a restaurant group that uses these devices can look across the entire chain and say: we offer these different choices, people make these decisions, so let's offer the items of interest to people. They can look at the changing pace, and when they introduce new items they can see the effect. So they can do the traditional BI kinds of reports and dashboards. And for Ziosk themselves, they can do dashboards on how their devices are doing, how choices are being made, how the presentation is working, and all that good stuff.

So this is so far still fairly traditional, although fed by the cloud. What's more interesting is that they do some Hadoop analysis on dwell time. When you look at a menu, you talk to friends around the table, you look at choices, you narrow it down to two or three, you go back and forth. When people flip through the choices on this device, Ziosk can see that they dwelled on certain choices. So they not only know the decisions that people made; they also know the decisions that people considered. Now that's a gold mine. That means they can see, for example, that everybody wrote a certain item off early on, or that certain things are more popular in Texas, or whatever the case may be, and they can do very tailor-made analysis. Suddenly you get more insight. Not only that: they take this data and train machine learning models in the cloud, and with that they can decide what to show a customer. If a customer chose this particular item, perhaps you should show the associated item that the machine learning says is the most common pattern, much like books are recommended. And then their dream is a healthy lifestyle for people. They want to know people's calorie budgets. People are walking around with a Fitbit counting every step, every breath, god knows what; all sorts of things are getting counted, and people are really anxious about their health. So Ziosk could have calorie information and enable you to fit within your calorie budget. This is just incredible. So you put it all together and say: wow, this is a very different ball game.
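Ranga's dwell-time point reduces to a small counting exercise: log how long each item stays on screen, treat long dwells as "considered", and tally considered-but-not-ordered pairs. The schema, threshold, and data below are invented for illustration; Ziosk's actual analysis runs on Hadoop at a very different scale.

```python
from collections import defaultdict

# Hypothetical dwell-time log: (session_id, item, seconds_on_screen, ordered?)
events = [
    ("s1", "burger", 12.0, False),
    ("s1", "ribs",    4.5, True),
    ("s2", "burger",  9.0, True),
    ("s2", "salad",   2.0, False),
    ("s3", "burger", 15.0, False),
    ("s3", "ribs",    6.0, True),
]

DWELL_THRESHOLD = 5.0  # seconds; items viewed at least this long count as "considered"

considered = defaultdict(set)   # session -> items the diner lingered on
ordered = {}                    # session -> item actually ordered
for session, item, dwell, was_ordered in events:
    if dwell >= DWELL_THRESHOLD:
        considered[session].add(item)
    if was_ordered:
        ordered[session] = item

# Count "considered X but ordered Y" pairs: the decisions people weighed,
# not just the ones they made -- the gold mine described above.
pairs = defaultdict(int)
for session, choice in ordered.items():
    for item in considered[session]:
        if item != choice:
            pairs[(item, choice)] += 1

print(dict(pairs))
```

From counts like these, a recommender can surface "people who considered X usually ordered Y", the book-recommendation pattern Ranga mentions.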
All of that is because we have the software, we have the hardware, we have the cloud that lets you get these capabilities very quickly and conveniently. What used to be just a point of sale, a terminal, has suddenly become a dramatic business benefit. This is fantastic.

Another one that's particularly heart-wrenching for me. This is a picture of Fukushima Daiichi, and our customer Ultra Tendency working with the non-profit Safecast. Here you are in Japan: you got hit by this magnitude nine earthquake, you're reeling from that, then you hear about the tsunami, you're reeling from that, and then you hear about the nuclear meltdown. Oh my god, the triple whammy. And you're the only country in the world that has been hit by anything nuclear. This is not a good place to be, and certainly the government is scrambling to handle the triple whammy. So a small non-profit, Safecast, in Japan, working with an 11-person startup in Magdeburg, Germany, decided to do something about it. They said: we know we have a radiation problem, but we have no idea what is safe and what is not safe. We don't know who can go where. I don't know whether I'm under threat or not. Should I leave Japan? Should I leave Fukushima, that area? How far should I go, how quickly should I go? I don't know. So they did something very simple. They took some Geiger counters, 500 Geiger counters, and gave them to whoever wanted one. We don't know where the problem is anyway; if you're anywhere close to the nuclear plant, just take one. All you do is upload the information to us. Now imagine their IT scenario: how many machines should they buy? How voluminous is the data going to be? How many people are really going to do this? Are 500 Geiger counters enough? Tremendous variability, complete unknown. All they know is that they need a scalable system, it's a pressing problem, and they need to act fast. So, long story short, they distributed these things and got the basic information, they did some projection, and they used Bing Maps to produce these heat maps, and suddenly you have visibility. Look at what that data insight is going to do. Their data dream was to give high-quality information to people at a time of crisis. What an amazing, timely use of data to change people's lives. Incredible.

So I started by saying there's all this data coming out and it's used for various industrial purposes, use cases and so on. What I'm finding more and more is that these data use cases are also touching human lives, across the well-known fundamental needs: health, food and nutrition, lifestyle, sports, giving. We have customers that span these industries, and this is the big epiphany for me: here we are screwing around with bits and bytes, doing queries, making Hive faster and all of this stuff, and suddenly it's saving people's lives, suddenly it's making a dramatic difference for people.

So I often think about the data dream, and I have a personal data dream. I'm a big fan of education. The current education system is all wrong: it's all judging. We have a teacher who drones on for a long period of time to lots of students, the students are given a test, and you're either good or bad. If you meet the bar, you get to go to the next class and do other things; otherwise, we don't care. It's a judging-based system. I would love for this to be transformed into a coaching-based system: every mistake you make, we help you clarify the concept, we make sure you learn. My data dream is that education becomes a coaching system, not a judging system. My data dream is that every student gets an A, some faster than others; not the reverse, where every student finishes on time but most students don't learn. That's crazy. So that's my data dream. Now you think about your world, your personal world, your professional world, and you ask yourself: what is my data dream? I ask you: what is your data dream? Tell us about it. Tell your friends about it. Tell Hadoop Summit about it. Tell the whole world about it. Tweet about it. Thank you very much.

Thank you very much. Thank you for sharing those insights into how Microsoft looks at the opportunity of big data, what that could mean, and what that means to the world in terms of what's happening. I appreciate you sharing that insight with everyone. So, our next speaker is Arun Murthy. Arun Murthy is one of the founders of Hadoop and also one of the founders of Hortonworks. What Arun's going to talk about, along with some others, Drew and Newman, and then ultimately Peter Crossley from Webtrends, is the potential we can unlock with Hadoop and some of the things they're thinking of and working on as they start to drive that potential. So with that, Arun, thank you very much. And this is yours.

All right, check. All right, thank you. Like I was saying, this is my eighth Hadoop Summit, and you guys are by far the largest of these audiences, so I'm glad you all showed up, right? It's been an amazing journey. As Herb was saying, I was at the first Hadoop Summit; it was in Santa Clara. We started off at 50 people, that grew to 100, then eventually to 200, and it just took off from there. So today, what I want to talk to you about is how Hadoop is not just a technology at this point, but something that allows you to actually unlock your data's potential and get you insights, and more importantly actionable insights and outcomes, with the use of the technology, right? Now, back in 2008, I filed this small JIRA about how we wanted to move Hadoop beyond MapReduce, and that eventually led to something called YARN.
And the idea then was just to allow people to interact with all this data in different shapes and forms, right? And that's gone a long way. Today we talk about this notion of a data operating system. The good news is that terms like "data operating system" are usually a lagging indicator of the value people are seeing, and that's really a testament to all the people here who have taken Hadoop beyond just a JIRA or a vision, right? Now, if you go back in history and think about what an operating system was: the concepts of the operating system came out in the 60s and 70s. The idea was that at that point, the most contended, most scarce resources were hardware, memory and CPU and so on, and you needed a piece of software to automatically manage them across different users and applications, right? Fast forward to 2015: we don't really think of hardware as the most scarce resource, thanks to what's happened at a Yahoo or a Google or a Microsoft, and now obviously with Hadoop. We can put together lots and lots of commodity hardware, whether it's storage or compute, and get you what we know as a scale-out system. But the key at this point, and the reason all of you are here, is to understand that the most important resource right now is not hardware; it's your data. And that's why the notion of a data operating system is so important, right? The operative word here being "data", not "operating system". However, an operating system that just manages your resources is not enough; there's actually a bigger picture here, right? It obviously starts from storage, and HDFS has been the backbone of the Hadoop ecosystem for a long time and continues to evolve.
We were here last year talking about how HDFS is changing, and today you can actually manage not just spinning disks, but faster storage, SSDs, even memory, as storage tiers with Hadoop, with HDFS, right? Obviously with YARN, you can manage different resources, whether it's memory or CPU or network, and that's all here. Now, these two are important pieces, but an equally important third leg of this stool for me is metadata and data management, right? And that's why we're so excited about Atlas. We feel Atlas is going to be that third leg of the stool, which allows you to manage not just your compute and storage but also your data sets, right? And that's equally important. Now, you're here not just to play with the software, which is why, as enterprises take on Hadoop, you really want consistent operations and consistent security; we'll talk a little bit more about Ambari and Ranger later in the talk, right? Last but not least, there are the developers, the lifeblood of the ecosystem, and you want nice APIs and tools for them to actually take advantage of all the data and all the compute resources. That's where things like Spark and Flink and Tez come up, and they're obviously part of the ecosystem in a big way. So, enough talk. Last year, I was here on stage and we demoed what we called a real-time interactive application, where we took sensor data from a customer of ours, a trucking company, and showed an app where you could watch violations in real time, right? Not only could you see the violations, lane violations, traffic violations and so on, in real time, you could actually go back, look at those violations, and do analytics using tools as simple as Microsoft Excel, right? However, that was so last year, right?
As Rob says, the key opportunity here is to go not just real-time, but also predictive. You want to look at pre-transaction analytics and figure out how to influence customer behavior and user behavior, right? So let's look at this year's requirements from, let's say, the Chief Data Officer's perspective. Not only does he want increased safety and reduced liabilities and so on, he's demanding that we anticipate driver violations even before they happen, and hopefully take corrective actions, right? So with that, what do I do as the application team? The thought process I go through is: let me enrich the application with weather and driver profile data, because it's got to be personalized, right? It's got to be personalized in real time. The real-time aspect is the weather, which is changing every day, every hour; the personalization part is that you've got to take into account every single driver out there and his own history, right? Once you have that, you want to explore the enriched data set for features and build predictive models that take advantage of those features. Once you're done with that, you want to plug the model right in, so you can actually start to do predictions in real time, right? So let me show you how. To help me demo this, I have two of my awesome coworkers here, Drew and Newman, and they're going to talk to you about how we take this application and make it something you can actually run in your business, right?
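The workflow Arun describes (fit a model over enriched weather and driver features, then use it for prediction) can be sketched as an ordinary least-squares fit. The demo itself uses Spark ML; this stdlib-only version, with invented toy data and features, just shows the shape of the "bunch of numbers" the data scientist will hand over.

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

# Toy enriched events: (foggy 0/1, hours worked this week, violations).
rows = [(1, 60, 5), (1, 55, 4), (0, 40, 1), (0, 35, 0),
        (1, 70, 7), (0, 50, 2), (1, 45, 3), (0, 30, 0)]

X = [[1.0, float(f), float(h)] for f, h, _ in rows]  # intercept, fog, hours
y = [float(v) for _, _, v in rows]

# Least squares via the normal equations: (X^T X) w = X^T y
XtX = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
Xty = [sum(X[k][i] * y[k] for k in range(len(y))) for i in range(3)]
w = solve(XtX, Xty)  # w = [intercept, fog coefficient, hours coefficient]

def predict(foggy, hours):
    """Predicted violation count for one driver-condition combination."""
    return w[0] + w[1] * foggy + w[2] * hours
```

On this toy data both coefficients come out positive: fog and long hours each push the predicted violation count up, which is exactly the kind of effect the demo's Spark ML model surfaces.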
Now, at Hortonworks, we're really big on personas, so what we'll have here is Drew playing the role of a data scientist, who's going to explore the data and come up with models, and Newman playing the role of a data architect, who's going to take what Drew builds and put it into an application, right? Now, we're also big on what we call end-to-end, and what we mean by that is that not only do we want to show you how the data scientist works, but Drew is actually going to start all the way at the bottom, right? He's going to provision an HDP cluster in the cloud using Cloudbreak. He's going to use a really cool tool called Apache Zeppelin (and again, that's a testament to the overall Hadoop community and how innovation's happening), and he's going to run algorithms to build predictive models. Once he's done, he's going to hand it off to Newman. Newman is going to walk you through the updated architecture (you saw the architecture last year; he'll show you an updated version of it), then demo the real-time app, and then show you how he puts the model in and makes it part of your data architecture, right? With that: Drew, Newman. Thank you, Arun. Good morning, everyone. I'm Drew Kumar, and for the next five minutes I'm going to masquerade as a data scientist; I'm actually a partner solutions engineer here at Hortonworks. So, as a data scientist, to satisfy the chief data officer's requirement to build a predictive model, the first thing I need to do is quickly spawn a bunch of machines in the cloud, so I can start doing some data analysis and then, further down the road, build my model on it. So how do we do that? Well, we recently acquired SequenceIQ, and we are in the process of enriching that experience for you.
SequenceIQ, if you don't know about it, is a way to launch clusters in the cloud without having to think about the APIs of the cloud providers. This is what the interface is going to look like in a few weeks. At this interface, I can just go ahead and sign in. Once I have signed in, I can select which cloud provider I want to work with. Let's say I want to start doing data analysis on Azure clusters. So I select my Azure credential and I click this button, Create Cluster. Let's give this cluster a name, and since we're in Northern California, I'm going to select Western United States as my region. Now I'm going to pick a blueprint. As a data scientist, I'm interested in using Apache Spark for my needs, and these blueprints come pre-built with the settings for all the tools that will be launched in the cloud for you; they go by different personas. So over here, I'm just going to select the HDP Spark cluster blueprint. When I select that, I get a bunch of options where I can go in and customize if I want to, but for the purpose of today, I'm just going to go ahead and create the cluster. Now, while this launches and spins up in the cloud, let me switch to machines I've already provisioned offline and show you what the experience looks like for a data scientist once the machines are up. This is Apache Zeppelin. It's a notebook connected to a Spark back end, similar to the IPython notebook, for all you data scientists out there. In this interface, as a data scientist, I can start doing exploratory data analysis, I can start building models, and I can even get to the shell directly from here, so I don't have to leave this interface. It becomes very powerful, as we'll see soon. So, the CDO gave me these enriched events, which I now want to start exploring. What do I do first?
I use a little bit of Spark Scala code to bring this data set into Zeppelin, convert it into structured form, and register it as a temporary table. Now that it's a structured table, from here on I can invoke pure SQL against it. For instance, I can show you what the data set looks like. So let's take a look. Over here is the data set I'll be working with. Shown are the event types (what was the actual event?), whether the driver was certified or not, the payment scheme, and some geographical information. As a data scientist, I want to find out the characteristics of this data set and piece together a story which can then help me build my model. So now I'm going to start exploring right from this interface and try to figure out which features I should be using. Okay, the first question we want to ask is: do certified drivers create fewer violations? I just issue a simple SQL query, get my results back, and now I can plot them in different ways to see what the impact of that feature is. Next: is fatigue linked to incidents? Driver fatigue here is defined as the number of hours worked in a week. Again, a simple SQL query gives me a table, and because Zeppelin converts that to a plot, I can easily chart it and see what the impact of fatigue is. So far I'm still trying to understand and analyze this data set, but now I go further and look at other variables' impacts: for instance, the impact of fog, the impact of rain, and so on and so forth. Let's see how the geographical information impacts violations. Here I've created a scatter plot and marked all the points where violations happened. Now I can also add foggy conditions to it and see how that influences my data set.
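Drew's loop (register events as a table, then explore with plain SQL) can be imitated outside Spark. Here is the same shape using Python's built-in sqlite3; the column names and rows are invented to match the fields he describes, not the demo's actual dataset.

```python
import sqlite3

# Hypothetical enriched truck events: event type, driver certification,
# geographic coordinates, and weather flags.
events = [
    ("overspeed",  1, 37.77, -122.42, 1, 0),
    ("normal",     1, 34.05, -118.24, 0, 0),
    ("lane_drift", 0, 36.17, -115.14, 1, 1),
    ("overspeed",  0, 40.71,  -74.01, 0, 1),
    ("normal",     1, 47.61, -122.33, 0, 0),
]

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE enriched_events
               (event_type TEXT, certified INTEGER,
                lat REAL, lon REAL, foggy INTEGER, rainy INTEGER)""")
con.executemany("INSERT INTO enriched_events VALUES (?,?,?,?,?,?)", events)

# "Do certified drivers create fewer violations?" -- count non-normal
# events grouped by certification status.
rows = con.execute("""SELECT certified, COUNT(*) AS violations
                      FROM enriched_events
                      WHERE event_type != 'normal'
                      GROUP BY certified""").fetchall()
print(dict(rows))
```

In the demo, the same query runs through Spark SQL over a registered temporary table, and Zeppelin renders the result as a chart instead of a printed dict.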
So the point I'm trying to make is that it's very powerful to work from this one interface, do all your exploratory data analysis, and not stop there. Now I'm ready to actually build my model. I've converted this data into a form the Spark ML libraries can use and put it out as training data, and from this interface itself, when I press this button, it launches the Spark job in the cluster, and it's running right now. We're building a regression model here; it's going to run for 200 iterations and, actually, it's just finished. So let's look at what the impact of the different variables is. This is basically my model: a simple linear equation, and in it I find that foggy weather is the variable that drives the most violations. Now my model is set. I can communicate it as a bunch of numbers to my enterprise architect, who's going to take it from here and weave it into a bigger story. Right, thank you very much, Drew. Morning everyone, I'm Newman Fakar. I'm a solutions engineer at Hortonworks too, just like Drew, and today I'm going to play the role of enterprise architect. As the enterprise architect, what I'm really interested in is taking a streaming application and converting it into a predictive application. So let's first walk through the overall architecture of the app really quickly and see what's going on. As you can see, the events from the trucks flow into the HDP cluster, first to a component called Kafka. Kafka is a highly scalable pub-sub messaging component inside HDP. From Kafka the events flow into Storm, and Storm is where the real-time analytics happens. For example, it is in Storm that we determine whether a driver has made a violation or not. Storm also rolls up the event aggregates to HBase, and from HBase we chart those aggregates onto the predictive UI that we're going to show you in a bit. Also keep in mind that the events are being streamed into Hive as well.
Which means that data analysts can interactively explore those events over a SQL interface or via their favorite BI tool. So now the question that you may ask is well, okay, how do I make this streaming app a predictive app? The way we're gonna do that is we're going to get Storm to reuse the Spark ML model that Drew just built. And what happens now is that as the events come in, Storm scores those events in real-time against the Spark ML model. And the output of that scoring is basically a prediction about driver behavior in real-time. Namely, will the driver commit a violation or not? We take those predictions, put them on ActiveMQ, and then from ActiveMQ, they get consumed by the predictive UI. So let's take a quick look at what that predictive UI looks like. So here's a map widget and there are dots on this widget. Each one of those dots represents a prediction about driver behavior in real-time. If the dot is green, it means that we're predicting that this driver is doing fine. He's not going to get into trouble. He's not going to make a violation. And as we start getting violation predictions about drivers, the dots start turning red and they get larger and larger as we make more predictions about that driver. So for example, for George over here, we predicted that he's going to make 15 violations, right? So you can imagine that there can be an ops person sitting in an ops center who uses this as an alerting mechanism and can radio George in real-time and let him know that, hey, maybe there's upcoming fog on the route that George is driving and, you know, he should turn on his fog lights and be extra vigilant on the road so that he doesn't get into an accident. If you scroll down a little bit, you'll see that there's a table here and this table is showing the prediction aggregates by driver. So for example, for Michael over here, right? We see that we've predicted that he's going to make nine violations. And these values keep on getting updated over time. 
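The scoring step described above, where Storm applies the trained model to each incoming event and emits a real-time prediction for the UI, can be sketched as a small scoring function. The coefficients, feature names, and threshold below are illustrative assumptions, not the demo's actual values:

```python
# Hypothetical coefficients from the trained linear model.
MODEL = {"foggy": 2.5, "rainy": 1.1, "fatigue_hours": 0.08}
THRESHOLD = 2.0  # score above this => predict a violation

def score(event):
    # Dot product of model coefficients with the event's features.
    return sum(MODEL[k] * event.get(k, 0.0) for k in MODEL)

def predict(event):
    # Produces the kind of record the predictive UI consumes (via ActiveMQ
    # in the demo's architecture): driver plus a violation prediction.
    return {
        "driver": event["driver"],
        "violation_predicted": score(event) > THRESHOLD,
    }

print(predict({"driver": "george", "foggy": 1, "fatigue_hours": 60}))
```

In the demo, a Storm bolt would run this kind of function over every event in the stream, which is what turns the green dots red on the map widget.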
And we also see that there's contextual information about the predictions as well: when the prediction came in, was the weather foggy, was it rainy, how many hours had Michael worked, was he overworked or not? So you can see that we've evolved the application from telling you what is happening on the road to telling you what is about to happen on the road. We've converted it from a streaming application to a predictive application. Now, one question you may ask is: well, I've got new events flowing into my cluster every day. What if I could take those new events and keep on retraining my predictive models so they keep on getting better over time? And a second, corollary question could be: well, because that's going to be a periodic workload, you may want to tap into the elasticity of a cloud provider to run it. Now, this is where Falcon comes into play. Falcon is part of the HDP platform, and what it allows you to do is define pipelines that automate workflows inside your cluster. So for example, over here we can see a pipeline that has already run. It took in the raw events that were coming in from Kafka and put those raw events through an enrichment process; that is, it added weather context to the raw events, along with information about driver fatigue and payroll. It then rolled the enriched events out to your on-prem data lake on HDFS, and it also automatically synced that enriched feed to a cloud storage provider. And what we've done over here is essentially democratized the availability of data, so that data scientists like Drew can keep on iterating their predictive models and keep on making them better as time goes by. So bringing it all together, what we've shown you is that you can build predictive apps on top of the HDP platform by utilizing components such as Spark for machine learning and Storm for streaming analytics.
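The enrichment process in that Falcon pipeline, joining weather context and driver fatigue/payroll data onto the raw Kafka events before they land in the data lake, can be sketched as a simple lookup-and-merge. All field names and lookup tables here are invented for illustration:

```python
# Hypothetical reference data joined onto each raw event.
weather_by_route = {"route-7": {"foggy": 1, "rainy": 0}}
fatigue_by_driver = {"michael": {"hours_this_week": 62}}

def enrich(raw_event):
    # Copy the raw event and layer on weather and driver context;
    # unknown routes/drivers simply pass through unenriched.
    enriched = dict(raw_event)
    enriched.update(weather_by_route.get(raw_event["route"], {}))
    enriched.update(fatigue_by_driver.get(raw_event["driver"], {}))
    return enriched

raw = [{"driver": "michael", "route": "route-7", "event": "speeding"}]
enriched = [enrich(e) for e in raw]
print(enriched[0])
```

In the pipeline itself, Falcon's job is scheduling and data movement: it runs this kind of enrichment on a schedule, writes the enriched feed to HDFS on-prem, and replicates it to cloud storage.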
And you can also operationalize your workloads both on-prem and in the cloud by utilizing Falcon. So with that, I'm going to hand it back to you. All right, thanks guys. So there you have it, right? We're looking at not just faster or cheaper with Hadoop; you're also looking at the third component, which is earlier. You want to be predictive, predict the event or the transaction before it happens, and sort of influence user behavior. There's a corollary to this. If you know your history, there was Bell Labs back in the 50s and the 60s, and there was Mervin Kelly, who was one of the legendary, and I mean absolutely legendary, directors of Bell Labs. He was the guy who hired the people known as the Young Turks: Bill Shockley, John Bardeen, and Walter Brattain, the guys who did the transistor, and Claude Shannon, right? And he had a motto: Bell Labs would only do faster or cheaper or both, right? So with HDP, it's now faster or cheaper and earlier. That's sort of the next step for us. Now, having said that, it's also important to not just talk about how we get the applications in; in an enterprise setting, we have other requirements like security and authentication, right? That's why projects like Ambari and Ranger continue to evolve. With Ambari 2.1, which is coming in the HDP 2.3 platform, we've done a number of huge enhancements to make it really, really simple and really easy for anybody to actually deploy Hadoop. It includes things like customizable dashboards and so on. So I really encourage you to look at Ambari 2.1. Similarly with Ranger: not only do we have comprehensive authorization across the stack, but we also have things like transparent data encryption coming in, which again allows you to move Hadoop into that next set of use cases in your enterprise. Last but not least, you also want as much choice as possible for your deployment of Hadoop. You can pick your favorite cloud provider.
More importantly, you can pick your favorite cloud provider for each actual use case, right? You might want to do data analytics in one place, BI in another, and data science in something else. That's why Cloudbreak is so big for us at Hortonworks. As you know, we acquired it in April, when we announced it at the European Summit. Now, the combination of Cloudbreak with something like Falcon allows you to move data in a transparent, seamless fashion. Now, in addition to all the technology, let's bring somebody on stage who's actually lived this for a while. I want to take this opportunity to introduce you to Pete. Pete is the director of architecture at Webtrends. Welcome, Pete. You know, don't be put off by his director of architecture title. He's really, really smart and really hands-on. He was telling me backstage how he was debugging a Spark 1.4 release candidate, right? So give him credit for that. So, Pete, thanks again. Thanks for having me. So you guys have been on the Hadoop journey for a while now. Talk to us a little bit about how you see Hadoop at this point. Right, Webtrends has been around for a long time, 20 years or so, and I kind of say that Webtrends was doing big data before big data really existed, right? Because we are a digital marketing solution company, and we have all this data coming into us from multiple companies through tags. We've been collecting analytics, and now we're in the realm of doing optimization, testing, and targeting. And one of the things that we really had to deal with was large data sets and large file stores, and moving to something like Hadoop finally got us to the point where we're able to actually work with the data, spin it around, and take that raw data that we can actually hold on to and re-pivot and repurpose for different purposes down the road, versus just dealing with the aggregate.
And Hadoop really has allowed us to do that. I mean, we have a 60-node cluster and we're growing 500 gigabytes every quarter, and so that's a half a petabyte every six months, and we have to order hardware ahead of time just to get it in place and landed fast enough for our growth, but it's still saving us 20 to 40% on our cluster costs. Awesome. Now, your business is so much about taking data and turning it into actionable insights. I'm sure people here would love to hear some specifics on how that's made a difference for you. Yeah, sure. Webtrends, like I said, takes digital marketing data and turns it into actionable insight, and we have integrations with email marketing companies. We have optimization capabilities ourselves. We take this data that's being collected, and when I say real time, I'm talking sub-second. I'm talking 20 milliseconds from the time that we collect it to the time that we're starting to process it and persist it to disk, so that we can then spin it around and start querying on it in our new Infinity engine that we released recently. And so this is allowing us to really build new applications on top of that and take these data-driven capabilities to the next level. Awesome. Now, in your business, you have so many data sources, right? You're getting it from different providers, different markets, and so on. So talk to us a little bit about the curation and the data processing, the cleaning side of things, and also help us understand how things like Hadoop and Spark have changed your business and what advantages you see with both of them. Right. I mean, with the data and the internet of things, right? We have devices; I mean, everybody has a mobile phone, probably, in this room. They have a laptop. They have maybe more than one mobile phone.
And all this data is coming in through many different markets and channels. We have to understand the customer life cycle for our clients and our customers. And their customers are using websites and devices, and that's sending every shape of data to our collection facilities. And we don't really control that. I mean, sometimes it's tab- or semicolon-delimited. Sometimes it's encoded twice. You know, there's all this data cleansing that we could potentially have to do. And what we've really learned is: we don't do it. It sounds kind of funny, but you don't do it. You store it in its raw form, and then, leveraging technologies like Spark, which we've been invested in for a long, long time now, we can take that data out in near real time, stream it, and then mutate and modify it as we need to on the outbound: de-encode it, split it, zip it, or however we want to deal with it. Awesome. All right, Pete, thank you so much for taking the time to share with us. Thanks. So we've covered a lot of ground in the last 30 minutes or so. Just to give you a quick recap: we talked about how this notion of a data operating system is coming through. It's much beyond just one technology like YARN; it encompasses everything from security to storage. We talked about how we want to make it really easy, with technologies like Zeppelin and Falcon, to give you end-to-end analytics on HDP. And last but not least, we talked about how existing customers like Webtrends are really taking advantage of Hadoop to make a big difference in their business. Now, there are lots of sessions out here, and I'm sure you'll enjoy a lot of them. This is a selection that I put up, sessions that I'll hopefully be going to. Shout out to things like what Alan's working on in the Hive ACID space for transaction processing; catch some more YARN talks, and so on. Last but not least, some of my favorites are just the BOFs.
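Pete's "store it raw, transform it on the way out" approach is classic schema-on-read: keep the collected lines untouched at ingest, and normalize delimiters or undo double encoding only when the data is read back. A minimal sketch of that idea, with invented sample records (in practice the raw store would be HDFS and the read path a Spark job):

```python
import urllib.parse

# Raw lines exactly as collected: inconsistent delimiters, double-encoded
# values. Nothing is cleaned at ingest time.
raw_store = [
    "michael\troute-7\tspeeding",        # tab-delimited
    "george;route-3;lane%2520change",    # semicolon-delimited, encoded twice
]

def read_record(line):
    # Normalize on read: pick the delimiter per line, then undo potential
    # double URL-encoding on the way out.
    fields = line.split("\t") if "\t" in line else line.split(";")
    return [urllib.parse.unquote(urllib.parse.unquote(f)) for f in fields]

records = [read_record(line) for line in raw_store]
print(records)
```

The payoff is the one Pete describes: because nothing was thrown away or lossily "cleaned" up front, the same raw data can be re-pivoted for purposes nobody anticipated at collection time.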
Hadoop is so much about the community. If you want to go meet the individuals who make up this community, come to the BOFs, whether it's Ranger or Knox or Zeppelin; you'll spend a lot of time there and find a lot of value. I also want to take this moment to give a quick shout out to the ASF. None of this would have been possible without the stewardship of the ASF, and Hadoop continues to thrive thanks to all the great folks at the ASF helping out with Hadoop and the related projects. With that, enjoy your conference. This is my eighth; hopefully it's one of tens more. All right, thanks. Thank you, Arun. Thank you very much. It's great to see the power of Hadoop as a platform in terms of what's possible, right? As you have YARN, you can start to build a multi-tenant architecture; you see what's possible with streaming data in with Storm and Kafka, and how you start to use Spark on top of Hadoop for the machine learning. They went through Apache Zeppelin as a notebook for data scientists, how you start to present that information to them, and a lot of the data pipelining with Falcon. And lastly, how do you do the cloud deployment? What can you do with Cloudbreak? So all of this working together, with Hadoop as a platform, starts to show the power of enterprise-class capabilities, what's possible, and how different companies and different users are starting to say: how do I assemble these components and start to use them? So we're going to wrap up the day with the final keynote. The final keynote is going to be Mike Gualtieri coming from Forrester and giving Forrester's view, from an analyst perspective, of how they see this market developing, where they see Hadoop from a maturity perspective, and what they see in terms of adoption and what the companies out there are doing. So with that, Mike, come on up on stage. Mike, thank you so much. Thank you. Hi, everyone. My name is Mike Gualtieri, Principal Analyst at Forrester.
I cover big data, Hadoop, and advanced analytics, and I'm going to talk to you about some of the trends that we're seeing. And the first one is just very simple. I mean, in this room, we all know it. We surveyed a couple thousand IT executives and technology decision-makers and said, hey, what are your top-ranked priorities? You can see data-related projects here, and you can see the other usual suspects: cloud, mobile, systems of record. And of course, data runs through all of these things, but I love it that data is top of mind everywhere. Now, the other thing that I love is that in 2014, we put out a Forrester Wave which evaluated the distros. Forrester produces thousands of research reports per year, and isn't it great that this Forrester Wave was the second-most-read report? Now, look what it's flanked by: customer experience and digital customer experience trends. We're going to see how those two tie together. And it was so popular that this year we're going to put out three Waves. One is going to be about the Hadoop distros. The other is going to evaluate some of the pure cloud providers.