Many of the core technology vendors are companies that you probably work with already in your data centers and inside of your organizations. What is Hadoop's place within that, and how do all these other companies work closely with it to integrate it into their reference architectures as they go to market? You'll hear from a number of partners, EMC, SAP, and HP, as they talk through that. And then we'll end today with David Epstein. This will be an interesting one; David will be a fun conversation. What he's going to go through is, we all talk about big data and what that means. David's view is that you can get all this data, but how do you find that small difference, that small window that you can exploit and take advantage of? Think of that in things like sports and other analogies, Moneyball, et cetera, where it's how do you leverage data to take advantage of it? What can you do when you can find that small insight through data discovery across broad sets of data? He'll put a fun spin on it in terms of how they think of it and apply it to different industries. Otherwise it's the same today with the community showcase and what's happening out there at Cafe Hadoop. How many here got to stop at Cafe Hadoop yesterday? Okay, pretty good group. I think you get to see some of the tech titans in terms of what they're doing, hear how they're thinking of Hadoop, and get hands on and play with it while you're sitting with the experts. Same thing with wireless; nothing's changed from yesterday, so hopefully everybody has it from yesterday. I do want to get to the party. The party will be fun tonight, for all of you here, at San Pedro Square. If I could emphasize one thing: you have to bring your badge. Otherwise we'll have all of San Jose hanging out there with us in San Pedro Square. Bring your badge, and you get to walk around, different bands, different places to eat. It's a lot of fun.
And I know last year a lot of people had a lot of fun with it because it's a free-form environment, a good way to interact with all of your peers at the conference, have some good food, some good fun, and listen to some great music. Okay. So with that, what I'd like to do is get this started today and bring up our first speaker. Our first speaker is CJ Desai, President of Emerging Technologies at EMC, and what he's going to talk about is his view as a partner. Welcome, CJ. Glad to be here. I'll hand this to you. Thank you. [Race footage plays:] "Down goes the flag. Now number one is away. Bump into the saddle and off he goes. A very nice start indeed. Here's McGuinness." Well, what a challenge he's having. So John McGuinness, the gentleman who was shown, had won 21 Isle of Man TT competitions as of yesterday, and this is one of the most dangerous, if not the most dangerous, race courses out there. And just this morning, he won his 22nd. So at EMC, we decided, hey, let's find out: is it the man, the machine, or the combination of the two that explains why, on one of the most dangerous race courses, this individual has won so many races? Just to give you an idea of how dangerous this race course is: over the history of this race, there have been 240 deaths. So they are constantly working on improving the safety and making sure that they give proper guidance to people who want to race here. So again, EMC kicked off a project. We put sensors on the man and the machine itself. We did a simulation in Spain. And then we launched it to the data science community to figure out: what was the reason? Why does he win so many times, 21 until yesterday, 22 as of this morning? And the results were really, really interesting. Then in the fall, for the actual race, we are going to have John wear sensors, as well as his machine, the bike, and see what happens in the actual race. So when we did the simulation and announced a competition, it was very well received.
And now in the fall, EMC is going to be sponsoring the actual race and figuring out the reason behind him winning so many competitions on this most dangerous race course. So with that: EMC's emerging technologies division is focused on solving the infrastructure challenges for these new workloads like Hadoop. If I have to simplify, the team at EMC is working on figuring out how best we can help so that the adoption of Hadoop is easier for enterprises. Some of the design principles behind our infrastructure products, or storage products: one is scale-out architecture. And the reason for scale-out is that you should be able to start small and grow as much as you would like. You don't have to start really big; you don't know how much capacity you are going to use or what your data growth rate is. Second is built-in analytics. This is key because EMC is a leader and innovator in storage, and we want to make sure that while we are holding the data, you should be able to run analytics workloads on top of it. Third is software-defined storage. EMC has, over the past few years, moved more in the software direction, whether it's scale-out block, scale-out file, or scale-out object; we want to make sure that the value is in the software and that, similar to Hadoop, you can run this on commodity hardware. Fourth, I would say, is open source, taking inspiration from this community right here: just last month we announced that our storage automation and provisioning software has been open sourced, made available for the first time on GitHub on Friday. And you will see more and more products from EMC being open sourced, again on a similar principle to Hadoop, trying to leverage the community's expertise to make our products better.
And the last point is next-generation flash. This one is important because as analytics workloads move from batch to real-time, how can we harness the power of flash so that you can run workloads at a much faster rate when you want real-time data? So all of you know this really well: it was, I think, 2004 when the original white paper on MapReduce was published, then the team at Yahoo took it, and what a phenomenal evolution over the last 10 years or so. I mean, I saw the attendance numbers here, 4,000 badges, 300 more to come, definitely a huge growth rate in this community right here. And the evolution of figuring out Hadoop for a variety of use cases, as you saw yesterday. But our focus, EMC's focus, is usually enterprise infrastructure. And what we see is, for enterprises that are trying to adopt Hadoop, how can we make it simpler for you? The challenges you have in enterprise infrastructure: you have data silos, you have a variety of tools, you have existing investments in the infrastructure, you have to work with sourcing, and you have lines of business asking you for different things. So how can we enable you so that it's easier for you to deploy Hadoop? Now, when we look at the ecosystem that has evolved, and the multiple Hadoop distributions, whether it's Hortonworks, Cloudera, and others, or Pivotal, which is part of the EMC family, we feel pretty good about the open source strategy on the distributions and the tools that are on top of them. And we want to make sure, again, that we provide infrastructure as this community contributes to the evolution of this great technology platform. We want to make it easier for you while you face the challenges of typical enterprise IT infrastructure and silos. So from our standpoint, because EMC is an infrastructure provider, or the storage provider: what are some of the challenges for mass-market adoption of Hadoop?
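As an aside for readers newer to the stack, the 2004 MapReduce model referenced above boils down to a map step that emits key-value pairs, a shuffle that groups them by key, and a reduce step that aggregates each group. Here is a minimal single-process word-count sketch of that model; it is purely illustrative, not any vendor's or Yahoo's implementation:

```python
from collections import defaultdict

def map_phase(doc):
    # Emit (word, 1) for every word, as in the classic word-count example
    for word in doc.split():
        yield (word, 1)

def shuffle(pairs):
    # Group intermediate pairs by key, as the framework's shuffle step would
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    # Aggregate all values seen for one key
    return (key, sum(values))

docs = ["the quick brown fox", "the lazy dog the fox"]
pairs = [p for d in docs for p in map_phase(d)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts["the"])  # 3
```

The real framework distributes the map and reduce steps across many machines and persists intermediate results, but the dataflow is the same shape.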
So first, if you want to make Hadoop a truly enterprise-class platform, you need enterprise-grade reliability. Simple things like making sure it's highly available, that you can have backup, replication, security, and your compliance requirements; whether you are a pharmaceutical or a financial services organization, you have to make sure that this platform can support just the basics of what enterprises typically look for. The second is processing the data in place. If you look at the evolution in the 90s of data warehouses and, you know, a variety of databases, the data actually moved to the tools. Data moved to the tools, and then people figured out, hey, how can I run reports and have my data warehouse scale to however many terabytes or petabytes of data back then. What we are trying to make sure is that the data stays in place, and rather than data moving to the tools, the tools should move to the data. Third, when I look at the RDBMS evolution, and yes, there are questions about where we are on the maturity curve on the Hadoop platform side, multiple distributions and tool sets are just going to be a reality, right? There will be multiple distributions, there will be certain tool sets, and as the industry evolves, we are going to see a few of these tool sets become very prevalent, both from a market share and an adoption standpoint. So if multiple distributions are going to be a reality, then how can we ensure that at least the dataset these distributions and tool sets sit on top of is one, as much as possible? That would be ideal, because you can run multiple distributions on a single dataset. That's where we would like to go, but right now what is happening is that you are creating multiple siloed infrastructures. Fourth is diversity of storage: similar to distributions and use cases, our goal from a Hadoop adoption standpoint is also to give you flexibility on the storage, right?
So whether you require file access, or you have billions of files and want object as the underlying infrastructure, or you want a very, very fast underlying infrastructure specifically for real-time analytics, diversity of storage is key here, based on the use case you're going after. And then the last point is around multi-tenancy. From an enterprise standpoint, when I go and speak to customers, they feel comfortable, hey, they've gone with a particular distribution, they're using a few tool sets on top of it; but multi-tenancy, where you can run Hadoop as a service for the lower end of the enterprise, is a key feature that will enable the mass-market adoption of Hadoop. Now, one thing that we are very focused on at EMC is what we call the multi-protocol data lake for analytics. So one is, to enable this adoption, we want to process the data in place. We want to support multiple protocols, so that when you are writing to your data in your infrastructure, or when you want to modify or read from that infrastructure, you should be able to do that. And we want to provide support for as many protocols as possible, so you are not creating siloed infrastructure. So again, polyglot storage is key here: you have a single common dataset, and you can use various languages to either read from it or modify it. We also want to make sure that if you have a single dataset, it can be secure, and all the policies around data management, audits, and everything else for your regulatory compliance are still there. And then from a reliability standpoint, we have to make sure that this data is available in a 24/7 type of format; there is no downtime as you add more data, ingest more data. I could easily see you use Splunk for ingesting data and then use Hadoop for further analysis.
So one of the goals of the multi-protocol data lake is that rather than having multiple silos, you can use just one data lake and multiple protocols to access the data. And because the underlying data is still intact in the data lake, you can use multiple distributions on top of it. So, the right model for the right use case: of course, we know the history of Hadoop, and DAS was the primary mechanism. We understand it; we completely endorse it. When you want to use multiple distributions, you take data out of whichever source it is, ingest it, and have models running on whatever your favorite distribution is, whether it's Hortonworks, Splunk, and so on. On the shared model, like I said, if you use the data lake as a foundation and you have a single copy, then it allows you to use multiple distributions on top. So depending on your use case, it really varies whether you want to just go with DAS and start small or whether you want to use shared storage. From our standpoint, the way I look at it is: if you want to have multiple distributions and don't know your two-to-three-year strategy on where you're going with your Hadoop distribution and analytics tool set, maybe the one on the left is better. If you already have a lot of data in shared storage and you want multiple protocols supported, then maybe the right side is better. And this is really, really important when you get budget approved for Hadoop and the infrastructure that lies below it: that you can get a good TCO, all the enterprise-class features and security that you expect, and, most importantly, performance as well.
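The multi-protocol idea above can be sketched abstractly: thin protocol front-ends (file-style, object-style) sitting over one shared dataset, rather than one copy of the data per access method. The following Python sketch is purely conceptual; the class and method names are invented for illustration and are not EMC APIs:

```python
# Hypothetical sketch of "one dataset, many protocols": two thin protocol
# adapters over a single shared byte store, so nothing is copied per protocol.
class SharedStore:
    def __init__(self):
        self._blobs = {}  # the single source of truth

    def write(self, key, data):
        self._blobs[key] = data

    def read(self, key):
        return self._blobs[key]

class FileProtocol:
    """POSIX/NFS-style view: data addressed by path."""
    def __init__(self, store):
        self.store = store

    def open_read(self, path):
        return self.store.read(path)

class ObjectProtocol:
    """S3/Swift-style view: data addressed by bucket + object key."""
    def __init__(self, store):
        self.store = store

    def get_object(self, bucket, key):
        return self.store.read(f"{bucket}/{key}")

store = SharedStore()
store.write("logs/2015-06-10.json", b'{"event": "click"}')
nfs = FileProtocol(store)
s3 = ObjectProtocol(store)
data_via_file = nfs.open_read("logs/2015-06-10.json")
data_via_s3 = s3.get_object("logs", "2015-06-10.json")
print(data_via_file == data_via_s3)  # True: same bytes, no second copy
```

The design point is that adding a protocol means adding an adapter, not duplicating and re-synchronizing the dataset.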
The second thing I would say, from a real-time analytics standpoint, is going from descriptive to predictive, or from batch to real-time. When I look at, say, financial services, you want to run Monte Carlo types of simulations, your fraud detection algorithms, your security analytics types of applications, where time is of the essence; the latency required is in microseconds, you're not talking milliseconds, and the performance required from the infrastructure should keep up. If my credit card is compromised, I don't want my credit card company to tell me after 12 to 24 hours; I would like to know that information within minutes. That protects me as a consumer and also protects the enterprises in the financial services organization from writing off a huge loss. So we are doing a lot of innovation in flash to ensure that as we move from batch to real-time, we provide you the technology so you can run analytics really fast. EMC has three analytics storage platforms. We are a big believer in having some overlap rather than creating a gap for you. We want to make sure that if you already have a lot of data in our Isilon scale-out NAS clusters, then you can use Isilon and, like I said, with multi-protocol support, run analytics and multiple distributions on top of it. We also have a cloud-scale object technology called ECS, and what ECS allows you to do is geo-level replication. I'm sure some of you may say, today, why would I geo-replicate? I have a DAS environment and I'll figure this out. But if you want geo-level replication, a single global namespace, and performance for billions of files, small or large, then object is the right technology, and we have a product for that. And last but not least is DSSD. This is an acquisition that we did about a year ago, based right here locally in the Bay Area, and this is rack-scale flash; what it allows you to do is microsecond-latency, real-time analytics.
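To ground the latency point, a Monte Carlo simulation of the kind named above is simply a very large number of independent random trials, which is exactly the workload that faster storage and compute shrink from hours to minutes. Here is a toy Python sketch; the normal-return model, its parameters, and the loss threshold are invented for illustration, not taken from any real risk system:

```python
import random

def simulate_loss_probability(n_trials, mu=0.0005, sigma=0.02,
                              threshold=-0.03, seed=42):
    """Estimate P(daily portfolio return < threshold) by random sampling.

    The normal-return assumption and all parameters here are illustrative.
    """
    rng = random.Random(seed)
    breaches = sum(1 for _ in range(n_trials)
                   if rng.gauss(mu, sigma) < threshold)
    return breaches / n_trials

# 100,000 independent trials; production risk runs use many millions
p = simulate_loss_probability(100_000)
print(f"Estimated P(daily loss worse than 3%): {p:.4f}")
```

Because every trial is independent, runs like this parallelize trivially, which is why shaving per-operation latency at the storage layer translates directly into wall-clock speedups.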
We are working with some of the ecosystem partners here today to ensure that we support multiple distributions, so that reports that could take 12 to 24 hours to run in a DAS type of environment can run in a few minutes; queries for which you wait a very long time, where the business unit is asking, hey, give us the performance data for our sales, marketing, whatever the case might be. Our goal is to enable you to run these reports really, really fast. So I would say that on shared storage, and the term is very specific here, on shared storage, 800 of you today, in a variety of environments, are using Isilon for in-place analytics. ECS, the recently launched product, allows you to have cloud scale. And I will use one example: a financial services organization that had credit card transactions on a mainframe. They were copying that over to 16 different datasets, trying to run various processing on it. Using our Isilon technology, they were able to leverage the Isilon infrastructure, put all the data out there, and decide whatever they wanted to do from a processing standpoint. It went from 50 racks in the data center to only five racks. It also allowed them to use whatever tools they wanted, whether it's the distribution or the tools on top of it, to leverage the Hadoop platform here, and with security and governance in place they have peace of mind, because it's a financial services organization. So we are seeing examples where shared storage does make sense. And again, there is no single right model, shared versus DAS, but depending on the use case, you can choose. And in closing, you can visit us at Booth D3.
We have a couple of sessions this afternoon, but just to reiterate the message from this deck: at EMC, we are very committed to providing you the storage infrastructure, whether you want to do batch or real-time analytics, and our teams are working really, really hard to make sure that you can easily deploy and have high performance and reliability at an enterprise-class level for your Hadoop platform. Thank you very much for your time. Thank you, CJ. I appreciate the conversation and everything we're doing together: all the work with the whole EMC Federation, everything, as you mentioned, on shared storage, all the work in the Open Data Platform initiative, all the work with Pivotal on certifying what happens with HAWQ and HDP. And the last example we used was ODP-based. Exactly. Well, thank you. Appreciate it. So now I have the pleasure of introducing Scott. For some of you who may know this, I first had the opportunity to meet Scott when he was the president of Teradata Labs. He had been there for many years, driving the direction of Teradata as an enterprise data warehouse, the architecture, and how it fit in the enterprise. Scott chose to continue on with his journey in data architecture and has now joined Hortonworks as CTO, to work with Hadoop and to start to see where Hadoop takes its place as that next-generation data architecture inside the enterprise. Scott joined recently, so this is, I'd say, the first time on a public stage, coming out and talking about the customer journey, how he sees Hadoop playing in that customer journey, and what it means. We'll also have Russell Foltz-Smith join from Truecar, who will give his perspective on what they've done as they've built their architecture on Hadoop. So with that, Scott, come on out. Welcome. Thank you. Thank you. Here you go. Thanks. Well, it's really great to be back here on stage at Hadoop Summit for my fourth time.
And it's amazing just to watch the transformation, what's happened in the whole industry, and how all of these new technologies that we're delivering are starting to really yield that customer value, that customer benefit, that we've been talking about for so long. So what I'm going to talk about this morning, for the few minutes that we have together, is a little bit about what's driving our strategy and why I think we're at the right place at the right time with the right tools. But I'm also going to focus on some of the benefits that have been yielded from the Hadoop ecosystem and the Hadoop stack. Core to the strategy, and why I think this is so transformative for the industry, is what we call Open Enterprise Hadoop. Open means that we can now tap into the innovations and the ideas from around the world, from a limitless developer base. And open also implies not only lots of new ideas, lots of new brain cells trying to solve those new big data problems, but also that community of openness where there's some friendly competition and a drive to innovate, and to innovate faster. That openness is one of the biggest transformative things that I see in the Hadoop space. Enterprise means you've got to collect all of that and turn it into something that is trusted, that can be used, that can be deployed, that can be redeployed, that can really yield value in an enterprise. And that's really the core of our strategy: to build the best, the most breakthrough analytics. Those analytics have to be innovative, they have to be new, they have to show us something that we didn't intuitively know, and that is happening today. But to be relevant, they also need to be trusted, right? You have to be able to believe them, to take action in your business based on an analytic because you trust it; you're trusting your business, you're trusting your customer relationships, to those analytics.
And so that depends on this enterprise notion that I talked about, which includes governance, security, and operations. All of those things are really important. So let's look at some of the transformational analytics that are happening in the industry today, with some real customers, right? So, Webtrends. The challenges are not unique; yesterday, in the keynotes and in the breakout sessions, we all heard about the challenges that are out there. There's a lot of data, it's hard to consume, it can be expensive to consume, it can be complicated, and it can be hard to be agile in this space. And in this machine learning with Spark solution, what Webtrends was able to do is reduce the cost of data storage, deploy in the cloud to, again, make deployment easier and more seamless, and process a whole lot more stuff: 10 billion events a day, at 20 milliseconds or so per event. So this is really good, and it's obviously helping to drive extreme business benefit for their company. We have a stream analytics use case from Symantec, who I think is also presenting here at the conference today. Again, the challenges: it's expensive, they can't keep all of the data, right? Latency is a problem, and that's been solved; they're now processing 105 million log events per minute. And, you know, the bad guys are out there trying to break into all of our systems, so being able to be this agile and respond with all of the data, and all of the analytic power that that data provides, is extremely important. And processing time was reduced to a time that's actually helpful for driving value in the business. Bell Helicopter: here we're going to talk about sensor data, understanding what's happening in very expensive equipment, and being able to proactively manage that equipment to reduce downtime, to provide a better customer experience, and to combine all of the data from different siloed parts of the business into one place where the data can be analyzed. And that data can turn into analytics.
And those analytics can really provide proactive recommendations, a better customer experience, and better total cost. And finally, getting a 360-degree view of the customer. This is one of the more common use cases here. We're referencing a very large retailer who has used a combination of solutions, including traditional databases as well as a Hadoop stack, to really create that 360-degree view of a retail customer. And this is extremely important, right? As consumers, we're all becoming more and more demanding. If you looked at yesterday's keynote speech, we expect to be treated like royalty, and if we're not, the cost is extreme, because we will go somewhere else. So being able to create a 360-degree view of a retail customer in high definition, and be correct, is extremely important: combining different silos of data, different kinds of data, and being able to analyze that data, right? And in this use case, our customer has saved millions of dollars in storage costs and streamlined their inventory. As a side benefit, they actually got increased revenue by being better and more intelligent about pricing, and drove the top line, as well as the bottom line, with these solutions. All of the solutions referenced in these customer examples were done with earlier versions of the Hortonworks Data Platform. Earlier this week, we announced Hortonworks Data Platform 2.3. There's lots of information in our booth on the expo floor about the different features and functions, but let me just give you a high-level view of the things in here that we think will continue to help accelerate adoption and value creation from the overall Hadoop stack. We broke it down into really three easy categories: make it easier for users to use this stuff, make consumption easier, that's really good; make it more secure and easy to govern; and support it better.
These are truly the enterprise-grade features that we've added into HDP, where we combined the open, the innovation, all the new algorithms, all the new analytics, and put it through the enterprise-grade test to make it easier to consume and more trusted. So in the user experience, there really are two kinds of users of the system. There's the operational side of using HDP, and we've made it easier for operations: easier to set up and install, customizable dashboards for the operational staff so they can actually track cluster health and cluster utilization over time, and much easier provisioning for more agile analytic delivery and deployment. Again, it's all centered around Ambari and a graphical user interface that makes it easier for the operator. For developers, we've also created some really important ease-of-use capabilities, including visualization for SQL, again making it easier to interact and see what's going on; improvements to machine learning and Apache Spark on YARN, to make processing easier to implement and a little bit more efficient; and fault tolerance and other enterprise enhancements for streaming applications, to make them more dependable. So the user experience is a large amount of investment and a large piece of HDP 2.3. Security and governance are also very important as part of being trusted. We've created a whole bunch of things that help the security administrator, including encryption of data at rest, which is obviously really important in today's world, with security and privacy and all of the data that we're collecting; being able to be confident that only those that have a need to access and a need to know can have it; easier deployment of authorization and security access; and scalable metadata services to provide the ability to actually audit what's going on. Data governance, keeping track of what data you have, where it is, and how it got there, is really, really important.
So we've created three basic concepts inside of here: transparent governance standards for data governance and the data steward; data landscapes, to make it easy to reproduce relevant data landscapes for additional applications and users; and, again, enhancements to metadata services, to really understand what's out there and what's in it. The final piece of enterprise grade is really around support. Once these things are built and deployed, they've got to be supported to be trusted; the business becomes dependent on them. So in addition to the traditional support, including the customer portal and knowledge base, on-demand training, and access to support analysts, we've added Hortonworks SmartSense to be a little bit more proactive in terms of the overall support and the trusted nature of the cluster. SmartSense provides dashboards and recommendations on how to improve the overall operations and management of your system, and there's an example of one of those dashboards here. In here, there are a few proactive recommendations actually shown on the screen. So we think that in addition to continuing to invest in optimized case resolution and providing that real-time support that you need, this proactive SmartSense adds another facet to the supportability, and truly to the enterprise-grade capabilities, of HDP and HDP 2.3. So I've rushed through a bunch of things. I think there are a couple of key concepts: we're in the right place at the right time; Open Enterprise Hadoop is a really important concept; we're actually starting to see those values that we knew were there once all the data got together and smart people got to look at it; and we think it's important to continue to invest in making those analytics both innovative and trusted, to find that next tier of enterprise value. At this point, though, I'm going to stop talking and turn it over to the presenter you really want to hear from.
One of our customers who has actually gone and delivered value from the Hadoop ecosystem and the Hadoop stack. So please help me welcome Russ to the stage. Thank you. Thank you for being here. Appreciate it. I'm going to warn you all: I got great news right before I came on. They killed the presentation clock and said I could go as long as I wanted. So be warned, I may deviate from the slides. I'm the head of our data platform at Truecar, which can mean a lot of things to people, but basically if there's data that comes into the company and needs to go out, it'll flow through the technology that my team delivers. I don't know if everybody has used Truecar. I assume not, because I'm not seeing it in the sales numbers. So I encourage everyone, if you're in the market or you're about to be in the market, please go use Truecar. What we are is a marketplace that helps people buy and sell cars. We've been around for about 10 years, we went public last year, and the whole premise is very, very simple and right in line with everything that everybody here is doing: it's about giving data to everybody who's operating in the marketplace. We believe that truth and transparency is just a better way to do business. And so what I'm going to present to you is our true story about our usage of Hadoop and our growth in it. So here is, I guess, the money slide, probably revealed too early in this presentation, but I wanted to get it out so you could understand it. A couple of years ago, about 15 of us from my team and a couple of other teams at Truecar came out here to Hadoop Summit. And I distinctly remember the chuckles I got when I put in the budget to have everybody fly up here, spend all this money on hotels and restaurants, and learn all this Hadoop stuff. And they're like, Russ, who are you kidding? We're only using about 20 terabytes of data. We can shove that in traditional warehouses, which we were doing; we had some five data warehouses with 206 different databases.
And everybody was like, but really, we can just clean some of that up and move on. Is this Hadoop thing really going to stick around? And I was adamant with a couple of my other tech leaders. I said, no, you don't see where this is going. As Truecar grows and becomes more mobile and more real-time at the dealership, the data is going to demand that we think completely differently about how we collect, analyze, and distribute it. Lo and behold, we jumped headfirst into it with Hortonworks in July of 2013. The executives empowered me and my team and said, Russ, you spend what you need to spend to give us capabilities faster. So I did that, and immediately had my pal John provision as much hardware as we could possibly order. And so we threw up a couple of petabytes of nodes, for which we were able to get the economics right. I just want to lay to rest, for anybody who's still thinking economics are an issue: we're able to achieve 23 cents per gig on our storage, which just means we don't have to think about it anymore. So we got going, and it was like, okay, great, we have all this hardware, we have all this latent capability, but we have no applications. We don't have any applications because we don't have any developers that are great at Hadoop, which probably many of you in this room are; it may be in the back of your heads that that's the real challenge. This is a brand new technology, there are all sorts of capabilities, and it's really hard sometimes. And I said, you know what? It's not a problem. It's not a problem. And of course the chuckles were still there. I said, we're just going to train everybody. We're going to hire and we're going to train. And so we started doing that. And we started with one Hadoop developer that was actually pretty good. And then it was two, and then it was three.
And today we have over 25 people I would consider experts in our Hadoop infrastructure, and we're extremely effective at recruiting new ones now, to the point where I don't have to put up the "we're hiring" slide anymore. We're actually getting inbound requests from people who want to play with our data and play with our Hadoop infrastructure. Well, to add to all of this, we decided to go public last year. And we had a very accelerated pace for going public, which puts a lot of pressure on you when you're the guy who says, in the middle of all this, I'm going to go ahead and transform our entire data infrastructure, while we're attempting to do all the things you need to do to go public, while you're growing at a 30% year-over-year clip on revenue and other metrics. We're just going to go ahead and continue to do this crazy transformation. Lo and behold, it works. We go public. And we start launching real applications on Hadoop. And what do I mean by real applications? The Hortonworks guys sometimes hate when I say this, especially the sales guys, and I think many of them will remember when they were talking with me early on in our partnership: I said, I don't do POCs, because POCs are crap in the end. They end up giving you sort of a half application that may tell you something, but usually doesn't. And I mean that in terms of actual big data, because the real applications escape the scale of your existing systems very quickly. So there's really no way for you to say, is this related to this, how do these metrics compare? They don't. And I wanted to do something that was just going to very clearly establish value. And so we started rebuilding what is one of our most important systems, which is our vehicle intelligence system. And what that is: it's a system that takes in all of the vehicle records that are lying out there in the world.
Everything that you would want to know about a car, we have to bring in constantly throughout the day as the data on those cars changes: new and used cars, the prices on those cars, everything. And we have to constantly be synthesizing it and spitting it out into the various applications. It's an incredibly business-rule-intensive system, which is also not something you typically launch with in Hadoop, at least in what I've seen. You typically do the more aggregate analytic thing, not some big rule-driven, gotta-be-perfect, get-all-the-data-right system. But we did it. And we launched what we call the 2.0 version in August of last year. And it was one of those transformational things, because we went from being able to spit out all this vehicle intelligence once a day on the core vehicles we cared about to being able to do that every 30 minutes, across the entire set of vehicles that we're bringing in. Our awareness of vehicle costs improved as well. I could go into many, many other applications that we've developed; we're just constantly churning them out every month. And it's starting to get to this really fun spot where it's a key component of driving the machine learning, some of the stuff we'll talk about in a few minutes. Just take a look at some of the metrics we're showing you here. They don't represent the whole story, but I'm hoping they give you a sense of the growth and the real work that we're doing there. And on the last part, you can see just how much data we're bringing in. By no means do we have the most data of anybody on the planet, but I wanted to give everybody here who's just getting started, or is on their second application, a sense of how quickly this stuff actually does grow. So what's the point? What is underlying, what's driving what we're doing?
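A business-rule-intensive synthesis system like the one described can be pictured, very roughly, as a chain of rules run over every incoming vehicle record on a fixed cadence. The sketch below is purely illustrative: the field names, the two rules, and the record shape are all invented, not Truecar's actual system.

```python
# Hypothetical sketch of a rule-driven vehicle-record synthesis step.
# Fields and rules are made up for illustration only.

def normalize_price(record):
    # Coerce price strings like "$18,500" to plain integers.
    price = record.get("price")
    if isinstance(price, str):
        price = int(price.replace("$", "").replace(",", ""))
    record["price"] = price
    return record

def drop_invalid_vin(record):
    # A real VIN is 17 characters; reject anything else.
    return record if len(record.get("vin", "")) == 17 else None

RULES = [normalize_price, drop_invalid_vin]

def synthesize(records):
    """Run each record through the rule chain; drop records a rule rejects."""
    out = []
    for record in records:
        for rule in RULES:
            record = rule(record)
            if record is None:
                break
        if record is not None:
            out.append(record)
    return out

batch = [
    {"vin": "1HGCM82633A004352", "price": "$18,500"},
    {"vin": "BADVIN", "price": "$9,000"},
]
clean = synthesize(batch)
# Only the record with a valid VIN survives, its price normalized to 18500.
```

In a real pipeline each rule would be one of hundreds and the batch would be the full vehicle universe, re-synthesized every cycle rather than once a day.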
So I've shown this slide before; usually I put a giant brain on it, but for the sake of the slide being readable, I didn't put the brain on it. The idea is to be the brain of the industry. And what that means, if I decompose it: we need to accurately identify assets in the marketplace. That can be a vehicle, that can be a consumer, that can be a loan, that could be a lease, that could be an insurance policy. And why I say identification is super important: you can't be wrong in automotive. The transaction is way too complicated. If you're wrong, you lose the transaction. If you're wrong, you get in trouble. You can't be wrong. So we have to accurately identify what we're dealing with, all of the time. Then the premise of Truecar that I just talked to you about was: make sure you can assess the value. And I didn't say price or anything like that, because value is what's important. We need to establish prices and then show context for why those prices are what they are. If you've seen the Truecar curve, used the mobile app, and seen the analytics on this stuff, the goal for our consumers and our dealers and the auto manufacturers is to make sure people understand why the market is currently pricing where it is. And again, that relies on knowing what you're dealing with and then constantly finding new data points that might tell us more about the value. Third, and this is a theme that you've heard from lots of folks here: we need to be able to predict and prescribe who, what, when, how much, et cetera. And so if you look at the stuff in the middle, I try to say things in a simple and straightforward way, especially so that I can constantly keep the developers on track. Our goal is to acquire everything, literally every piece of data we can in the automotive industry, and synthesize it within 15 minutes.
Now, that may not sound like real time, but if any of you are familiar with automotive data (you've gone to a dealership, you've gone through a transaction), you can understand and appreciate what it would mean to actually be able to synthesize any given data source within 15 minutes. There are data sources, yes, that I can consume in real time, such as user behavior on the web, people clicking on things, et cetera. But there are other data sources in an industry like automotive that move at different time scales and have different fidelity during those time scales. So I'm proposing something that I believe is pretty radical for this industry and this technology. And then on the bottom end, it's very much a mirror to the top part: we need to make everything easily accessible. And what that means isn't that we aggregate everything into a nice EDW, put some reports on it, and everybody gets an email about it that's the same thing every day, blah, blah, blah. No, we are much more moving to what I would consider a contextually aware, intelligent search engine. I think lots of big giant tech companies are also realizing that consumer demand is dictating, and user interfaces on phones and different devices are dictating, that there will not be a set UI. There will not be a time for you to perfectly create a linear experience and a linear data set that will deliver perfectly for every person. Instead, you kind of have to open it up and let people search through your data and forage for what they need. And in a lot of cases, certainly in the case of automotive transactions, you also need to be able to learn from that foraging and push people the contextually relevant information at the time that you think they need it. So, the technologies that we've been deploying: obviously HDP; we've gone through a couple of upgrade cycles over the last couple of years, as you can tell, so I'm super excited that they're working on more and more automated rolling-upgrade stuff.
We've deployed Spark into some of our most mission-critical algorithms on the back end, to do our transaction matching and things like that. We've deployed Elasticsearch to great effect, which has had a similar transformational effect on what we do and how we deliver the information. And then more recently, we're developing advanced multi-dimensional, real-time visualizations using the Unity engine and some processing. And if you want to go where hopefully your heads are going: very much so, we're exploring Minority Report-like experiences within the Truecar data sets. And it's not fiction. If you want to see some of this, I'm happy to show you if you catch me on the floor out there. So to give you some numbers on this, this is what we do at Truecar, and you saw it in the first slide: our data has grown 24x in the last 12 months. That comes from over 12,000 third-party data feeds that come in every day. Obviously, we also generate a bunch of event data, to the tune of 65 billion data points that we're processing through this data platform. We have to put valuations and price reports on 200 billion possible new car combinations, with all the options and things that you can put on a car. We've processed over 710 million vehicle images, which I hope is surprising to some people; you wouldn't think that's something a company like Truecar has to do. But we've had dealers tell us, and I kind of laughed when I first heard it, but then I thought, that's kind of true: if there's no vehicle image, the car doesn't exist. And so images are actually an important data set that Truecar deals with, both just in being able to show people what vehicles are out there, but also because there's a ton of intelligence embedded in those images. And then, over a 10-year period, we've seen over 20 million car buyers go through our platform. So that's no small data set.
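The transaction matching mentioned above runs in Spark at Truecar; as a plain-Python sketch of the underlying idea, here is one simple way to pair a sale record with its inventory listing: match on VIN, then pick the listing closest in time. Every field name and the matching policy are assumptions for illustration, not the production algorithm.

```python
# Toy sketch of transaction matching: pair each sale with the inventory
# listing that shares its VIN and is closest in time. In production this
# kind of join would run in Spark at far larger scale; fields are invented.
from datetime import date

def match_transactions(sales, listings):
    # Index listings by VIN so each sale only scans its own candidates.
    by_vin = {}
    for listing in listings:
        by_vin.setdefault(listing["vin"], []).append(listing)

    matches = []
    for sale in sales:
        candidates = by_vin.get(sale["vin"], [])
        if not candidates:
            continue  # unmatched sales would be handled separately
        # Pick the listing whose date is nearest the sale date.
        best = min(candidates,
                   key=lambda l: abs((sale["date"] - l["date"]).days))
        matches.append((sale, best))
    return matches

sales = [{"vin": "A", "date": date(2015, 6, 10)}]
listings = [{"vin": "A", "date": date(2015, 6, 1)},
            {"vin": "A", "date": date(2015, 6, 9)}]
paired = match_transactions(sales, listings)
# The sale pairs with the June 9 listing, the nearer of the two.
```

The same grouped-join-plus-nearest-neighbor shape translates naturally to a Spark `groupBy` over VIN, which is one reason this workload fits that engine.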
And then, Scott had mentioned the open enterprise. Is that just a buzzword? It's not. And we've kind of lived it even before Hortonworks started to use the phrasing. Because our vision with that data platform was to make sure that we got out of our own way: put all the data somewhere, and make sure everybody who's supposed to get at it can get at it easily. The stuff I was saying about giving search away to the consumers is also what we give to our internal people. We eat our own dog food. Internally, you need to be able to search those same data sets, learn from them, decide how you want to compose them, and get them out to the world. So, again, in terms of numbers: we have a 72x improvement on our inventory processing, even while adding more business rules, more data, et cetera. And that's only accelerating as we get better and better with the technology, and as the Hadoop platform itself continues to improve. We had a 24x improvement in image processing, and we'll open up and show you that technology in a talk that I and one of my great engineers, Neil, will be giving later today. I'm really excited, because he'll go pretty deep into what we did there. We had a 20x improvement in the number of internal Hadoop experts. We had a 12x improvement in things like clickstream. So obviously, that's good stuff. And it doesn't matter where we've deployed Hadoop and Hadoop-related technologies; we've seen these kinds of improvements. And when you combine that with the economics I talked about, I mean, I don't have a budget that I sit there and worry about. No, what I worry about is the speed at which I can continue to drive features that drive those kinds of numbers.
It's a competitive advantage that Truecar did what we did, which is we jumped into this technology long before this room was completely filled with people who are just getting started. It's important in the auto industry to lead the way, even if it's super risky. What has that led to, in qualitative terms? Developer productivity. Yes, early on it was hard, because people had to learn new things and some of the things were still early and immature. But over time, the key was giving people access to the data, because if the developers can understand the data and see where the value is, they'll develop better solutions upstream for processing that data. Data science capabilities, those are obvious. I think some people think the key to data science is having really cool stats models and really advanced computer software things that make you... No, the key thing is speed. Because whatever models you're doing, and the red point talk on machine learning that I went to kind of hinted at this, you need to be able to test your theories as fast as possible. Almost every machine learning technique's bottleneck becomes the speed at which you can train something. So I focused almost exclusively on speed, getting information processed faster so that everybody can experiment more and more. And no single experiment is going to tell us the answer. There are going to be answers that get pulled back, and we have to rethink things and just constantly redo them. That's the key to our improvements in data science. And then, of course, recruiting and engagement. I can't stress that enough. If I had underestimated one thing, in my own hubris, thinking I can talk people into things, it was really making sure that we could hire people. It's super competitive out there. And we took it... Instead, the way we were going to do it is be a little risky and say, we will train you.
And we know these skills are going to enable you to go on in your career and make lots of money doing whatever you love. But we're willing to take the risk that if we continue to be fast and inventive, you'll want to continue working here. And if you talk to any of the engineers in my group, we've had extremely low turnover in the last two years, which to me is a metric that I really, really care about. So where is this all going? And again, this isn't fiction. That's why I call this the future present. Because a lot of times we'll say, oh, the future, the future. And I'm always like, when is the future going to get here? The example I always use: Dippin' Dots has been the ice cream of the future for like 30 years, and we're not eating Dippin' Dots every day. So I don't want to keep saying the future. I'm saying it's here. It's actually here. So here on the screen I have some visuals to represent what I'm going to say, but I want to make sure you understand it. We have already started deploying what I call market simulations. And what these are, to us, is that we give dealers, OEMs, and ourselves ways to turn dials, turn knobs, on various facets of the data, some things we know very well, some things that are on the edge. When you turn those dials, our data platform goes to work. And it says, hey, if somebody decides to drop an incentive in this region, in this marketplace, on this vehicle, what's going to happen to the rest of the market? Then another guy does the same thing over here and over here, and you start letting these simulations interact with each other. And all they are is millions and millions of experiments. But it's important that we run those experiments, and have the user base run those experiments, and have the dealers run those experiments, because we learn through them about things that are likely to happen should all these dials actually be turned in the real world.
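The "millions and millions of experiments" framing is essentially Monte Carlo simulation: perturb one dial, such as an incentive in one region, resample outcomes under that perturbation many times, and compare the resulting distributions. Below is a deliberately toy version; the linear demand model, its coefficients, and the noise level are all invented for illustration.

```python
# Toy Monte Carlo market simulation. The demand model (linear response to
# an incentive dial, plus Gaussian noise) is made up purely for illustration.
import random

def simulate_demand(base_demand, incentive, sensitivity, noise, rng):
    # One "experiment": demand rises with the incentive, plus random noise.
    return base_demand * (1 + sensitivity * incentive) + rng.gauss(0, noise)

def run_experiments(n, incentive, rng):
    # Average outcome over n independent experiments at this dial setting.
    results = [simulate_demand(1000, incentive, 0.08, 50, rng)
               for _ in range(n)]
    return sum(results) / n

rng = random.Random(42)  # seeded for reproducibility
no_incentive = run_experiments(10_000, 0.0, rng)
with_incentive = run_experiments(10_000, 1.0, rng)
# With this toy model, mean demand rises roughly 8% when the dial is turned.
```

The real value, as the talk describes, comes from letting many such dial-turns interact across regions and vehicles, so that the platform learns what is likely to happen before any dial is turned in the real world.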
What that does is help people prescribe their own strategies and deploy those strategies. I'm talking about the auto manufacturers as well as the dealers. And what that helps Truecar understand is where this dynamic marketplace is moving. And it's interesting, because is there any way to do a long-range prediction on something like the auto marketplace? Absolutely not. But is there a way to get a two-day advantage against your competitor on where that market might be going? Absolutely. And those two days can mean a big difference when you're talking about the scale of the automotive industry. So beyond the simulations, some of which are deployed purely as analytic things, ultimately the simulations get deployed through our mobile experience, where users are out in the real world. Over half of our users are now on mobile; they're going to the dealership, we understand where they are, we've geofenced them. We can deliver information based on where this marketplace is going, at the right time. We've already deployed this product in a small way, and I encourage you to go use it and experience it. Right now we're at kind of the version 1.0 of it, but it's going to get pretty advanced. And then on the furthest side there, with all the pretty maps: we're going to give all the data away. We believe so much in truth and transparency, and what they can do to make a marketplace efficient, that we're going to expose all of this through beautiful interfaces, through search interfaces, to all of our partners, all of the auto manufacturers, all of the dealers, all of you, and you will be able to plug into our data platform and explore it to your heart's content.
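The geofencing mentioned above, at its simplest, is a distance check between the phone's coordinates and a dealership's. A minimal sketch using the haversine great-circle formula follows; the dealership coordinates and the 200-meter radius are invented, and a production system would of course use indexed spatial queries rather than a per-point check.

```python
# Minimal geofence check via the haversine great-circle distance.
# Coordinates and the 200 m radius are invented for illustration.
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in meters."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

def inside_geofence(user, dealer, radius_m=200):
    # True when the user's location falls within the dealer's fence.
    return haversine_m(*user, *dealer) <= radius_m

dealer = (37.3382, -121.8863)  # hypothetical San Jose dealership
on_lot = inside_geofence((37.3383, -121.8864), dealer)   # ~15 m away
far_away = inside_geofence((37.40, -121.88), dealer)     # kilometers away
```

Crossing into the fence is the trigger: that is the moment the platform can push the contextually relevant market information the talk describes.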
Because at the end of the day, we just believe that the more everybody's informed about the automotive industry, at every aspect of it, the more they will actually transact and the better experience they will actually have. And again, this is not the future. I encourage everybody to go experience this on our service. Yes, because I would like all of you to use the service, but also because I want you to understand that this is not back-office stuff. Our platform is powering what you see out there, and it's doing a lot of business. The one last thing I wanted to mention, because Scott kind of got into it: if there was one technical challenge people gave me, it was, is Hadoop going to work in the enterprise? As if somehow I wasn't doing enterprisey things with Truecar, which is really interesting. So we talk about data governance and master data management and security. Of course, of course it could do it two, three, four years ago. It was just harder than it's going to be in something like HDP 2.3, which is great. But for me, that was never a roadblock.