That's it, perfect. And by the way, the other bar is open here too, and you can probably see the slides from there, although hearing it is more important. All right, so we have our first panel. We have two panels today: one is a panel from the perspectives of the technologists from the ODPi, I'm not that good with the names. ODPi, what does the little "i" stand for? Initiative. Okay, cool. And then the second panel is a customer panel, and Merv Adrian will be back up as well. I'm looking at Merv. Yeah, right.

So we've got Shaun Connolly here, vice president of corporate strategy at Hortonworks, long-time Cube guest; we pulled up that nice shot of you, there you go, from the first year at the Warwick Hotel. And Mike Maxey, who does corp dev and strategy for Pivotal, and also John Fanelli, who's the head of marketing for DataTorrent. So gentlemen, thanks very much, and welcome to the stage, to theCUBE, here at Pillars 37.

So Shaun, I wonder if you can get it started. The theme for this panel is the ODPi perspective on the vision of the enterprise for the future, so I want to start with where we've come from. We've known each other for a while, and we've seen this industry evolve very quickly. Where are we today in this whole Hadoop world? How would you describe it?

There are a couple of different metrics. Arun Murthy, I think, spoke at ApacheCon about, what, 10 years of Hadoop, right? 2005, 2006 is when a lot of the Hadoop technology started. You fast-forward to today and there's actually a very strong ecosystem of solutions around it. It's fundamental in a broader architecture; it's deployed on-prem, in appliances, in the cloud, and there's really a lot of innovation happening in and around open source, the Apache Software Foundation, with new innovative engines, both commercial and open source, being applied to the data in the system. So I'd say that's a strong litmus test of the vibrancy around its evolution. Interestingly, I see eye to eye with Merv Adrian, I think he's going to be up next, on where things are from an adoption perspective: things are progressing really well for Hadoop within the enterprise, both for those deploying it and those looking to deploy new solutions. So I think it's working its way toward a data dial tone, so to speak. It still has a lot more work to do, but it's very relevant.

Yeah, so indirectly I heard some of Merv's data, or Gartner's data, I'm not sure. We just did a survey that said 60 percent of the people we surveyed had Hadoop. I think I had heard a lower number, somebody's interpretation, I don't know if it was an out-of-context interpretation, that Gartner was saying Hadoop adoption was very slow. Maybe you could comment. What's your number, 50, 60 percent? What are you guys seeing?

Forty-four percent planning or investing.

So we did a survey of 300 and it was a little higher; you never know with these surveys, and we just got the data, so it's early. And if you look at the standard market adoption curve, we're well into the early majority, where things begin to exercise themselves and stabilize.

Okay, so you were talking about this infrastructure tone, the dial tone of infrastructure. But Mike, I want to ask you: I don't know if George mentioned this in his talk, but he always uses this line that there's a slow-motion collapse of infrastructure software pricing going on in the industry, and you guys are infrastructure software companies. So what do you do about that? How are you evolving up the value chain? What's the mission, given that we're all aware of this?

Yeah, I think there's a point there. Prices are going down, I
think, is the way to think about it, but the demand is not, and the way you connect that up to the application is super important. At Pivotal we do a lot of data, we also do cloud, we do agile, and we bring that together, and that's really where we're seeing value in a lot of big enterprises: being able to connect it to that end application. George's slide, I think he called it Hadoop 3.0 or big data 3.0, was a pretty interesting take on that, and John, your architecture is similar. So we're seeing lots of standardization at the bottom of the stack and more investment up the stack, whether it be Spark, whether it be new projects coming in from particular vendors, or whether it be end users trying to build those applications.

All you guys are driving standardization at the bottom of the stack, and that would be a big part of ODPi, for sure.

But I think we're also trying to make it easy to consume, right? To the 44 percent number: vendors have made money on management consoles for the last 10 years, but consumption is still not where it should be. There are super valuable tools in this stack, with more showing up every day, and we feel that standardizing at the lower layer allows innovation above it. That's a lot of what ODPi is about.

So John, do you have a perspective on why it's just so complicated?

Right, so we do a lot of end-user, customer-facing things, and we do a lot of distro-facing things, because we run on top of Hadoop as a YARN-native app. The feeling I'm getting from customers, and what I hear from them, is that there are kind of three concerns I need to see addressed. Hadoop was created by developers, for developers, so as a result it's actually, at its core, very difficult. As we take things to the enterprise, they want to see that we can make it easier, not just for the developer but for the DevOps guys, for the data scientist, and even the line-of-business guy. So the first problem is: can I build something, can I build an app? Then the second problem becomes: I built one, does anyone care? Because a lot of the Hadoop apps are just repurposing or redoing things, and they're science projects. So the second part is: can I make something that's interesting and get people excited? And if I fought through everything and built this app and everybody loves it: can I operate it? Is it going to fall over? Can I put it in my data center? Is it 24/7? So a lot of the work we put in as a company, and a lot of the direction we're taking with ODPi, is to bring the ability for Hadoop, and apps on Hadoop, to be enterprise-ready, so they're easier to create, they add value to the organization, and they can be operationalized once you put them in production.

On other notes, when I view the drivers from an ODPi perspective, I view it through the lens of the ISV ecosystem and of the enterprise from an adoption perspective. The thing I've consistently heard, and whether it's Hadoop or other technologies it's very similar, is that ISVs trying to deploy into the variability of different permutations of Hadoop-based systems find it hard to get procured. They get held up in the procurement process because it becomes "are you compatible, are you certified, with this version of XYZ?" So number one is removing that as an obstacle: make it prescriptive what it means to deploy into a compliant system, because that helps the enterprise focus on the interesting part, which is what solutions they actually want to run on the platform, and not get held up in "this is incompatible" or "this doesn't work." So you start at the bottom and you start raising that bar of interoperability into other components, and frankly I think that needs to be our charter, so more solutions can come to bear, to your point, right?
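The point just made about a prescriptive compliance target can be pictured with a toy check. This is a hypothetical sketch in plain Python, not ODPi's actual specification or test suite; the component names and minimum versions below are invented purely for illustration of the idea of testing against one spec rather than every vendor permutation.

```python
# Hypothetical sketch of checking a distribution against one prescriptive
# "compliance spec", so an ISV certifies once instead of against every
# permutation of a Hadoop-based platform. The spec below is invented for
# illustration; it is not the real ODPi runtime specification.

REQUIRED_COMPONENTS = {
    "hdfs": "2.7",       # minimum version an ISV may assume
    "yarn": "2.7",
    "mapreduce": "2.7",
}

def version_at_least(found: str, required: str) -> bool:
    """Compare dotted version strings numerically, e.g. '2.10' >= '2.7'."""
    as_tuple = lambda v: tuple(int(p) for p in v.split("."))
    return as_tuple(found) >= as_tuple(required)

def check_compliance(platform: dict) -> list:
    """Return a list of human-readable problems; an empty list means compliant."""
    problems = []
    for component, required in REQUIRED_COMPONENTS.items():
        found = platform.get(component)
        if found is None:
            problems.append(f"missing component: {component}")
        elif not version_at_least(found, required):
            problems.append(f"{component} {found} is older than required {required}")
    return problems

# A vendor distribution described as component -> version:
distro = {"hdfs": "2.7", "yarn": "2.7", "mapreduce": "2.6"}
print(check_compliance(distro))  # ['mapreduce 2.6 is older than required 2.7']
```

The value in the panel's framing is not this trivial loop but the agreement on the spec itself: once the required components and default "knobs and dials" are written down, procurement stops being a per-vendor certification exercise.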
Focus on the apps; how do you enable the apps? Well, we heard Mike Olson today; he made the bold prediction last year that Hadoop would disappear. Is that your shared vision, to make Hadoop disappear, make it like storage, invisible, like some people say infrastructure is invisible?

So, an analogy. I've been with Hortonworks four years; we formed the partnerships with Microsoft and Teradata at the time, two very different reasons for investing in the Hadoop stack. How I phrased why we did the Microsoft partnership was: at the end of the day, I want Power BI and Excel users, the billion users in that ecosystem, to tap into the value of data served up from a Hadoop platform and be virtually oblivious that there's this yellow elephant behind the scenes. So if that's what you mean by making Hadoop disappear, then yeah, because it makes it seamless to consume and enables them to use the tools and technologies they're familiar with.

I don't think it needs to die; yeah, make it invisible. It just needs to be consumable for particular user communities: your SQL BI user community, your Scala developer community, your Python community. It should speak your language and it should be intuitive.

So we had a little contest in theCUBE today: who said "Spark" the most? I think IBM won, I'm pretty sure, and that's not surprising. But help us squint through the whole Spark thing; there are so many jokes, this could be called Spark World. What's going on there? What's your point of view? Do you embrace Spark the same way VMware embraces OpenStack? Do you have a different angle? Help us parse through that. Who wants to start? I would just add that in the survey we did on streaming data, DataTorrent was by far the most popular data streaming platform, interestingly ahead of Kafka, which is newer, and ahead of Spark Streaming. So that was sort of a testament to the work you guys have done, which is not surprising, because you've done a lot of advanced work. But what about Spark? Where does it fit? Everybody's talking about it. What do you guys say?

If you take one thing and stretch it to do everything, you make certain design decisions that are trade-offs. What we expect in the future, as Hadoop becomes more standardized, with YARN and other activities, is that enterprises will run multiple processing paradigms. You might use DataTorrent for streaming, you might use Spark for batch, you might use something else for graphing, and the idea is that the engines should all be compatible. This idea of one engine doing everything is almost a desperation move: "Hadoop's not moving, we're not seeing the adoption, it must be that MapReduce is broken, we're all going to do Spark," and it's like student body left, everyone going that way, I guess except for me, sorry Mike. So the whole idea of one platform that's going to do it all doesn't really make sense, and it's almost "the emperor wears no clothes": everybody's looking at Spark and saying "beautiful outfit," and then I talk to people on the side. Again, there are use cases where Spark is good; I just worry that we overhype things as vendors, and that sets expectations for enterprises that we're not ready for. I talk to a lot of people and they say, "I'm using Spark," and I go, "Great, how long have you been in production?" "Production? I'm just using it." So we can't pin everything on one piece, and we have to be honest with ourselves. Even in the Cloudera announcement that they were focusing on Spark, they said they were 50 percent there; that was the first time I heard a vendor say where it actually sits.

Guys, anything? Do you want to touch that?

No, I agree; I'll underscore a couple of points. So, DataTorrent, you guys
invested early on in being a YARN tenant in a Hadoop cluster. From our perspective, Hadoop is a multi-workload platform, and we need to enable all these different paradigms so you can get value out of it, whether it's Spark for discovery and machine learning, DataTorrent for streaming, or HAWQ for SQL access. I don't really care; all those engines need to be able to plug in cleanly, in a way that's predictable, so the enterprise has the choice to light up whatever particular user base it has. From my perspective, in this market there's a really interesting user community around Spark as an analytic engine within a Hadoop context, and even outside a Hadoop context it's able to get data from other data sources. So I like it from an application creation and enablement perspective. There's an embracement there; it's a "many" strategy, not a "one" strategy.

So I want to be clear on this. When we talk to practitioners in our community, the big guys say, "We've invested a ton in Hadoop and we're not tossing it; we're going to evolve it." There's a fat middle, though, that says, "We're still trying to figure this stuff out, we don't have the resources, we haven't invested a ton in Hadoop, and we're looking at Spark seriously to try to figure out where it fits." Does that make sense to you? Is that what you're hearing in the ecosystem?

In general, what's hard is consumption, and knowing when and how to place your bet. Can you hear me now? That's the sound guy; thank you. So what I was saying is, there's a ton of innovation, whether you're talking Spark or really anywhere in big data. Where we've seen it get hard is consumption, and largely you rely on a vendor to help package it for you and hope the vendor is making the right choice. But having a voice in how that gets consumed becomes harder and harder as more options and competing projects show up. As part of ODPi, our goal is to be opinionated, and to have an opinion that really addresses the industry by everybody weighing in on that opinion, making it easier for consumption. That's really where we're focused, and Spark is part of that, as will be many other projects as we grow and expand.

Well, let me expand on that. Again, I like to think about enabling specific communities of users to actually use their tools and get value out of the data. So let's pick SAS, the analytics company: there's a whole set of SAS developers out there, and SAS has the ability, as a YARN tenant, to run its engine native in Hadoop, so that community of users can run their advanced math against data in Hadoop. The Spark community, same thing. Streaming, around technologies like DataTorrent and others, same thing. You need to appeal to whatever the use case is, whatever the right technology is. The more users that can come to bear against the data to get value out of it, the better. From my perspective it isn't just whose engine is better; it's what use case it enables, how large that population is, and whether it helps them.

All right, I've got a couple more questions, and I want the audience to think about questions you might have, so I want you to participate. If you don't have questions, I'm going to call on you; I hear the guy in the back is going to throw Skittles, is what I heard, so they're loaded up. So, I'm not sure there's a question here, but I woke up this morning and went on SiliconANGLE, and we were reading the reviews of what happened on theCUBE yesterday, and there was a quote by me. They watched the live stream and put this up; they said Vellante says the Hadoop ecosystem, the big data ecosystem, is crowded, overfunded, profitless, but has a lot of
potential. I was like, holy crap, I said that? I'm going to really piss off a lot of my friends. But then I thought, what am I going to say here? There's a silver lining. Okay: crowded, that's because it's a big space and everybody jumps in. Overfunded, that's not necessarily a bad thing; it means innovation can occur. Profitless maybe just means it's early. Lots of potential. But what about it: can you actually make money?

As a publicly traded company, you can look at the Hortonworks financials in the Hadoop space. There's revenue that comes in, but we definitely invest ahead of the curve in order to drive the tech forward. You have to, to enable those integrations, to enable the user communities, to enable the Microsoft ecosystem to tap in cleanly, et cetera. That's expensive, and you're trying to get to a global footprint.

But some comments: there's been a lot of enthusiasm around this business, VCs poured in. Is it still a good business? I know you're going to say yes, but what gives us confidence that it's actually going to live up to the potential, or the promises, that we've put forth? Comments, thoughts?

I'd say yes, still a good business. Lots of money has shown up, and as you well know, I think your quote was pretty accurate: a lot of money pouring in, a lot of money coming out. But everybody's betting on the potential, and the folks that started a lot of this are the big internet companies that are disrupting big enterprise. They're dis-intermediating their customers; they're becoming taxi services that don't own taxis, for example. The investment happening now is the long payday, so I do think it's still a good investment, and to your point, it's buying a lot of innovation. It's also changing the way companies behave. When Pivotal came out, we popped out of EMC and VMware, and we had all proprietary products, for the most part. We're now pretty much an open-source company; everything's going open source, and we're not doing that for shady reasons. We're doing it because we think that's where innovation is and where standards live. Some of the investment may not turn into dollars right away, but in the long run it will show up.

So, a self-serving data point, because some people do actually make money in the open-source business: who's the fastest company to a hundred million in revenue? Hortonworks. Salesforce.com took five years; per the Barclays report, we're on a four-year trajectory. The thing is, that market is moving quickly, and there's value that makes a difference. It's not just about shaving costs; you brought up the cost angle, and you can save money deploying Hadoop infrastructure, but if that's all you focus on, that's uninteresting. It's about the data discovery use cases, the single-view-of-whatever use cases, the advanced predictive analytics use cases, and the real-time streaming use cases that actually make a fundamental difference in business outcomes. I am paid to be bullish, and I have to measure my comments from a public-company perspective, but there's a lot of value delivery happening right in front of our eyes, and if you listen to the use cases, there's a strong return on investment in rolling out these solutions.

Well, I think... go ahead.

You raised the question of building a business, and of the funding coming through. In some regards the funding is enabling a lot of innovation on an "if we build it, we'll figure out the business model later" basis. That may work for one or two companies, but it's not going to work for the ecosystem. We recently joined the open-source community with Apache Apex; if you guys haven't seen it, you should check it out. We built a business first, being a proprietary company for
two and a half years. We built an enterprise-grade product; we have Fortune 10 companies running us that are generating value. Then we open-sourced into the community a product that's 18 or 24 months ahead of what's already out there. So we built a business model, we built how we can sell software on top of that, and our open source isn't "we're going to open source and charge for support and services." We're a software company with a core open-source technology that's taking off; it's been an Apache project for the last 45 days.

Was there a question? Yes, you looked eager to ask, so I didn't want to overlook you. Mike Maxey will answer this question.

Sure, that would be great.

I'm Margaret Dawson, and this was not my question, but I'm with Red Hat, so when you talk about an open-source company that knows how to be profitable, I would say we're the best example.

I'm an ex-Red Hatter too.

Okay, there you go. So my question is actually related to what you're all saying: multiple engines, standards, a mix of different things around simplicity and complexity. What's confusing is that people throw around the words "open" and "standards," and at the end of the day enterprise IT needs integration and interoperability. I want to know what each of you is doing, not only to simplify big data solutions within the Hadoop ecosystem or otherwise but, importantly, to work with other shit, because that's where the pain is right now. Excuse my French.

Go ahead. What about that?

I think that's a great question. I think we're going to change the ODPi tagline: "work with shit better."

All right, I think somebody tweeted that. Okay, so what about that? What are you guys doing, Mike?

Yeah, I'll take a swag at that. It's kind of back to your consumption point: everybody says "open," and what does open mean? If you look at what we've built with ODPi, and I'm going to go back to that because I think that was the point of the panel, it's extremely open. Anybody can join, anybody can vote, everybody weighs in; it's really a sounding board to build an opinion and make things easier. So when we say open, we mean not only developers but companies and everybody else. And the deliverable out of that, and this will take time, will be a really simple certification program, so that as a consumer, when you're buying a stack, because that's how people buy stuff these days, you'll know that everything in that stack carries the same logo and it all works together. It's not your job to put it together, or to pay system integrators 400 dollars an hour to put it together; it works, it's known, and it's an easier model for consumption. So that's a big part of our goal: to make shit work.

Exactly. And that's why we have a range of folks: folks who operate at the platform layer, on how you lay it down in a consistent way across all vendors; the application and engine providers; and end customers, who are also building solutions on it. Each one of those perspectives helped shape the initial deliverables we announced this week: what the specification looks like; what compatibility and compliance mean for the ISVs who need to deploy, and for the platform vendors providing the substrate they deploy into; what all the knobs and dials are and how they should be set by default to make things easier and more predictable; and then how you actually have a test suite that asserts that scenario. We have folks with on-prem solutions, and we have Altiscale, who is a Hadoop-as-a-service solution; they have the same desire, in that they want those workloads to seamlessly drop in and run on their platform. They don't want the integration headache; they want the ease of interoperability. So I think it's incumbent upon our efforts to deliver on
that, and frankly, in every meeting I'm in, and we're in a lot of them, that's what we're measuring ourselves against.

Matt from Encapsa, you've got a question?

Okay, I'm not from Red Hat, but I'm the chief IP counsel of a small big data solutions company called Encapsa, and you can consider me the man off the street, the layman, because I don't have the gravitas or the understanding of everything you're talking about. But I do understand some of the needs in the community, particularly in the health care industry. One of the big problems in health care is that they have disparate databases that don't communicate with each other. You have an IBM SQL database here, Excel or something there. Two health systems want to combine and they can't; they have to hire a whole team of software engineers that spends millions of dollars and takes months to create a common field structure to combine the data so they can search patient A or patient B. What can Hadoop or Spark do, in this admittedly very particular, dedicated industry, to combine the data from these databases, so we don't have to deal with database software engineers and all the riffraff that's costing hospitals tons of money? It's just one very particular issue that's going on right now.

At least he didn't ask you to solve the cancer problem. That's a heavy lift right there.

I'll toss out a framework and we can expand on it. I mentioned a variety of the common use cases Hadoop gets used for, and one of them is this single-view use case. It happens in the health care industry, it happens in retail; the pattern is very similar. Hadoop in many cases can act as the catch basin for data across those disparate systems, and you can enrich it. In your health care scenario it's not only about the existing data sets; it might be about new patient sensor data as well as other feeds and other data from new sources. Being able to get them in one spot, where you can integrate them, aggregate them, and join them in ways you've never been able to do before, whether with Spark or other tools on Hadoop, gives you a place where you can begin to make sense, from a 360-degree view, of the value inherent in that data. That pattern is being applied across different industries, but it's especially common in health care.

Just to pile on: there was a presentation earlier today from one of our data scientists about health care, and how they're doing predictive analytics to figure out, when a heart attack victim shows up, when they'll check out. It saves lives and lots of dollars for health care, and it's flipping from predictive analytics to "am I looking up patient A or B," in your example. So beyond Shaun's point that you can bring it together, you can start to build forward-looking systems, what was the term used, intelligent operational systems, where you pull in live data, whether the person's at the doctor, at the hospital, or wearing their Fitbit, and then supplement that with existing databases to drive to an outcome. The ability to do that on Hadoop, whether it's with Apache Apex or DataTorrent or whatever, means you can take in multiple data sources and combine them very rapidly, because you can parallelize them. In the old days you had these guys basically taking data files, merging them, creating data warehouses, and building bigger files, all based on previous limitations, and those limitations don't exist anymore with the parallel processing and inexpensive storage of Hadoop.

I run a small data analytics shop and have been in this space about 20 years, so I don't pretend to be an expert in this, but I have two questions, hopefully
simple. One: if I remember right, Hadoop came, basically, from writing an implementation following two Google papers, one on the Google File System and the other on MapReduce. So essentially it's an open-source solution for parallel computing on commodity hardware, with open source as commodity software. Coming from EE, I'd say there's a line past which you actually need parallel computing, so it's probably not everywhere. Second: I had a chance to visit the financial arm of Alibaba, the PayPal of Alibaba, which runs a few times higher volume than PayPal, and I asked about them. Basically, on the transaction-based data they're still doing a lot of things on MySQL and other stuff, so I haven't really seen people talking about the marriage, one day, between totally structured data and unstructured data. Hadoop came from unstructured data, so where is the future? What's going to happen? Are you guys going to solve the structured data problem?

So this is a softball handoff to the two of you, and I'll tee it up. Hadoop evolved from MapReduce and HDFS to have the YARN layer, which enables MapReduce, enables Spark, enables MPP SQL solutions, and enables streaming solutions to all run in a shared infrastructure. So it started as batch-oriented MapReduce, but it's evolved into a multi-purpose, real-time, interactive data platform, and you certainly can store structured data in a Hadoop system, using a technology like HAWQ, which is recently Apache, as well as other SQL solutions, with streaming from DataTorrent and other solutions on the platform as well, simultaneously.

Yeah, to Shaun's point, it's super flexible today. It was always advertised as that but didn't always deliver it; it's matured to a point where there are lots of solutions: SQL from many vendors, and really nice streaming solutions. So I think it's delivering on the promise that was made ten years ago, when Hadoop was hatched, of "hey, let's combine structured and non-structured and get value." There are plenty of examples over at the show, and I mentioned our data science team is doing that all the time. For SQL, we introduced Apache HAWQ yesterday, I believe, so that's a nice SQL engine, and there are many others available. I do believe it's there, but it took some time to get there.

I would echo the whole idea of having a flexible platform; from our perspective YARN is the way to go, and enterprises will use the right tool for the right job. We do a lot of work on data in motion, so you may want to do SQL queries on data in a certain time period, the last hour, the last two hours, but there's also always going to be a need to do SQL on the last five years, the last three years, the last two years of data. So you want to be able to run SQL on static data, and you want to be able to run analytics on your data in motion, and you want both in the same place, so your data can be in motion, you can access that data and take action on it, and then put it at rest for those ad hoc queries. There's actually room for both.

We have to wrap. Thank you so much, Shaun, Mike, John. Okay, quick break; great questions. We're going to do a switch; we have the customer panel coming up, where we're going to talk about building systems of intelligence, bringing those transaction and analytics systems together. So we'll be right back. Thanks for watching.
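The closing point about running analytics on data in motion and SQL on data at rest in the same place can be sketched in miniature. This is an illustrative toy in plain Python, not Apache Apex or DataTorrent code; the event shape, the patient IDs, and the alert threshold are all invented for the sake of the example.

```python
# Toy sketch of the "data in motion + data at rest" pattern discussed above:
# act on each event as it arrives (in motion), then land it in an at-rest
# store for later ad hoc queries. Plain-Python stand-in, not a real streaming
# engine; the event fields and the 120 bpm threshold are invented.

at_rest = []   # the "catch basin": full history kept for ad hoc queries
alerts = []    # actions taken while the data was still in motion

def on_event(event: dict) -> None:
    """Process one event in motion, then persist it at rest."""
    if event["heart_rate"] > 120:        # act immediately on live data
        alerts.append(event["patient_id"])
    at_rest.append(event)                # land the event for later analysis

stream = [
    {"patient_id": "A", "heart_rate": 80},
    {"patient_id": "B", "heart_rate": 135},
    {"patient_id": "A", "heart_rate": 95},
]
for event in stream:
    on_event(event)

# Ad hoc, SQL-style query over the data now at rest:
a_rates = [e["heart_rate"] for e in at_rest if e["patient_id"] == "A"]
avg_a = sum(a_rates) / len(a_rates)
print(alerts)   # ['B']  -- acted on while in motion
print(avg_a)    # 87.5   -- queried later at rest
```

The point of the pattern is that the same landed data serves both paths: the alert fires without waiting for a batch job, and the historical query runs without a separate export, which is what the panelists mean by keeping data in motion and data at rest "in the same place."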