theCUBE at Hadoop Summit 2014 is brought to you by anchor sponsor Hortonworks, "We do Hadoop," and headline sponsor WANdisco, "We make Hadoop invincible."
Hey, welcome back, everyone. We are live in Silicon Valley, in San Jose, for Hadoop Summit 2014. I'm John Furrier, the founder of SiliconANGLE, with my co-host, leading big data analyst Jeff Kelly of Wikibon.org. This is theCUBE, our flagship program, where we go out to the events and extract the signal from the noise. Our next guest is Joe Hellerstein, co-founder and CEO of Trifacta, an emerging, fast-growing, heavily funded startup that just announced new funding this week, around $25 million. Great guy, computer scientist, visionary, all across the world. Congratulations on the funding, and welcome back.
Hey, good to be here.
So we're big fans of you guys. We love to see the computer science curriculum expand into multiple disciplines with data science, and we're seeing computer science drive a lot of new stuff, so we love it anytime we see data, databases, all the stuff happening. So I've got to ask you the first question: what's going on with Trifacta relative to the funding, the employee headcount, and just stats on the company?
Well, today we're actually here, focused obviously, Hadoop World, on this announcement. We announced our partnership and certification with Hortonworks, so that's big news for us. We're certified with Hortonworks. We're one of a few companies certified on YARN, so that's part of the excitement. We did also go through this funding round, a $25 million Series C led by Ignition Partners up in Seattle, with participation from our previous investors, Greylock and Accel. So that's the big news on our end. We are growing. We're investing in sales and marketing, and we're investing in the engineering and design side of the house too, which is really important for us.
So is the certification just on YARN, or is it across the board on all of Hortonworks?
Across the board on the Hortonworks platform, and then, as a special part of that, also on YARN.
So talk about some of the computer science things happening in the Hadoop community and what you guys are doing to bridge to mainstream adoption, from collecting and storing data to exploiting the value of the data, the insights. On the business side, I would oversimplify by saying people want real-time, actionable insights from data, and on the back end we're just storing it all. How do you guys connect the dots between those two worlds?
Yeah, well, you started the question with computer science, and what I'll say is that my academic roots were evolving, I think, in a field that was strictly technical, really looking at what the hardest problems in computing are, and realizing over time that some of the hard problems worthy of academic study are about how people actually interact with computational infrastructure and with data. There are technical challenges there, because technology is really supposed to make our lives easier, not more complicated. So there are lots of interesting technical challenges, and Trifacta grew out of an exercise to study how people work with data, how analysts work with data, where their frustrations are, and how technology can be brought to bear to make these human processes more efficient, more effective, and more accessible to a broader community, and really shorten that time to insight and increase the number of people who can do these kinds of tasks.
So can you explain to the folks: do you sell software, is it SaaS, what's the product? Can you give a little bit of detail on that?
With pleasure.
So the Trifacta Transformation Platform is a piece of software that you install alongside your Hadoop cluster. It gets access to the cluster and allows you to visually transform data from the raw form it comes in into a form that is suitable for analytics, in predictive analytics or business intelligence kinds of platforms. It's deployed next to a Hadoop cluster, which means it can be deployed really anywhere Hadoop runs: on-premises in an enterprise, or up in the cloud with a cloud-based instance of Hadoop.
Who's the buyer of the product?
Well, that varies depending on the customer, and as we've seen at Hadoop Summit today, it's an evolving story as to who's the buyer of Hadoop, right? It starts with big data science projects, it's evolving, and what's interesting is that it's becoming increasingly relevant to the line of business. So we're seeing in the market, as we take our product out, that the Hadoop ecosystem is starting to span from science projects to IT to actual business-facing uses as well.
So in terms of the analytics market, I want to ask you where you guys are seeing traction in Hadoop. Jeff was just announcing a survey this morning. We haven't released it publicly as a document, but we've been teasing it out: only roughly 20% of people are buying Hadoop subscriptions, which means there's a huge tsunami of paying customers coming. Obviously there are a lot of deployments where people are tire-kicking. What do you guys do outside of Hadoop, relative to some of the mainstream deployments?
You took a left turn on me there. So, the Hadoop market in general: I would say we are seeing some of what you're talking about, which is validation of it as a platform that is not to be ignored in the future for business use cases. Real adoption in the enterprise has come, and that awareness is across the industry at this point. Various customers are in different stages of moving down that path, but I think the existence of that path is very widely recognized now, and I would say that's a change even in the last six months. That has arrived. So that's one of the things we definitely see out in the market. Our product, I should emphasize, is a technology. It's an interaction technology, in essence, for how you work with your data, and it's not per se wedded to Hadoop. We chose Hadoop as the first platform to launch on because we saw the promise here, and also because the Hadoop environment is so agile and celebrates this idea that you should collect and gather any data you can and then empower people to transform it into shape to get use out of it. Traditional platforms, even though they could be used for many things, are often used for much more engineered pipelines that are governed by more traditional software. That said, the Hadoop market is expanding, and we will stay there and perhaps move outside of it as well as we go.
Right. Well, there's going to be a lot of opportunity in the Hadoop market, for sure. We're getting close to, if we're not already at, that tipping point where we're going to see a lot of those POCs start moving into production, and I think that 25 percent number of paying Hadoop customers is going to increase significantly over the next year. But in terms of your core value proposition, to put a little context around it: we've done some survey work, and the number one technology challenge associated with Hadoop
is transforming data into a form that's suitable for analysis. So you're clearly onto a problem that is challenging a lot of people. I think one of the real value propositions here is that you're making data scientists, who are extremely expensive and rare, much more productive. That's obviously going to help their output, but it's also going to help them feel happy in their job and want to stay there. When you go into customer situations, what kind of feedback are you getting? What seems to be the major driver for your customers when they come to you?
Well, there is an emotional aspect, I think, for certain customers, certain users certainly, where they look at this and they're just so happy to have it. First of all, they would rather be doing other things than transforming data most of the time. But they also see our ability to make data transformation interactive, to explore multiple paths and get visual feedback on what you're doing, and it is pleasing in an emotional sense in a way that very little software in our space has been. Traditionally it's been very boring software at some level. So we get that. We also get the fact that when you can shorten the time to utility for your data, going from raw data to something you can actually do an analytic exercise on, that tightens the feedback loop of acquire, analyze, change your business, and go back around. That's the meat of the market, I would say, for big data: driving change in an organization. And the bottleneck in that feedback loop, as you say, has been in transformation for too long now.
So when you go in, what are most people currently doing for these transformations? This has been an issue not just in big data scenarios. Having covered the data management market for many years, this is an issue with traditional data, small data,
whatever you want to call it, and there's, at least in my experience, a lot of hand-coding going on, a lot of manual effort. Is that still what's happening in Hadoop environments? What are you seeing when you get in there, and how are they doing things now compared to the opportunity that you offer them?
Yeah, it's happening all the more so in Hadoop environments because the DNA of the community is so technical. The first Hadoop installations were being used by very technical organizations. They could code everything, so to a first approximation they just did. They would code their way through any mountain they ran into, right? In the traditional market, though, once you got into really messy data, the kind of stuff that we pull in for big data, you were also in essence being dropped down into code. Even the tools from that era that were ostensibly for transformation were mostly mapping tools, to map schemas from one input to one output. If you have to do more than simple mappings, you are in a scripting environment. If you look at the DATA step facilities from SAS, for example, it's a coding environment. It's powerful, but it's quite technical. So that's the kind of thing we're looking at: it's either code in a traditional programming language like Python or Java, or it's code in a framework that really isn't tooling very much. It's forcing you to write code again.
So we have some questions from the CrowdChat here for you. The first question is from Burke Lattimore: as users move to production with Hadoop, are they building internal systems or looking to cloud service providers for Hadoop platforms, and do you play in that?
Yeah. I mean, as a technologist, you have to respect, I think, two really important laws of physics about technology. The first is that data has a lot of gravity. Moving data from where it is to somewhere else in bulk is pretty expensive. What that means is that today most of the Hadoop
distributions and most of the big data installations are on-premises in enterprises, and that body of data is unlikely to move en masse to the cloud. At the same time, you have to respect Moore's law, which says that in a couple of years the amount of data being generated will be far larger and will swamp the data we have today. And where is that data going to get generated in exponentially greater volume? In the cloud. So what I would say is that most of today's customers adopting Hadoop, as I see when we're out in the field, are on-premises. In the future, and it's all a question of timing, we'll see that hosted environments are going to be the efficient place to generate, store, and analyze data.
And from Trifacta's perspective, you're agnostic as to whether it's on-premises or in the cloud. It's an opportunity, because data transformation is going to need to happen regardless of where the data lives, I would imagine.
That's right.
That's a good place to be if you're Trifacta. So I'd love to ask you a little bit about the environment, your take on the show and the ecosystem behind us of all these different players. What's your take on the vibe here this year? Do you feel like the Hadoop community still has the kind of vitality it had a couple of years ago, when it was a little bit younger and had a little more of a Wild West feel to it? Is it becoming a little more grown up, a little more buttoned-down? I think I'm the only guy wearing a tie here, but other than that, what's your take on the community environment?
Well, I remember being at Yahoo in 2008 for the first Hadoop Summit, which was called a data-intensive computing gathering, and it is a little different now. There are fewer professors in the room, I'll say, but there are almost as many people with technical cred still. I ran into one of the former Berkeley grad students, who's now at
one of the major Hadoop distro companies and was at that 2008 meeting. He's here today. So there are a lot of the same people, and a lot of the executives of these Hadoop companies obviously have that scrappy background and are still those same guys. They understand the code, and they understand how people get business done with data. So a lot of that's going on. At the same time, we heard on stage today, and people are talking about it, that this is real business and it's solving real problems out in the world. I think it's really gratifying for technologists to see that transformation. So it's a little bit of both. There's more business going on, and that's a good thing. That's a sign of the success of the technology and the impact we can really have, and so I think everybody's actually pretty happy about that.
Yeah. And we're getting some questions on our CrowdChat for Hadoop Summit. I want to hear about some of those real-world examples. What are some examples of the real business value you're delivering for customers?
Yeah, sure, happy to talk about that. One of the use cases we talk about is our work with Lockheed Martin, which is in the health care space for the federal government. There's been, I think, a real shift in recent years in health care to a sense that a transition in health IT is in fact going to happen, and that this is finally the time when there's going to be investment not just in, say, new radiology machines but also in the IT infrastructure, because data really informs and drives what can be done efficiently in health care. One of the really typical examples, which is actually part of the reason we're in this deal with Lockheed, is the Affordable Care Act. One of the pieces of the Affordable Care Act is what's called accountable care organizations. This is an incentive structure for health care providers to hit certain metrics, and if they do so, there are economic incentives from the
government based on those metrics. Well, you've got to measure yourself against these basic metrics. If you're a hospital, you have many, many providers in your network who all have different billing systems, different EMR systems, and a fairly large volume of data. If you're the Medicare or Medicaid organization, that's a huge volume of data, and getting that data organized and ready for analysis is a ton of work. So that's one example in the health care space that we've been involved in.
So, Joe, what's it like to be the CEO of a venture-backed company in a great marketplace that's growing like crazy? What's it like for you? I love how you have the GitHub logos, and as I've said multiple times on theCUBE, I think that's the future, having the little "here's my code, look at what I do." As an executive, it's always impressive. But what are you doing now? What's it like for you? Are you happy? Are you bored? I mean, I can probably tell you're bored. You've got a great company, but do you still code? Are you still doing some stuff? How do you keep fresh?
One of the things I've observed with startups over the years, and I've been involved in a bunch of them in various roles, is that the CEO who codes is probably not a long-term thing, and it's probably not healthy for the company. So I coded a very little bit at the beginning of Trifacta, and I had research projects at Berkeley that I was coding on at the beginning as well, and I had to set that aside, because it is a more than full-time job running an energetic startup at the scale we're running. I also wanted to leave lots of room. I have this amazing team of software engineers and architects. They need their room to run, and they really don't need me to be a part-time coder. So my perspective is: yes, I write code, and there have been phases in my life where I do more or less. Right now is a less phase, and I think that's important
for the business. If you're running a smaller shop than we are running now, at less intense a scale, then it starts to make some sense, but it's a lot.
You've got a lot of crowd favorites out there. Question for Joe, Trifacta CEO: how is Trifacta different from Palantir?
Oh, that's interesting. So Palantir, which honestly, despite their being a very large presence in the Valley, I haven't spent a lot of time understanding, but I can get a sense of what they do.
Buying all the buildings. I know rent's going up because of them.
You know, Palantir has been very focused on providing full solutions to federal, particularly security, as we know, and what that's included is a great deal of services and consulting. Trifacta is a software company. So I think at the first level, that's a very basic distinction.
Okay, so final word, I'll give it to you. Share in your own words with the folks out there some of the most exciting things going on in the big data landscape. It's a pretty broad canvas for you to talk about. What's exciting to you, outside of the intoxication of running a big company, the new experiences, the team? Just in the market, what technology gets you excited and is going to be a game changer?
Well, for me, two themes that I've been pursuing for a while are really coming to fruition. The first is that after years and years of data being kind of a sideline in computing, what we're seeing is that data drives computing, and there's widespread acceptance of that. YARN is a piece of evidence for this. What YARN is, is this gamble that the scheduling and resource management ecosystem around big data is the scheduling and resource management system that you want to use for a whole class of cloud and distributed applications. This is the data folks taking over the entire resource management for computing. It's a takeover strategy. It's an operating system. I mean, the idea that the internals of your data stuff are your operating
system, that's bold. So I'm excited by that. That's one theme: data drives computing, intellectually, technically, and in the market. The other theme, which Trifacta is very much a piece of, is this idea that allowing people to use computers more effectively is the next big challenge in computer science. This means programmer productivity for developers. It means that professional data folks should have tools that make them much, much more productive. And by doing these things, it's not just that people get faster at what they do, but that new things become possible. The ability to work at a much higher level of abstraction will enable inventions and developments that we can't even imagine, and I think that's beginning to happen with more energy as we get past the problem of "are my computers fast enough?", which we really don't worry about nearly as much as we used to.
Joe, thanks for coming on theCUBE. Really appreciate it. You have great insight. You've been a professor, you've done great programming and research, and now you're running a startup, with a ton of experience. Thanks for sharing out here on theCUBE. We'll be right back with our next guest after this short break.