Live from San Jose, California. It's theCUBE, covering Big Data Silicon Valley 2017. Okay, welcome back, everyone. We are here live in Silicon Valley for Big Data SV. This is our event in conjunction with Strata Hadoop, our companion event to Big Data NYC, and we're here breaking down the big data world as it evolves and goes to the next level up on the step function, AI, machine learning, IoT, really forcing people to really focus on a clear line of sight of the data. I'm John Furrier with our analyst from Wikibon, George Gilbert. Our next guests are two executives from Trifacta: the founder and chief strategy officer, Joe Hellerstein, and Adam Wilson, the CEO. Guys, welcome to theCUBE. Welcome back. Founder, co-founder. Co-founder, because multiple co-founders, I remember, because you guys were one of the first sites to have the GitHub and the About section on all the management team, just to show you how technical you guys are. Welcome back. And if you're Trifacta, you have to have three founders, right? So that's part of the track, right? The triple threat, so to speak. Okay, so big year for you guys. Give us the update. I mean, obviously we had Alation on, so there's partnering going on, some product movement, but there's a turbulent time right now. You have a lot of things happening in multiple theaters, the technical theater, the business theater, and also within the customer base. People, it's a land grab, it seems to be, on the metadata and who's going to control it. What's happening? What's going on in the marketplace and what's the update from you guys? Yeah, yeah, well, yeah, last year was an absolutely spectacular year for Trifacta. It was four times growth in bookings, three times growth in customers. It's been really exciting for us to see the technology get in the hands of some of the largest companies on the planet and to see what they're able to do with it.
I think from the very beginning, we really believed in this idea of self-service and democratization. We recognize that the wrangling of the data is often where a lot of the time and the effort goes. In fact, up to 80% of the time and the effort goes there in a lot of these analytic projects, and to the extent that we can help take the data from raw to refined in a more productive way and to allow more people in an organization to do that, that's going to create information agility that we feel really good about and that our customers are telling us is having an impact on their use of big data in Hadoop. And I think you're seeing that transition where in the very beginning, there was a lot of offloading, a lot of like, hey, we're going to grab some cost savings. But then at some point, people scratched their heads and said, well, you know, wait a minute, what about this strategic asset that we were building that was going to change the way people work with the data? Where is that piece of it? And I think as people started figuring out, in order to get ROI, we got to have users and use cases on these clusters and the data lake itself is not a use case. Tools like Trifacta have been absolutely instrumental in really fueling that maturity in the market and we feel great about what's happening there. I want to drill down some more before we get to some of those questions for Joe too because I think you mentioned you got some good growth. I just want to double click on that. It always comes up in the business model question for people, what's your business model? Doing democratization is really hard. Sometimes democratization doesn't appear until years later. So it's one of those elusive things. You see it, you believe it, but then making it happen are two different things. So I appreciate the vision there. But ultimately at the end of the day, the business model comes down to how you organize.
Proof points, customers, partnerships; we had Alation on, Stephanie McReynolds was on. Can you just share and connect the dots on the business model with respect to the product, customers, partners? How is that specifically evolving? And give some examples. Sure, yeah, and I would say kind of, we felt from the beginning that we wanted to turn what was traditionally a very complex, messy problem dealing with data into a user experience problem that was powered by machine learning. And so a lot of it was down to how we were going to build and architect the technology and the affordances for really getting the power in the hands of the people who know the data best. But it's important, and I think this is often lost in Silicon Valley where the focus on innovation is all around technology, to recognize that the business model also has to support democratization. So one of the first things we did coming in was to release a free version of the product. So Trifacta Wrangler is now being used by over 4,500 companies, tens of thousands of users. And the power of that, in terms of getting people something of value that they could start using right away on spreadsheets and files and small data and allowing them to get value. But then also for us, the exchange is that we're actually getting a chance to curate at scale usage data across all of these. Is that a SaaS product? It's a hybrid product. So the data stays local. It never leaves their local laptop. But the metadata is hashed and put into the cloud. It has some instrumentation into that. Absolutely, and so now we can use that as training data so that, as more people wrangle, the product itself gets smarter based on that. And so that's creating real tangible value for customers and for us is a source of very strategic advantage. And so we think that combination of the technology innovation but also making sure that we can get this in the hands of users and they can get going.
And as their problem grows up to be bigger and more complicated, not just spreadsheets and files on the desktop, but something more complicated, then we're right there along with them for products that will then monetize. How about partnerships with Alation? How are they, what are the other deals you got going on there? So Alation has been a great partner for us for a while and we've really deepened the integration with the announcement today. We think that cataloging and data wrangling are very complementary and are a natural fit. We've got customers like Munich Re, like eBay, as well as MarketShare that are using both solutions in concert with one another. And so we really felt that it was natural to tighten that coupling and to help people go from inventorying what's going on in their data lakes and their clusters, to then cleansing, standardizing, essentially making it fit for purpose and ensuring that metadata can round trip back into the catalog. And so that's really been an extension of what we're doing also at the technical level with technologies like Cloudera Navigator, with Atlas, and with a project that Joe's involved with at Berkeley called Ground. So I don't know if you want to talk about that. Tell him about Ground. Sure, so part of our outlook on this, and this speaks to the kind of way that the landscape of the industry is shaping up, is that we're not going to see customers buying into sort of lock-in on the key components of their infrastructure. So for example, storage, HDFS, this is open and that's key I think for all the players in the space, that HDFS is not a product from a storage vendor, it's this open platform and you can change vendors along the way and you can roll your own and so on.
So metadata, to my mind, is going to move in the same direction: the storage of metadata, the basic componentry that keeps the metadata, that's got to be open to give people the confidence that they're going to pour the basic descriptions of what's in their business and what their people are doing into a place that they know they can count on and it will be vendor neutral. So the catalog vendors are in my mind providing a functionality above that basic storage that relates to how do you search the catalog, what does the catalog do for you to suggest things, to suggest data sets that you should be looking at. So that's a value add on top, but below that what we're seeing is Hortonworks and Cloudera coming out with either products or open source in sort of the metadata space, and what would be a shame is if the two vendors ended up kind of pointing guns inward and kind of killing the metadata storage. So one of the things that I got interested in, in my dual role as a professor at Berkeley and also as the founder of a company in the space, was we want to ensure that there's a free, open, vendor-neutral metadata solution. So we began building out a project called Ground which is both a platform for metadata storage that can be sitting underneath catalog vendors and other metadata value adds. And it's also a platform for research much as we did with Spark previously at Berkeley. So Ground is a project in our new lab at Berkeley, the RISELab, which is the successor to the AMPLab that gave us Spark. And Ground has now got collaborators from Cloudera, from LinkedIn, Capital One is significantly invested in Ground and it's putting engineers behind it and contributors are coming also from some startups to build out an open source platform for metadata. How long has Ground been around? Ground's been around for about 12 months. It's very young. So brand new, how do people get involved? Just standard, similar to what the AMPLab was, just jump in and code away.
Yeah, it's all up on GitHub. There's Docker images to go download and play with. It's in alpha and we hold weekly meetings for committers and the usual open source deal. This is interesting. I like this idea because one of the things we've been riffing on on theCUBE all the time is how do you make data addressable? Because ultimately, real time, you need to have access to data, really, really low latency to the insight to make it work. Hence the data swamp problem, right? So how do you guys see that? Because now I can just pop in, I can hear the objections, oh, security! You know, how do you guys see the protections? I'd love to help get my data in there and get something back in return in a community model. Security, is it the hashing? What's the, how do you handle the security piece? And what are the issues? Yeah, so I mean the straightforward issues are the traditional issues of authorization and encryption and those are issues that are reasonably well plumbed out in the industry and you can go out and you can take the solutions from people like Cloudera or from Hortonworks and those solutions will plug in quite nicely actually to a variety of platforms. And I feel like that level of enterprise security is understood. It's work for vendors to work with that technology, so when we went out we made sure we were Kerberized in all the right ways at Trifacta to work with these vendors and that we integrated well with Navigator, we integrated well with Atlas. That was, you know, there's some labor there but it's understood. There's also- It's solvable basically. Solvable basically and pluggable. There are research questions there which you know on another day we can talk about, for instance if you don't trust your cloud hosting service, what do you do? And that's like an open area that we're working on at Berkeley. Intel SGX is a really interesting technology in that space. Probably a topic for another day.
But you know I think it's important- We'll certainly get you into the studio in Palo Alto; we'd love to drill on that. Yeah, I think it's important though that when we talk about self-service the first question that comes up is I'm only going to let you self-serve as far as I can govern what's going on, right? And so I think those things really go- Restrictions, guardrails, I don't think you hear about handcuffs. Yeah, so, right. Because that's always the first thing that kind of comes out where people say okay, wait a minute, now is this, if I've now got, you know, you've got an increasing number of knowledge workers who believe it is their inalienable right to have access to this data. Well that's the emphasis to democratization. That's the top down, you know, governance control point. So how do you balance that? And I think you can't solve for one side of that equation without the other, right? And that's really, really critical. Democratization is not anarchization, right? Yes, exactly. Yeah, but it's hard though, it really, I mean, and you look at all the big trends whether it was, you know, Web 1.0, Web 2.0 all had those democratization trends but they took six years to play out and I think it might be more accelerated with cloud to your point about this new stuff. Okay, George, go ahead and get in there. Well I wanted to ask about, you know, what we were talking about earlier and what customers are faced with which is, you know, a lot of choice and specialization because building something end to end and having it fully functional is really difficult. So what are the functional points where you start driving the guardrails in that IT cares about? And then what are the user experience points where you have critical mass so that the end users then draw other compliant tools in? You with me? Sort of the IT side and the user side and then which tools start pulling those standards?
Well I mean, I would say at the highest level to me what's very interesting especially with what's happened in open source is that people have now gotten accustomed to the idea that like I don't have to go buy big monolithic stacks where the innovation moves only as fast as the slowest product in the stack or the portfolio. I can grab onto things and I can download them today and be using them tomorrow. And that has I think changed the entire approach that companies like Trifacta are taking to how we build and release product to market, how we interoperate with partners like Alation and Waterline and how we integrate with the platform vendors like Cloudera and MapR and Hortonworks because we recognize that we are going to have to be maniacally focused on one piece of this puzzle and to go very, very deep but then play incredibly well with all the rest of the ecosystem. And so I think that has really colored our entire product strategy and how we go to market. And I think customers, they want the flexibility to change their minds and the subscription model is all about that, right? You got to earn it every single year. So what's the future of data prep? Because that brings up a good point. We were kind of critical of Google and you mentioned you guys had some news, you guys were involved with Google. Being enterprise ready is not just hey, we have the great tech and you buy from us, damn it, we're Google. I mean, you have to have sales people, you have to have automation mechanisms to create great products. Where will the future of wrangling and data prep go? Where does it end up? Because enterprises want, they want certain things, they're finicky on things, as you guys know. So how does the future of data prep deal with the, I don't want to say the slowness of the enterprise but they're more conservative, more SLA driven than they are price-performance driven.
But they're also more fragmented than ever before and while that may not be a great thing for the customers, for a company that's all about harmonizing data, that's actually a phenomenal opportunity, right? Because we want to be the decision that customers make that guarantees that all their other decisions are changeable, right? And I go and I- They have legacy systems of record. This is the challenge, right? So I got the old Oracle monolithic. That's fine. And that's great. The more the merrier, right? Does that impact you guys at all? How do you guys handle that situation? To me, to us, that is more fragmentation which creates more need for wrangling because that introduces more complexity. You guys do well in that environment. Absolutely. And that is only getting bigger, worse and more complicated and especially as people go from on-prem to cloud, as people start thinking about moving from just looking at transactions to interactions to now looking at behavior data in the IoT and sensor world. You welcome that environment. So we welcome that. In fact, that's where, you know, we went to solve this problem for Hadoop and Big Data first because we wanted to solve the problems at scale that were the most complicated. And over time, we can always move downstream to sort of more structured and smaller data. And that's kind of what's happened with our business. Awesome. Yeah. I guess I want to circle back to this issue of which part of this value chain of refining data is, if I'm understanding you right, the data wrangling is the anchor. And once a company has made that choice, then all the other tool choices have to revolve around it. Is that a... Let's think about it this way. I mean, the bulk of the time when you talk to the analysts and also the bulk of the sort of labor costs in these things is in getting the data from its raw form into usage. That whole process of wrangling, which is not really just data prep.
It's all the things you do all day long to kind of massage these data sets and get them from here to there and make them work. That space is where the labor cost is. That also means that space is where the value is because that's where your people power, where your business context is really getting poured in to understand what do I have? What am I doing with it? And what do I want to get out of it? As we move from bottom line IT to top line value generation with data, that becomes all the more so, right? Because now it's not just a matter of getting the reports out every month. It's also what did that brilliant woman in sales do to that data set to get that much lift? I need to learn from her and do a similar thing. So that whole space is where the value is. What that means is that you don't want that space to be tied into a particular BI tool or a particular execution engine. So when we say that we want to make a decision in the middle that enables all the other decisions, what you really want to make sure is that that work process in there is not tightly bound to the rest of the stack. And so you want to particularly pick technologies in that space that will play nicely with different storage, that will play nicely with different execution environments. So today it's Hadoop, tomorrow it's Amazon, the next day it's Google, and they have different engines back there potentially. And you want to certainly make sure it plays with all the analytic and visualization tools. You want to decouple from all that. You want to decouple that. And you want to not lock yourself in because that's where the creativity is happening on the consumption side. And it's where the mess that you talked about is just growing on the production side. So data production is just getting more complicated. Data consumption is getting more interesting. That's actually a really cool, good point.
So elaborating on that, does that mean that you have to open up interfaces, either at the UI layer or at the sort of data definition layer? Or does that just mean other companies have to do the work to tie in to the styles and structures that you have already written? In fact, it's sort of the opposite. We do the work to tie in to a lot of these other decisions in this infrastructure. We don't pretend for a minute that people are going to sort of pick a solution like Trifacta and then build their organization around it, to your point. There's tons of legacy technology out there. There's all kinds of things. They got to change that immediately. Absolutely, so a big part of being the decoder ring for data, for Trifacta, is saying, like, listen, we are going to interoperate with your existing investments and we're going to make sure that you can always get at your data. You can always take it from whatever state it's in to whatever state you need it to be and you can change your mind along the way. And that puts a lot of onus on us. And that's the reason why we have to be so focused on this space and not jump into visualization and analytics and not jump into storage and processing and not try to do the other things to the right and to the left, right? So final question, I'd like you guys both to take a stab at it. Just kind of pivot off what Joe was saying. Some of the most interesting things are happening in the data exploration kind of discovery area from creativity to insights to game-changing stuff. Ventures potentially. The problem of the complexity, that's conflict. So how do we resolve this? I mean, besides the Trifacta solution which you guys are taming, creating a platform for that, what is, how do people and industry work together to solve that problem? What's the approach? So I think actually there's a couple sort of heartening trends on this front that make me pretty optimistic.
One of these is that the incentive structures in enterprises we work with are becoming quite aligned between IT and the line of business. It's no longer the case that the line of business are these annoying people that are distracting IT from their bottom line function. IT's bottom line function is being translated into a what's your value for the business question. And the answer for a savvy IT management person is, I will try to empower the people around me to be rabid fans. And I will also try to make sure that they do their own work, so I don't have to learn how to do it for them. And so that I think is happening. Or, to the line of business guys, the IT guys are a bunch of annoying guys that don't get what I need. So it works both ways. It does. And I see that that's improving sort of in the industry as the corporate missions around data change. So it's no longer that the IT guys really only need to take care of executives and everyone else doesn't matter. Their function really is to serve the business and I see that alignment. The other thing that I think is a huge opportunity and part of why we're excited to be so tightly coupled with Google and also have our stuff running at Amazon and at Microsoft is as people re-platform to the cloud, a lot of the legacy becomes shed or at least becomes deprecated. And so there is a re-platform happening. Or containerize or some sort of microservice. And so people are peeling off business function and as part of the cost savings to migrate it to the cloud, they're also simplifying. So and things will get complicated again. But there's an opportunity right now. There are solution architects out there to kind of reboot their careers because the old way was, hey, you got networks, I got apps and stacks and so that gives the guys who could be the new heroes coming in and thinking differently about enabling that creativity. In the midst of all that, everything you said is true.
IT is a messy place and it always will be and tools that can come in and help are absolutely going to be happy. I think this is obvious now. The tension's obviously eased a bit in the sense that it's clear line of sight that top line and bottom line are working together now, as mentioned earlier. Okay, Adam, take a stab at it. It was hard to beat that one. I was just going to give an example, I think that illustrates that point. So one of our customers is Pepsi. And Pepsi came to us and they said, listen, we work with retailers all over the world. And the reality is that when they place orders with us, they often get it wrong. And sometimes they order too much and then they return it and it spoils and that's bad for us. Or they order too little and they stock out and we miss revenue opportunities. So they said, we actually have to be better at demand planning and forecasting than the orders that are literally coming in the door. So how do we do that? Well, we're getting all of the customers to give us their point of sale data. We're combining that with geospatial data, with weather data, we're like looking at historical trends and industry averages, but you can see they're stitching together data across a whole variety of sources. And they said the best people to do this are actually the category managers and the people responsible for the brands because they literally live inside those businesses and they understand it. And so what happened was the IT organization was saying, like, listen, we don't want to be the people doing the janitorial work on the data. We're going to give that work over to people who understand it and they're going to be more productive and get to better outcomes with that information and that frees us up to go find new and interesting sources.
And I think that collaborative model that you're starting to see emerge where they can now be the data heroes in a different way by not being the ones being the bottleneck on provisioning but rather can go out and figure out how do we share the best stuff across the organization? How do we find new sources of information to bring in that people can leverage to make better decisions? That's an incredibly powerful place to be. And I think that that model is really what's going to be driving a lot of the thinking at Trifacta and in the industry over the next couple of years. Great, Adam Wilson, CEO of Trifacta, Joe Hellerstein, Chief Strategy Officer at Trifacta, also a professor at Berkeley. Great story, getting the UX right is hard but under the hood stuff's complicated and again congratulations about sharing the Ground project, Ground and open source, open source lab kind of thing in Berkeley, exciting new stuff. Thanks so much for coming on theCUBE. Appreciate it, great conversation. I'm John Furrier, George Gilbert. You're watching theCUBE here at Big Data SV in conjunction with Strata Hadoop. Thanks for watching. Great, thanks guys.