Live from San Jose, California, in the heart of Silicon Valley, it's theCUBE. Covering Hadoop Summit 2016, brought to you by Hortonworks. Now, here are your hosts, John Furrier and George Gilbert. Okay, welcome back everyone, we're live in Silicon Valley in San Jose for Hadoop Summit 2016. This is SiliconANGLE Media's theCUBE. It's our flagship program. We go out to the events and extract the signal from the noise. It's our third day of three days of wall-to-wall coverage. I'm John Furrier, with my co-host, George Gilbert. Our next guest is a famous CUBE alumnus who launched his company on theCUBE three years ago this October. Prakash Nanduri, co-founder and CEO of Paxata. Welcome back to theCUBE. Great to see you. Great to see you, great to be back. It's the end of the quarter, so I always want to know how business is. You launched your company on theCUBE. It was really a proud moment for me because you were the first company to launch on theCUBE. We had Bill Schmarzo, the Dean of Big Data, launch his book, Big Data, on theCUBE at that same time. So it was really fun. But at that time, you don't know what the future's going to look like. How are you guys doing almost three years later? How'd the quarter go? How's business? So you know, I was telling you just before we got on, today's a very momentous occasion. It's so nice to be back where we launched on theCUBE close to two and a half years ago, when we launched the concept of self-service data preparation, as you recall. And here we are now, two and a half years later, and self-service data preparation, and the data preparation space in general, has just mushroomed. It is now one of the most important aspects of driving towards information-driven decisions. And it has become really one of the most critical pieces of every analytical exercise. We are very proud that as a company of firsts, we launched first on theCUBE. We brought the concept of self-service data preparation to the market. 
And here we are finishing the first half of 2016 with stellar performance. We're very proud now that we don't have to explain to people what data preparation is. We are the leader in enterprise-grade data preparation, having completed a phenomenal first half, having earned the trust and respect of some of the biggest brands in the world, in the financial services world, in the retail and CPG world, and in the high-tech industry. And we are so proud that the biggest brands have selected Paxata and continue to grow with Paxata as their only enterprise-grade data preparation platform. Well, congratulations, first of all, it's amazing. That's one. Two, as a co-founder and entrepreneur, it's always hard because you always want to enter the market and get a nice little narrow beachhead, and then it's sequenced from there. You made a good choice, but data prep is changing, right? Some say ETL, all this data prep, is going to be invisible because the acceleration of data coming in needs to move into these data lakes or whatnot. But there's a real interesting conflict or balance that needs to be struck in this market. I want to get your thoughts on this. And that is that in the data space, having an observational data space is very important, which implies sharing a lot of data. At the same time, there's a governance piece, right? So I need to have access to data, sharing, and governing. How are you guys addressing that dynamic with Paxata? Yes, you're absolutely right that many people have different understandings of data preparation, but enterprise-grade data preparation first begins with having a comprehensive set of capabilities. That comprehensive set means being able to do data integration, data quality, governance, enrichment, collaboration, and reuse, all in a single platform and architected on a modern architecture. 
And then you need to be able to address the needs where data preparation is something that can be done by the mere mortals, the business analysts among us. I like to say that there's a data scientist in all of us. There may be 200,000 data scientists, but there are 200 million business analysts. And all these people need to go from data to information that's ready for analytics. So what is Paxata doing? Paxata is helping the largest brands in the world, whether they're in financial services or high tech, by deploying an enterprise-grade platform that consumes both Hadoop and non-Hadoop data, consumes both structured and semi-structured data, and is able to interactively allow a business analyst who doesn't know SQL, who doesn't know Pig, who doesn't know Perl, to go from data to information in minutes and not months, with clicks and not code. And that's really what we are doing to change the game. The last piece, which is really important, is that our customers all have multiple analytical use cases, and as a result, multiple analytical tools. So they use Tableau, they use Palantir, they use Qlik, they use proprietary analytical models, and that's fine, because that's what their business dictates. We're talking about large businesses with multiple functions, multiple divisions, and what we need to do is to say how can we serve the data preparation needs of all these use cases without being curtailed or pigeonholed as either a tiny desktop product or a small piece of a larger analytical solution. You need to have an independent platform that serves every analytical need and every business function in a large enterprise. So Prakash, bump us up a level and help us understand the sequence of work that has to get done when you bring data into the data lake. Who makes it navigable, and then when do you apply analytics? When does that become operationalized? 
But most important, when you were talking about the governance and lineage, if you're using multiple tools, it must be very hard to maintain the integrity of all that. How does it hold together? So you bet, it first has to start with what business outcomes we're looking for. Right, I was speaking to my good friend, Mike Olson, Chief Strategy Officer of Cloudera, recently, and he and I share the exact same vision that at the end of the day, it's not about Pig or YARN or this. These are all capabilities that are absolutely important, but at the end of the day, the business needs outcomes. The business needs to pass their stress test, or they need to know what product needs to be built for what consumer, and these types of outcomes are what our businesses are looking for. In order to get to the right decisions, they need to be able to get the right information. There's a big difference between data and information. So when you see raw data being deployed into your HDFS or into any other data store in your enterprise, you need to be able to pull that data. You need to be able to, first of all, discover what it is. You need to be able to clean it. You need to shape it. You need to enrich it. And then you need to be able to get the confidence that this information that you've derived out of the data is ready for analytics. And if it takes you 90% of your time just to do that, then you don't have time to get to your analytics. And therefore you end up having garbage in, garbage out, or too little, too late. So what we are doing is completely automating the whole notion of going from data to information by leveraging the power of machine learning algorithms, distributed computing methods, the ability to have interactive workloads, and a consumer experience. And then to be able to have a completely independent information creation platform, or a data preparation platform, which can then export data to every analytical use case. 
And because we are sitting on top of your data lake, or your data infrastructure, we're able to get you to be on the same page, regardless of whether you're using multiple analytical tools. That's the power of the Paxata platform. What's the secret sauce of Paxata? The secret sauce is in three fundamental aspects. Number one, it is the eight patent-pending proprietary algorithmic and distributed computing techniques that we have leveraged and built on a highly parallelized Spark-based platform. Second, it is the unbelievably simple consumer experience we're delivering to our business consumers. The difference between the Paxata experience and any of the other solutions you see in the market is that Paxata needs the least technical qualification. Our consumers don't need to know what SQL is. They don't need to know what an inner join or an outer join is. They are just able to work with their data and go from data to information. The third secret sauce is our ability to work with massive amounts of data interactively. So why is this important? When you're focused on a use case like anti-money laundering, saying that you're going to try to pull samples of data to clean it doesn't cut it. You need to be able to use the entire workload. If you're finding a bad guy and you're working in national security, you need to work with full data sets, not samples. That's where the Paxata platform's power comes in. Prakash, how do companies deal with you guys? How do they engage with you? How do you guys price this out? Do I need to just give you all my data? So let's just say that I have a big data pile, or data lake, or however I store it on an object store. Do I pay you per ingest? Do I pay you for the license? How does the business model work? So the reason why we're the number one enterprise-grade data preparation platform in the industry today, and that's because of our customers, is that we've made multiple things easy for them. 
First of all, delivery model. Paxata is the only platform in the industry that works as a multi-tenant public cloud offering, and the same exact product is deployed behind the firewall for the largest institutions. It is also the only platform that works in a hybrid environment that is both on-premise and in the cloud. We've got customers working with that today. So it's very easy to deploy. Second, our pricing is purely subscription-based, and it is very similar to how our partners like Hortonworks and Cloudera and others price their products. So it's very much a subscription on a per-core basis. We do not charge on a per-user basis because we want as many consumers as possible leveraging the Paxata platform. So you don't need to be curtailed. When you say cores, you mean like machines? CPU cores. Okay, got it. CPU cores and no user-based pricing is what we have. It's a very simple... Clouds, not like Amazon or anything like that? We run on top of Amazon, but we are, again, the only platform that allows our customers to choose. We can run on any of the three major platforms, whether it's Amazon, whether it's Azure, or whether it's Google. So I'm storing my stuff in S3. How do I work with you? Fantastic, go to town. We have over 30 customers running on a multi-tenant cloud in AWS today. And you can easily go and pull your data directly from S3 and work it in Paxata. You get provisioned in less than 30 seconds, and you're off to the races. And so you charge me on EC2 instances? We have a special cloud-based subscription, which is also very much centered around how much compute power you use. Okay, so it's not going to gouge someone on Amazon, because you could have so many instances. We have the most flexible pricing model because not only is our pricing subscription-based, but it is also a very scalable pricing model. 
And we are the only platform or company in the world today doing data preparation that allows you to do burst compute-based pricing. So basically, the more you guys get used, the more you charge. It's a nice, back-loaded, back-end reward for you guys. Absolutely. And that puts pressure on you to have a great product. We have a fantastic product, and because we have architected it right, and because we have the only elastic cloud architecture in the industry around data preparation, we have operating costs that are manageable, and we can share some of those savings with our customers. Okay, Prakash, thanks so much for coming on theCUBE. Great to see you. Wonderful. And we might see you at Data Week, which is going on in the fall, in conjunction with our event, Big Data NYC, as well as Strata Hadoop. theCUBE will be in New York City this fall. We hope to see you there. You bet. I always love talking to you guys. Take care. Paxata, doing well here on theCUBE. Launched on theCUBE, great operating model, very flexible, value-driven. Thanks for sharing. I'm John Furrier with George Gilbert. We'll be back with more live coverage after this short break. Thank you. Thanks.