Okay, welcome back. We're here live in New York City. This is SiliconANGLE and Wikibon's theCUBE, our flagship program. We go out to the events and extract the signal from the noise, and sometimes we have our own events. Our event is Big Data NYC, hashtag #BigDataNYC, held in conjunction with Big Data Week, the Strata Conference, and Hadoop World. I'm John Furrier, the founder, and I'm joined by my co-host, Dave Vellante. Our next guest is Prakash Nanduri, CEO and co-founder of Paxata. Congratulations, and welcome to theCUBE. It's your first time on theCUBE. Thank you. It's my first time on TV, web TV, period. And it's for your exclusive announcement here on theCUBE. This is our first, well, actually our second; we had Trasada launch on theCUBE, so you're our second company launching on theCUBE. Tell us what your company's doing. Launch your company, go. Thank you very much. I'm very excited that today Paxata is launching the industry's first adaptive data preparation platform. We are launching with live customers. We're also announcing three very important, strategic partnerships, with Tableau, QlikView, and Cloudera. And last but not least, to fuel our growth, we are thrilled that we've closed our next round of financing, led by Accel Partners. And we are... Ping Li. Dinesh Katiyar, Sameer Gandhi, and the rest of the team at Accel. They are big supporters and have been right from the beginning. I interviewed them at the Accel Stanford event. Yes. Sameer. Yes. They led our Series A, and now they are leading our Series B. How much funding have you guys taken in so far? In total, 10 million so far. In two rounds? In two rounds. Okay. And we just closed our Series B with eight million in financing. So you did a lot with a couple million? Yes, we did. We are thrilled that we've been extremely capital efficient. Since we began in early 2012, we've gone from a prototype to a full live product on our multi-tenant cloud.
We have live customers, several of whom we are announcing, including Pabst Brewing Company, Dan and Box, and a couple of other very large customers. So we're thrilled about that. When we started this journey, we promised ourselves that we would not launch just because we had been funded or just because we had assembled a team. We wanted our customers and partners to launch us. Today I'm very proud that we have achieved that milestone: we're launching with some of the leading brands gaining value from our product, and we're on this journey together with very strong partners such as Tableau, QlikView, and Cloudera. Why did you start the company? It's a great question. My entire professional career has been in information management and data management. I was one of the pioneers of the master data management space when I co-founded a company called Velosel, which is now part of TIBCO as TIBCO's MDM platform. After that, I spent a significant number of years at both TIBCO and SAP and was very much involved in the whole BI and analytics strategy. As I went through this journey, I clearly identified this white space, this bridge, or frankly this missing link, between the visualization and analytical tools on top, such as Tableau and QlikView, and the core next-generation NoSQL data management platforms, such as Hadoop and SAP HANA, at the bottom. What was missing was the middle link: data preparation, which centers on the needs of a business analyst who spends way too much time taking all the raw data sets they get from Hadoop and other sources and putting them together into data that is ready for analytics. And what you find in an enterprise is that there are a number of analytics tools, because there are multiple analytical needs, and there are multiple data sources, whether within the enterprise or outside it.
And the biggest challenge in an analytics exercise is for a business analyst to very rapidly prepare the data, that is, to combine, enrich, merge, and clean the data sets and get them ready so that they can visualize or analyze that data in any tool of their choice. That's where we thought it would be fantastic to have this piece. We saw that as the white space, we came together in early 2012, and here we are today. Talk about the, well, two questions. One: what's more disruptive right now, the BI side of the business, business intelligence, or the data warehouse transformation? Those markets. And two, tell us about your go-to-market technology. Was it the cloud, a multi-tenant cloud? So first, BI or DW, data warehouse or business intelligence: which one is going to be the driver? If you had to weight it, is business intelligence more important? I think you can't say one or the other. It's a yin and yang. If you look at the last five to 10 years, there's been a tremendous amount of innovation on the BI tool side. The big difference there has been self-service BI, where business users have been able to do analytics on their own. Products such as Tableau and QlikView have really driven that, and that continues. If you look at their growth rates, you see that growth in self-service BI is really happening. At the same time, when you look at the data warehouse market, you see that it is impossible to maintain the cost economics for the large data sets that companies require, and therefore there's a need for new technologies. Just today I was with the folks at Cloudera, and it was absolutely clear that as enterprise architecture changes and there is a need for both structured and unstructured data, traditional data warehouse technology cannot handle the economics of current-day analytics needs. Therefore, I believe it is a yin and yang.
Both the warehouse market and the analytics market are evolving, and as for which is pulling which, frankly, it always starts with the business pulling the need in the enterprise. The business wants more data for analytics and decision making, and that is driving the entire stack underneath. And... So help us understand a little bit more about what you do and how you do it. You're talking about this middle layer between the visualization tools and the data platform underneath, whether that's a key-value store or an in-memory database. And it sounds like you've figured out a way to automate the merging, cleansing, and preparation. Data preparation means the data is ready to be analyzed, is that right? Yes, absolutely. Okay, so am I right that you've automated that process? That is right. The innovation that Paxata brings to the market is that, for the very first time in the industry, a business analyst, not a deeply technical person, can very rapidly merge structured and unstructured data, whether personal, proprietary, premium, or public data sets, and whether within the enterprise or outside it. They can bring those together, merge them, clean them, and get them ready for analytics. What they care about is putting the data together and looking at which analytics problems they want to solve. The Paxata system, using our intelligent algorithms and very powerful distributed computing technologies in conjunction with technologies such as Hadoop, does all the heavy lifting of automating that process. You're absolutely right. Okay. So you were talking about the cloud, but you didn't get the cloud multi-tenant thing in there. So what is the product? Is it cloud-based? The product is consumed via the cloud, absolutely. Paxata offers our solution via our multi-tenant cloud.
All a business analyst has to do is sign up for an account, load data right away, merge it, do their work, and then either automate the export to a tool like Tableau or QlikView or simply extract a CSV file and load it into, say, an Excel spreadsheet. So talk about the secret sauce a little bit. What's the tech behind what you guys have done? The technology, which our co-founder Dave Brewster came up with, is centered on a set of proprietary algorithms that detect relationships across multiple data sets, whether they are structured or unstructured. After detecting the relationships, there is a smart way, using probabilistic techniques, to figure out the best join for the data sets. We're using text analytics and semantic technologies to automatically find the relationships across different, varied data sets, and not just to find the relationships but to actually merge those data sets together. Most important, we semantically type the data so that you can intelligently enrich it based on its meaning. Not the metadata or the model, but the meaning of the data drives how you enrich it, how you clean it, and how you merge it. So you're inferring context. Yes, absolutely. Using math, essentially? Math, statistics, and graph theory, yes. So latent semantic indexing and the like? You got it, you got it. We are using latent semantic indexing techniques, statistical cluster graphing, and pattern recognition, all together in a distributed computing environment. Now, this technology has been around for a while, but it hasn't been commercialized in a cost-effective way. What's been the catalyst for you guys to be able to bring it forward? That's a very good question. A number of critical catalysts have come together.
First, if you look at the lowest level, distributed computing and the advancement of technologies such as Hadoop have made it much more economical to manage and process large amounts of data. That's a really important piece. The second critical thing is in-memory technology. The third is algorithms that can now be developed to execute in parallel at large scale and at an effective cost, which was impossible in the past. And last but not least are the next-generation visual technologies, which allow us to deliver a solution that is very simple for the business analyst to use. It's a perfect storm for you guys. And I was going to say, you've seen examples of this before, but at very small scale; you couldn't scale economically in the past. Obviously you've studied it, so what are your thoughts on the TAM? I wonder if you could share that with us. I would think this is an enormous opportunity. It is a very large TAM. We estimate our addressable market to be between $13.5 billion and $16 billion over the next three years, ending December 2015. We look at this primarily as a subset of the larger addressable market of enterprise analytics as a whole, and we are focusing on the data preparation piece of that. If we look at that, the segmentation goes across three levels: ad hoc analytics use cases; analytical applications in the cloud, such as Anaplan and others; and, last but not least, next-generation data warehouses. You know, I love this. I can't speak to the tech, but I'm dying to learn more and bring in some of our guys who can go deeper.
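Paxata's algorithms are proprietary, and the interview names the techniques (probabilistic join detection, latent semantic indexing, statistical clustering) without describing how they work. Purely to illustrate the general idea of detecting a likely join between two data sets and then merging them, here is a minimal Python sketch that swaps in a much simpler technique: it scores every pair of columns by the Jaccard overlap of their normalized values and treats the highest-scoring pair as the join key. This is not Paxata's method; the function names and the sample data (loosely inspired by the brewer example in this interview, but invented) are hypothetical.

```python
from itertools import product

# Illustrative stand-in only: a value-overlap (Jaccard) heuristic,
# NOT Paxata's proprietary probabilistic/semantic algorithms.


def normalize(value):
    """Normalize a cell for comparison: stringify, strip whitespace, lowercase."""
    return str(value).strip().lower()


def column_values(rows, column):
    """Set of normalized, non-empty values found in one column."""
    return {normalize(r[column]) for r in rows if r.get(column) not in (None, "")}


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A or B|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_join_candidates(left, right):
    """Score every (left column, right column) pair by value overlap, best first.

    Assumes each table is a list of dicts whose rows share the same keys.
    """
    if not left or not right:
        return []
    scores = [
        (jaccard(column_values(left, lc), column_values(right, rc)), lc, rc)
        for lc, rc in product(left[0].keys(), right[0].keys())
    ]
    # sorted() is stable even with reverse=True, so ties keep column order.
    return sorted(scores, key=lambda t: t[0], reverse=True)


def inner_join(left, right, left_key, right_key):
    """Inner-join two tables on normalized key values."""
    index = {}
    for r in right:
        index.setdefault(normalize(r[right_key]), []).append(r)
    joined = []
    for row in left:
        for match in index.get(normalize(row[left_key]), []):
            # Prefix right-hand columns to avoid name collisions.
            joined.append({**row, **{f"right_{k}": v for k, v in match.items()}})
    return joined


# Hypothetical sample data: distributor sales rows and a retailer master list.
sales = [
    {"store": "ACME-01", "product": "Lager 12pk", "units": 40},
    {"store": "acme-02 ", "product": "Lager 6pk", "units": 25},
    {"store": "BEV-07", "product": "Ale 12pk", "units": 10},
]
retailers = [
    {"retailer_id": "ACME-01", "name": "Acme Market"},
    {"retailer_id": "ACME-02", "name": "Acme Market East"},
    {"retailer_id": "ZED-99", "name": "Zed Liquors"},
]

score, left_key, right_key = rank_join_candidates(sales, retailers)[0]
print(score, left_key, right_key)  # 0.5 store retailer_id
for row in inner_join(sales, retailers, left_key, right_key):
    print(row["store"], row["units"], row["right_name"])
```

The overlap score picks `store` and `retailer_id` as the join pair even though their casing and whitespace differ, which is the kind of step an analyst would otherwise do by hand. A production system would also need type inference, fuzzy matching, and distributed execution, none of which this sketch attempts.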
You're attacking the labor component of the marketplace, which today nobody gets revenue for; companies just spend it, and they would much rather spend money with you to save time than spend their own time getting to that point. Labor is absolutely one piece, but the other piece is that in any analytics exercise, 80% of the time is spent preparing the data and only 20% analyzing it. That's what I've seen across the enterprise. Yeah, isn't that 80% of the time labor, non-differentiated heavy lifting? That is true. But it's also one of the things that really diminishes the value of analytics and of getting to the right ideas. Which is why people throw up their hands and say, forget it. Yes. All of us in business want to make data-driven decisions, but we can't, because data preparation is the biggest hole. What we want to do is flip the equation and have people spend 80% of their time actually doing analysis and 20% on preparation. So the productivity gain is significantly higher than just the labor cost savings, right? And that's why we believe this is a game changer. That's true; the productivity gain is enormous. With a lot of markets, it's, okay, we're essentially going to replace an old way of doing things. Linux replaces Unix, okay? Everybody says Hadoop is incremental, but we know Hadoop is ultimately going to eat away at some of the data platforms. But this is really attacking spending that, like I say, isn't going to any vendor today. It's really driving value on the customer side. So I love that plan. And that's what our customers are delighted about, and that's where we've seen tremendous traction and support from the customer side.
Take a company like Pabst Brewing Company, where the CFO has questions like which product is being sold by which distributor, at what time, in which retailer, and wants that answer fast. Their big problem is that they're getting data from Nielsen, from their distributors, from their own back-end systems, and from their manufacturers, and they have to put it all together. They spend days and months doing this. It's a daunting task. So Ben Haynes at Box is a customer now. He's coming on tomorrow or Wednesday. Yes, Ben Haynes is a wonderful supporter, and he has truly been a visionary. He's a CUBE fan of ours. We've been doing CrowdChats with him. He's been great. He's awesome, yeah. Excited to have him. Okay, good. So we'll ask him about what he's doing with Paxata. Absolutely, you should. He's been a great supporter, and we're delighted to have him as a customer. Awesome. Okay, so we are here breaking news, launching companies, and analyzing and dissecting the Strata Conference, Hadoop World trends, and Big Data Week in New York City. This is SiliconANGLE's theCUBE, with SiliconANGLE and Wikibon live on the ground for two more days, with live coverage every day. I'm John Furrier, with Dave Vellante. We'll be right back with our wrap-up of day one after this short break.