Hey, we're back live at Strata. I'm John Furrier, founder of SiliconANGLE.com, and I'm here with my co-host. I'm Dave Vellante, wikibon.org, and we're getting close to the end, but we've got a lot of action left. We're here with Paul Doscher, the CEO of Lucid Imagination. Welcome. Thank you. Welcome to theCUBE. We're going rapid fire. Tim O'Reilly was on, and it was worth it, but it squeezed our time a little bit. So you guys are on top of Apache, right? Talk a little bit about your product, but I'd like to put some context to it. Dave and I cover the information governance market, which is big: HP, there are a lot of conversations, Autonomy was bought by HP. Search is a big part of all this, real-time analytics. So tell us about what you're doing right now real quick, and then we'll jump into some questions. Okay, so as you mentioned, Lucid Imagination is a commercial open source technology on top of Lucene/Solr, which is the most broadly deployed enterprise search technology, I would suggest, in the world, with thousands and thousands of customers and thousands of implementations. The business model for Lucid is to sell a commercial set of modules, if you will, on top of Lucene/Solr that makes it more enterprise-ready. So we deal with issues around security, improvements, ranking and relevancy, and configuration to make the applications easier to manage. And we just recently launched a product called LucidWorks Platform, and we most recently launched that into the cloud. So now you can actually hook into our LucidWorks technology and have full enterprise search capability in your cloud-based application. So what's the business model on your end? Because obviously this is an open source product, and with open source there's always the pure open source route. Are you going to go like Cloudera, do you do consulting around it, services, how are you guys going to do it? The primary revenue driver is recurring license on the commercial product.
Got it, okay. And we do have services and support. We do have full support for Solr, so if somebody wants to implement Solr, of the 20 primary committers from Apache, we employ seven of them. So we have a wealth of Lucene/Solr expertise inside the company, guys that actually have developed the product and are primary committers to the product. Apache has just been such a successful community over the years, going way back to the days when I was tinkering around with web servers, back when the web hit the scene, but now it's so great. Abhi Mehta from Tresata was talking about you guys. He thinks that the killer app is text-based search within the massive amounts of analytic data, so what's your comment on that? He mentioned you guys, Lucene, so he's very pro you guys, and he's obviously doing a really cool financial application, Tresata is doing that. So what's your perspective on his comment, and could you elaborate on why that is a killer app and what's happening?
So the reason why, you're right, so first of all, they're all Apache. You take Hadoop and you put Lucene/Solr on top of it and it's all inside the Apache family, which is great. But more importantly, what Lucene/Solr provides is the ability to scale to the size of Hadoop. So you don't need any intermediary data sources, you don't need to ETL information out of Hadoop to get it into an intermediate data store in order to run traditional BI technology on top of it. You can actually just run a MapReduce job, sort of spawn an index inside Lucene/Solr, and then do real-time analytics right off of that index. So it scales to the size of Hadoop, it's as adaptable and flexible as Hadoop is, and it deals with unstructured content. Because again, the paradigm that people come into these applications with is, I'm dealing with structured text or structured information, numbers, whereas, as Gartner says, 80% or so of the information that's inside the enterprise is unstructured. And so now what we see is an evolution of enterprise search where it's starting to overlap with business intelligence and starting to provide analytical applications, or a platform to develop those applications, where you can actually integrate structured and unstructured content within the same user interface. Can you give us an example of a use case? Because that was really cool, just give us an example of that application. Absolutely, so you look at a traditional customer support application, right? You call up a customer support person, and in today's enterprises they traditionally have to open up somewhere between six and 12 applications in order to understand the context of your relationship. Like you go to Comcast or something, right? You could have voice, you could have internet, you could have cable TV, et cetera, et cetera, right?
Well, when you take enterprise search technology and the ability to integrate content across the broad spectrum of the different applications that would identify a customer relationship, you can actually now take all of that and integrate it into one index, and then you build an application on top of it that actually displays all the information from a portal, and now you can do natural language query. So you can search based on the customer name, their address, their number, whatever it is that they want, and you can see not only the full and total relationship that you have, but any, you know, peculiarities, or if there's an issue with the local router or whatever. Does that answer your question in terms of a customer app? And I think what's going to happen, where this is going, is that there's so much volume of information now coming into the enterprise and to users, and you see that most of the traditional business intelligence applications are all based on a pull model. People have to know what reports they want or know what information they want. I think the volume of information is getting so big that eventually this paradigm is going to shift and you're going to start to see more of a push model. And enterprise search technology provides that opportunity, because you integrate enterprise search with machine learning capabilities like Mahout and now you're able to build recommendation engines and triggers where you can actually start to push information to people based on their role in the company, et cetera. And you talked about, you know, unstructured data being the dominant characteristic of data today, but the whole mindset has changed. You know, I remember the mid-2000s, the Federal Rules of Civil Procedure and email archiving and using things like search to help with e-discovery and things like that.
And now the example you gave is customer service, more value oriented, but is there a way in which you can help solve some of those traditional problems as well? Do you see those use cases, like e-discovery or other sorts of legal compliance? If you look under most e-discovery applications, the foundation is search. Okay, so what you do is you take a search capability, and we actually power a lot of the e-discovery ones, because at the end of the day what you're doing is allowing an attorney, right, or somebody, to query a corpus of information, whether it's legal documents or emails or whatever. And they now have to raise all the ones that are relevant to the particular case. The application on top of it deals with aging, deals with accessibility, deals with sort of locking it up so that you can then present it in a court of law with all the restrictions that are required. But fundamentally it all starts with the search application. So I have a question about that, because discovery is volume driven, right? In other words, the more data I have, the more I've got to pay my lawyers. And search is kind of this blunt instrument that we've been using. Right. Lawyers love it. Right. The general counsel of Pfizer might not love it so much. So are there any developments in terms of classification to be able to extract more value, maybe reduce the size of that corpus of data so that I don't have to troll through so much? Or is it just that technologies and scale-out are so fast now and hardware is so cheap that it doesn't matter? So I'm going to come back, I'm sort of going to come around to that question. Most people are familiar with search as what you do on the internet, right? You type into Google and then you get this one in 10 million pages and nobody gets past page number two. Right.
What happens inside the enterprise, and what traditional enterprise search technologies do, is that as part of that result set they actually provide facets or categories that give you smart navigation tools, so that you can start to focus very quickly on exactly the information that you want. And so in an e-discovery application you are able to pivot on the information from multiple different directions to get to exactly what you want, and do it in a relatively short period of time. Where the legal support comes in is to then go through each one of those documents to make sure that in fact it meets the criteria that have been suggested. So the technology allows you to focus very quickly, and then from that point forward, of course, yeah, you're paying by the hour. Let me give you an example. We just developed a prototype of a big data instance for an intelligence agency in a Southeast Asian government on top of our partner, Cloudera. And this is basically a lot of email traffic and then some other sort of unstructured content. We were able to index 12 billion documents and we're able to ingest 65 million documents a minute. And now we're able to turn that over to the analysts so they can do natural language queries on that information, and they can get sub-second response time across that entire big data corpus and start to connect and intersect all that intelligence information so they can start to focus on where the bad guys are. That's fantastic, and I'm really excited by the whole open source explosion around the developers, and I think Apache's got nice momentum around some of their projects. Obviously Hadoop has been fantastic for them, among other things. My question is to ask you what you're seeing in terms of applications. Obviously Dave and I reviewed this kind of early on, when we kicked off Strata. You know, 2010 was, what is Hadoop? Okay, hey, what's Hadoop?
2011 was, hey, this big data stuff is a real business. It's very cool. 2012 is platform stability and application explosion, emergence of applications, variety. You know, lifestyle applications to big venture-backed applications. And then 2013 is where it starts raining money, for the startups and for the companies and for the people that deploy big data and techniques to drive their business or change their business. So my question is, what are you seeing relative to your piece, which is search, a big part of that? What kinds of applications are you seeing emerge? So independent of some of the use cases, what are the key things you're seeing? That's a trend, that's real, that's going to be a growth area. These are the kinds of apps we see demand for. So if you look across all the different verticals, in oil and gas there's a tremendous amount of information, especially with fracturing and how that's creating a different dimension as to how the oil and gas companies are collecting data around their pipelines. Huge application, enormous amount of information. Any kind of log capture information, whether it's your security logs, whether it's logs from your internal IT operations, whether it's, like I said, logs from oil and, well, I mean oil and gas drills. In the healthcare space, patient records, patient information, tracking of medical insurance claims and things like that, it's a voluminous amount of data that gets collected. So in each individual vertical you start to see where what used to be sort of out of the scope of reality, or the data warehouse was so huge that it was really impractical. So now you're seeing this opportunity where they can start to actually aggregate this information and then build applications on top of it that produce real information and real business value. And what kind of applications is that encoded in? We talked to Revolution Analytics around R, and around that area we're seeing activity.
Well, see, what we suggest is that it's actually going to require an integration, and this is one of the things that we're sort of converging on, where you take search and you maybe take R and you take Mahout and you start to integrate these different Apache capabilities. Now we're able to build an application framework that allows you to solve a number of different use cases, because you can combine search with analytics, with natural language processing, with machine language, I mean, machine learning, recommendation engines. So it's the capability and the opportunity to actually use the open source technologies, and now you're able to develop sophisticated application platforms with APIs that solve a wide range of use cases, and you can do everything from dashboards to just straight lists. It depends on what the interface needs to be. So what's the most exciting thing that's getting you jazzed up right now, obviously within your business, but outside of your business, on top of your business and around you? What trends are keeping you going, wow, this is so amazing? It could be something that's come out of the woodwork. Are there new surprises? What are the surprises, what are the things that you expected, and what are you seeing and saying, hey, that is the coolest thing we're seeing? So the coolest thing that we're seeing, just a little bit about my background, I was one of the principal founders of Jaspersoft back in early 2000, right? And so open source was sort of a new kind of thing, that was the cool thing prior to the cloud and whatnot. So now, coming back in as the CEO of Lucid Imagination, it's the evolution of open source and how viral it is and how aware people are of it. And I think one of the analysts did a survey recently where 75% of the Global 2000 use open source in some way, shape or form, and Lucene/Solr is in the top five of the ones that they use to build applications. What are the other 25% of them?
Yeah, I wasn't listening to those. So to be in the top five and have that kind of reach, that's the thing, where you see such a huge adoption. And with budgets being constricted and more and more pressure being put on IT departments, just the opportunity to continue to use open source I think is extremely exciting, and it's becoming even more so. There was so much concern about the licensing, and all that stuff is gone now. It was only about what happened. Talk about the entrepreneurship side of it, okay? Because you mentioned startups and that's really cool. Obviously, it's a different world now than it was in 2000, even then, right? So the data entrepreneur out there, what's your advice to them? I mean, obviously you've been through the old way and now the kind of modern way, and there's going to be an ultra-modern way around the corner. A whole new generation of developers and entrepreneurs is coming out. What's your advice for them? Learn statistics. Math, go back and study stats. Go back and study math and study statistics. I mean, I heard a little bit of what Tim was talking about. It's the understanding of how to develop interesting algorithms to solve business problems and then your ability to test those that, I think, is going to be absolutely revolutionary. Now that you can amass so much data in one place and actually capture it into a data source that allows you to manipulate it, it's going to be, how do you intelligently manage that and then how do you actually produce real results from it? And I guess, I don't know, but that would be my guess if I knew statistics. Yeah, well, and with all that data, the algorithms can maybe be a little simpler than they have been historically. Probably so. So I had a question for you. What did you think of the Endeca deal? The Endeca-Oracle deal? Phone been ringing off the hook since then? No, no, no. I love that kind of stuff.
Oh, of course you would, right? Not only does it set the comps in the market, I mean, that's fantastic, but it creates this huge vacuum where, still being an independent company with the kind of user population that we have, it's just phenomenal. That's like a new chapter now. It really is, and I think what it does is sort of clear away the old enterprise search paradigm. And now, as I said, enterprise search is evolving and sort of starting to overlap more with the business intelligence space. And I just see more and more opportunity for us to expand our reach in the enterprise and to provide some real value in mission-critical applications across all the different verticals. So I think it was fantastic. I mean, you look at Autonomy, it's the same way. Endeca was a great one. Microsoft, when they bought FAST, I mean, the multiples on that are fantastic. So I'm trying to drive our company as hard and as fast as I can. Where are you guys located? We're located right here in Redwood City. Okay, great. We'll have to come by and see you there because we need your advice on our data project. We have a huge HBase back-end app here that we built. I've got a company that can help you with that. And we need to get some search and we've got to work on some coding and we need some advice. So we might come by and knock on your door and chat further. It'd be a pleasure. Paul, thank you for your time. Paul, great meeting you. Nice to meet you. Thank you very much. Real-time search is a killer app right here. Apache has it all. You guys are doing great, Lucid Imagination, great business model, open source. Again, open source is really on this third generation, Dave. I mean, we were first generation and you had the second generation, now a whole new generation of open source. It's really exciting. Virtualization, open source, flash, big data. These are the enablers: cloud, mobile and social. We'll be right back with our next guest. Watch these.
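Editor's aside on the faceted navigation Paul describes in the e-discovery discussion, where a result set comes back with categories the user can pivot on to narrow a large corpus quickly. The sketch below is a toy illustration of that idea in plain Python, not Lucene/Solr code and not Lucid Imagination's implementation. The documents, the field names (`type`, `year`), and the helper functions `build_index` and `search` are all invented for the example.

```python
from collections import Counter, defaultdict

# Toy corpus; documents and field names are invented for illustration only.
DOCS = [
    {"id": 1, "text": "router outage reported by customer", "type": "email", "year": 2011},
    {"id": 2, "text": "contract renewal for cable service", "type": "contract", "year": 2010},
    {"id": 3, "text": "customer complaint about router firmware", "type": "email", "year": 2010},
    {"id": 4, "text": "router replacement invoice", "type": "invoice", "year": 2011},
]

FACET_FIELDS = ("type", "year")


def build_index(docs):
    """Build an inverted index: lowercase term -> set of matching document ids."""
    index = defaultdict(set)
    for doc in docs:
        for term in doc["text"].lower().split():
            index[term].add(doc["id"])
    return index


def search(index, docs, query, filters=None):
    """Return docs containing every query term, plus facet counts over the hits.

    `filters` maps a facet field to a required value, which is how a user
    "clicks" a facet to narrow the result set.
    """
    terms = query.lower().split()
    ids = set.intersection(*(index.get(t, set()) for t in terms)) if terms else set()
    hits = [d for d in docs if d["id"] in ids]
    for field, value in (filters or {}).items():
        hits = [d for d in hits if d[field] == value]
    facets = {f: Counter(d[f] for d in hits) for f in FACET_FIELDS}
    return hits, facets


INDEX = build_index(DOCS)

# Broad query: every document mentioning "router", with counts per category.
hits, facets = search(INDEX, DOCS, "router")
print([d["id"] for d in hits])
print(dict(facets["type"]))

# Pivot on the "type" facet to keep only the emails.
narrowed, _ = search(INDEX, DOCS, "router", filters={"type": "email"})
print([d["id"] for d in narrowed])
```

The facet counts are computed over the current hits only, which is what lets a reviewer see at a glance how a large result set breaks down before opening any individual document.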