Another guest, Sid Probstein, who is the CTO of Attivio. Sid, a Cube alum, good to see you. It's good to be here. Great, welcome back to theCUBE. What a show, huh? I'll say, yeah. We were here last year, it was my first time here, and it was kind of boutique, and now it's got a bigger feel to it. 1,400, 1,500 people, line out the back. Oracle's here. You know when they're running out of lunches. I won't hold that against them. You know you've made it when Oracle makes it to your show, and Attivio is here, your first time sponsoring this event. So what's happening with Attivio, and specifically in a Hadoop context? What's going on there? Well, the challenge of getting information into applications, into existing infrastructure like business intelligence, quite honestly, has never been greater. Everyone is totally awash in data, all the stuff we talked about during the keynotes: click data, sensor data, machine-generated data, log data, and a lot of user-generated content that isn't like that, you know, web pages and message boards. People want to get this stuff under control. They want to bring it together, they want to do it quickly, they want to correlate it as cheaply as possible, and they don't want to get into hand-massaging the data. And so one of the essential elements in what I'll call your data stack 2.0, or whatever you want to call the big data world, is using Hadoop in the right way: taking massive volumes of data, which may or may not be of interest, and reducing that to a summary set of data that you then load into the traditional applications, the business intelligence and reporting tools you use now. That's one of the essential elements, adding that volume storage, right?
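As a rough illustration of the reduction described here, this is a minimal MapReduce-style sketch, not Hadoop's actual API; the event fields and actions are hypothetical.

```python
from collections import defaultdict

# Hypothetical raw click events, the kind of high-volume data Hadoop would hold.
clicks = [
    {"user": "u1", "page": "/product/42", "action": "view"},
    {"user": "u1", "page": "/product/42", "action": "buy"},
    {"user": "u2", "page": "/product/7",  "action": "view"},
    {"user": "u2", "page": "/product/7",  "action": "view"},
]

def map_phase(events):
    # Emit (key, value) pairs, one per event, as a mapper would.
    for e in events:
        yield (e["user"], e["action"]), 1

def reduce_phase(pairs):
    # Sum counts per key, producing the small summary set
    # that gets loaded into a BI or reporting tool.
    summary = defaultdict(int)
    for key, count in pairs:
        summary[key] += count
    return dict(summary)

summary = reduce_phase(map_phase(clicks))
print(summary)  # e.g. {('u1', 'view'): 1, ('u1', 'buy'): 1, ('u2', 'view'): 2}
```

The summary records, not the raw events, are what land in the downstream BI tool.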
So, okay, I have this enormous amount of click data, and if I want to understand customer behavior, I need to boil it down to a set of outcomes that I can measure, track, and interpret. And Hadoop is doing that. And what you can see about the ecosystem is that it's a raw, emergent technology, right? Maybe not all the pieces are there yet; there were numerous references during the keynotes, for example, to instability, to not having this piece wired up or that piece wired up. So it's emerging; it's definitely perceived as a piece of the solution. And now you see this tremendous wave of companies saying, hey, we've got a better way of connecting these together, we've got the integration layer, we've got the thing that makes this more stable. So you have all this ecosystem work growing up around it. Now, Attivio's angle is to say, hey, there's more to information than just volume, right? We view information as extreme information; that's really what we're dealing with. Volume is one dimension, but there's also variety, complexity, and velocity, and some of the other folks around here in the ecosystem are focused on one or another aspect of that. So it's a very exciting time. Users want the information they want, whether they consume it through an application, through search, or through business intelligence, and if the application blurs the lines between those things, that's okay. They want the information delivered to them the right way, and they want it integrated. The era of silos, of "impossible," leads to what I like to call the death of great ideas in the parking lot. I think you've heard me talk about this before, right?
You know, you're taking a shower or you're out on your morning jog and you say, if I could identify the class of customers who thought this, and bought this way, and repeated this pattern over and over again, I could identify an incredibly interesting, valuable group of prospects to go after for cross-selling. Great idea when you're in the shower. By the time you get to work and park your car, or get off the subway, you've thought of all the reasons why it'll never work: I can't get a server fast enough, I can't get a DBA to go look at this, it takes three months to make that kind of change in a change-request process. So people want those kinds of artificial barriers taken down, and that's what's really exciting, and Hadoop plays a great role in that. Okay, so you've got this thing called the Active Intelligence Engine. I think you just announced the new version, version three. And you play in this unified information access space. What are you doing specifically to, you know, maybe bring that stability, or enable Hadoop customers? So exactly as you say, Attivio has been focused on the unified information access problem. And one thing we bring to the table is enterprise readiness, right? Our system is monitorable; it exposes statistics through JMX. It's a system that's designed to be distributed, not only across a set of servers, but so that different workloads can be distributed across different servers. We have built-in high availability, fault tolerance, even a self-healing mode. So if I have two nodes working the same content and one goes down, when it comes back up it'll put itself back in production and replay any transactions that it missed.
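The self-healing behavior described here can be sketched as a toy catch-up protocol: a node that went down replays the transactions it missed before rejoining. The log and node classes are hypothetical, not Attivio's API.

```python
class TransactionLog:
    """Append-only shared transaction log (hypothetical)."""
    def __init__(self):
        self.entries = []  # list of (txn_id, payload)

    def append(self, payload):
        self.entries.append((len(self.entries), payload))

    def since(self, txn_id):
        # All transactions at or after txn_id.
        return self.entries[txn_id:]

class Node:
    def __init__(self, log):
        self.log = log
        self.applied = 0   # id of the next transaction to apply
        self.state = []
        self.up = True

    def apply_pending(self):
        for txn_id, payload in self.log.since(self.applied):
            self.state.append(payload)
            self.applied = txn_id + 1

    def rejoin(self):
        # On coming back up, replay everything missed, then serve again.
        self.up = True
        self.apply_pending()

log = TransactionLog()
node = Node(log)
log.append("doc-1"); node.apply_pending()
node.up = False                           # node goes down
log.append("doc-2"); log.append("doc-3")  # writes continue on the peer
node.rejoin()                             # node catches up automatically
print(node.state)                         # ['doc-1', 'doc-2', 'doc-3']
```

The point is that the operator does nothing: catching up is part of rejoining.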
So having the ability to do monitoring, having the ability to do management, to do failover, heck, to do geographically distributed deployment, those are all just the right to play. And if you look back, this is why I referred to Hadoop as emerging, right? The relational database went through the same pain. When the first versions of relational databases came out, they didn't have replication and backup and clustered operation, and now they all have those. So as Hadoop solves problems that, in a way, are too expensive to put into the database stack, and as you use it to solve more and more of those problems, it will need enterprise readiness. We've really had it from the beginning. The big thing we've added in 3.0, in addition to geographic distribution, is the ability to support real ANSI-92 SQL. I showed some demos, and in fact you can check it out over at our booth: we're showing examples where we're using Tableau to analyze unstructured information, literally graphing key phrases over time, showing clusters of interesting text that the user can drill into, and correlating that with other structured data. So enterprise readiness can be about high availability and fault tolerance and monitorability, but it's also about playing with the ecosystem. Companies that have Hyperion or Cognos or BusinessObjects, or a newer system like Tableau or QlikTech, or open source like Pentaho, all of those work with that open standard. So we're very excited to be able to bring velocity, variety, complexity, and, with Hadoop, volume all to bear, and all through those interfaces. So, we've talked about this before, Sid, but maybe explain to the folks that aren't familiar with Attivio, what's your secret sauce? I mean, you guys are a platform. Yes. You're not out to be a point product.
You're really trying to have a platform on which you can build applications. So what's the secret sauce that you guys have invented? Well, I think our core innovation is taking the search model, which is the inverted index, and marrying it up with a graph engine. If you think of a search index, it's basically just a list of words that point back to the documents or records they were found in. Okay, that's great, and everybody loves search. Don't we all wish we had a search box in the morning when we lose our socks? You lose your socks and it says, they're behind you, turn around. That would be awesome. And there's one of them. There's one, always one. I hope that if there's an afterlife, you get all your socks back. I think that'd be a great way to start. But my point is this: you approach the problem with search and you say, well, it doesn't work in the enterprise. I don't need individual items; I'm not looking for the documents that mention this. I need to understand data visualized, aggregated, presented in a time series, right? It's a very different animal. When you lay out data using an index, it's flat. You can have a title and a body, but you don't have relationships. The innovation is that we bring in a graph engine to essentially build a mathematical structure linking one node in the index to another. That edge is the relationship, and it could say, hey, this term occurs in a table, or this table is related to another table, say, an ID to a foreign-key ID. That gives us the ability to do full-text search exactly the way you'd expect with any search engine, but also to support SQL, ANSI-92 SQL, because we're able to do joins exactly the way a database does. Now, we actually have some nice advantages.
We don't require referential integrity or some of the other things that databases impose on you because they want to keep the system stable from an operational point of view. We focus on the query side, so we can relax some of those rules: anything that matches can be joined on. That's kind of our special sauce. But all of that great technology is not so useful without the ability to get data into it and out of it quickly. On getting it out, of course, we rely on things like SQL and ODBC and JDBC. On getting it in, we have a workflow model, with more than 90 different components that we provide to our users. And the roll-up of all of those capabilities, machine learning, classifiers and entity extractors, workflow to link it all together and make it occur in the right order, UIs to manage and manipulate it, that's really the secret sauce. That's what makes it easy. A lot of people just don't know how hard that is. I mean, just the one problem of searching across the structured data in legacy databases and the new unstructured data, that by itself is hard. The whole other thing you just mentioned is even harder. I mean, you guys have got elves working in the North Pole. What's going on? How did you get here? Just describe a little bit about how hard it is, how you got there, and what kind of solutions you're enabling. Great question. You know, I asked a question at my talk earlier. I said, how many people here have implemented a MapReduce job? Actually, the number of hands was not as high as I had expected; it was maybe 20 people. I said, okay, now keep your hand up if you've written a MapReduce job that operates on Chinese data. All hands went down, right? And I pointed out, that's a much harder problem than operating on a bunch of English log files. We all speak English. We don't necessarily speak Chinese.
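The inverted-index-plus-graph model described above can be illustrated with a toy: an inverted index gives full-text lookup, and edges linking records play the role a foreign key plays in a database, so a match can be joined to related structured data. All names and records here are made up.

```python
from collections import defaultdict

# Text records, as a search engine would see them.
docs = {
    1: "quarterly revenue report",
    2: "customer complaint about shipping",
}
# Structured records, as a database table would hold them.
customers = {10: {"name": "Acme", "region": "west"}}
# Graph edges: doc 2 is linked to customer 10, like a foreign key.
edges = {2: 10}

# Build the inverted index: term -> set of doc ids containing it.
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        index[term].add(doc_id)

def search_join(term):
    # Full-text match first, then follow the edge to the related
    # structured record, which is essentially what a SQL join does.
    results = []
    for doc_id in index[term]:
        customer = customers.get(edges.get(doc_id))
        results.append((doc_id, customer))
    return results

print(search_join("complaint"))  # [(2, {'name': 'Acme', 'region': 'west'})]
```

Note there is no referential-integrity check here: anything that matches can be joined on, which mirrors the relaxation described above.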
A Chinese speaker will tell you that two speakers might disagree about where a word begins and ends. There is a lot to the vagaries of human expression, right? Don't we all have a friend who doesn't get jokes or sarcasm? Boom, right over their head. So expecting a computer to be able to grapple with that, that's a big thought, that's a bit of a leap. Now, people who write MapReduce jobs are mostly manipulating structured data, you know, like purchase data. But you've got to get at all that text data. It's very important that you can manipulate it and understand it and deal with synonyms and antonyms and acronyms, and there are all these different... How many people are working on this? Well, that's a great question. We have 70 people total in the company, and about 15 engineers on our core team at the end of the day. And look, we took the approach, coming out, of asking: do we need to build another search engine? We had a lot of folks from the search world; many of us worked at a big search company that was acquired a couple of years ago, and we left before the acquisition. And we said, does the world really need another search engine? No. At the same time, we went and got a bunch of people from the BI world, big database folks from places like Ab Initio and Talend and Thomson and even Oracle. And they said, do we need to build another BI tool? No. The answer was to go and build the new thing. And the approach we took is: if we have to build this, the key decision point is, can we add value? Is there something unique we can create? If not, there's no point. So we were brutal about build-versus-buy, and we took a lot of free and open source software. We're very aggressive in the way we use it, meaning we only use permissively licensed software.
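The word-boundary problem mentioned above is concrete: English tokenizes on whitespace, but Chinese has no spaces, so a MapReduce job over Chinese text needs a segmenter first. This is a toy forward maximum-match segmenter over a tiny hypothetical dictionary; real segmenters are statistical, and as noted, speakers can disagree about the right boundaries.

```python
# Tiny hypothetical lexicon; a real one would have tens of thousands of entries.
DICT = {"北京", "大学", "北京大学", "生", "学生"}

def segment(text, max_len=4):
    """Greedy forward maximum-match: take the longest dictionary word
    starting at each position, falling back to a single character."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            word = text[i:i + size]
            if word in DICT or size == 1:
                tokens.append(word)
                i += size
                break
    return tokens

print(segment("北京大学生"))
# Greedy matching yields "北京大学" + "生" ("Peking University" + "student"),
# but "北京" + "大学生" ("Beijing" + "college student") is also a valid reading:
# exactly the boundary ambiguity a MapReduce job over English logs never faces.
```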
We use tools like Black Duck software, big shout-out to Black Duck; we use them and others. And we've gone through really deep scrubs with some of our OEM customers who embed our software inside theirs. But at the same time, we also said, hey, for some of this stuff, open source is not there yet, and we licensed in a bunch of commercial third-party software. And then we focused our own implementation on the things that really make a difference. Text analytics: ours. Not the kind of thing you can license; there are no good commodity versions out there. Language-independent, machine-trainable systems, that's much more the kind of thing we focus on. Similarly the graph engine and the join operations, where we're actually about to be issued a patent for a join on top of a search index. That's where we focus. That's powerful. That's the new model for software now. I mean, look, we were arguing, not arguing, David, I never argue, we debate, or we commentate aggressively on the same thing we both agree on. Like when HP bought Autonomy, we were kind of against that whole huge thing. But Autonomy, if you think about it, their key value is the search, not so much the other stuff, though there's an e-discovery component. But now search, and managing this information fabric, that's the new way. And with Hadoop, you've got a lot more opportunities. Can you just elaborate on how you see that? Do you agree? I mean, I see that Autonomy's done well for themselves, but I'm expecting HP's probably going to take that Attivio approach with Autonomy, possibly. I think Autonomy is really a very old-school, unstructured-focused company with the search box. You've got to say on-premise. On-premise, right, not cloud, not cloud. Actually, I'll tell you, I have a lot of respect for the way that company approached strategy, right?
They said, search has a limited life in terms of growth, and so they focused on buying companies that had solutions, like email archiving. Obviously a very successful model for them. But I don't know if that really changes the landscape for HP. I kind of think of HP as one of the great innovation companies out there, and yet, as many analysts have noted, they haven't really hit any home runs with databases; they don't have a massively parallel operation. So I see why they bought Vertica, right? To get into that space; before that they had Neoview. They're clearly trying to make a lot of this stuff work. I think Vertica plus Hadoop is awesome, because Hadoop lets you deal with the fact that I don't necessarily want to pay for CPUs and licenses and storage to keep everything ready to query at any moment. When I have that data, Hadoop will be a great complement. And on demand, that's what a lot of the integrations are: on demand, we can take a big pile of data, reduce it to some set of summary records, and put that inside along with the other pieces. But that is still mostly structured. It's structured data in the database, plus what I would call unstructured data that isn't really content: not human text, just variable-length records, web logs, click logs, et cetera. They're going to have a great solution for that. But they all still need to come to terms with the unstructured side, and that's, I think, where the real future is. We were talking about exception handling during my talk. It's great when you have a business process with steps A, B, C, and everything goes A, B, C, no problem. But when something stops at B and can't move to C, the process is almost inverted, right?
Whereas you have a small number of people working the normal process, to deal with an exception all of a sudden you potentially need this huge group, maybe the entire organization. Who knows the answer? How do we solve this? One great way to do that is enterprise 2.0 approaches, right? Get the right people blogging and tweeting inside the company about what they're doing. But another way is to say, hey, what if I could actually query all that information, all that unstructured content, and at the same time query the structured data? Like, find the customers who bought this product in this way through this channel, and who said this; let me find that group to focus on. That's where the power is. That's compelling. That's the future. I love that. I think it's a really pragmatic use case, hard to do, so I think that's fantastic. Let me give you a concrete use case; you did ask for use cases, so I'll throw one out there. We work with a major investment bank on IT incident management. And look, it's tough being in financial services, especially in IT; the trend for budgets is down, not up. So this group needs to do more with less. And recently they got beat up by their business unit. The business unit said, you guys are escalating more than half of the issues that come through here, and that means there's a real problem. So they asked the IT unit to go back and study the problem. And they said, well, we studied it: on average, a severity-one issue takes around 27 minutes to resolve, and the problem is silos. Just imagine yourself as a poor sysadmin. Your pager goes off and you've got to go fix whatever the system is. So first you go look at this virtualized landscape. The system you're looking for might be on different servers than it was on last time, right? It may have moved. So finally you hunt down the log files.
Now you're looking through the log files, not the easiest thing to navigate, right? Often a lot of different developers' work just comes out free-form into a log file, so you've got to worm your way through that stuff. Finally you say, hey, that's the thing, that's the problem. Now you've got to go to the design documents, the change histories, the knowledge base provided by the vendor, and maybe the SharePoint site that's packed with information from previous sysadmins; they've all put their notes down there over the years. No wonder it takes 27 minutes. And if your escalation threshold is 15 minutes, of course you're escalating more than half. So we integrated all those sources of data, including all the log files from hundreds of physical servers. They're brought together, marshaled, and indexed every 10 seconds. And this system cut the time from 27 minutes to three minutes, and that was the bank's own study. Because all of a sudden, instead of wondering where to go, you go to one place, a search application. You start with the system name; that gives you the server map. Now you get the log files. Now you find the smoking gun. Copy that into the search box, hit the search button, and the answer comes up, because it's in one of those sources: design documents, change histories, trouble tickets, the wiki with all the sysadmin comments, and the knowledge base provided by the vendors. So, three minutes. It's a real game changer. And by the way, the ROI is hundreds of thousands of dollars every month. But the story doesn't end there. What's even more interesting is that for that one pile of data, or content, or whatever you want to call it, that index of information the sysadmins use through search, there's another group that's interested: the managers of the sysadmins. They want a dashboard, though.
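The incident-management flow described above can be sketched as a toy unified index: log lines, wiki notes, and vendor knowledge-base articles all land in one index, so one search finds the smoking gun alongside the fix. All data here is made up for illustration.

```python
from collections import defaultdict

# Hypothetical content from three formerly siloed sources.
sources = {
    "logs": ["ERROR OutOfMemoryError in OrderService",
             "INFO heartbeat ok"],
    "wiki": ["OutOfMemoryError on OrderService: raise heap to 4g"],
    "kb":   ["Vendor note: OutOfMemoryError often means a leaked cache"],
}

# One index over everything: term -> list of (source, line) hits.
index = defaultdict(list)
for source, lines in sources.items():
    for line in lines:
        for term in line.lower().replace(":", " ").split():
            index[term].append((source, line))

def search(term):
    # One query returns hits from every source: the sysadmin pastes
    # the smoking-gun phrase from the log and sees the fix next to it.
    return index[term.lower()]

for source, line in search("OutOfMemoryError"):
    print(source, "->", line)
```

The same index can also feed the managers' dashboard: aggregating hits by source or by system is just a second query over the same data.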
They want aggregate, trend-oriented data, not the individual items. They want to say, hey, which systems break the most? Which ones take the longest to fix? Which admin fixes them most rapidly? They even have us figuring out when there's a change-management collision: two systems getting the same change during a window of time. So that's the real power of UIA, right? It's not just that it's many different types of data at a large volume with pretty aggressive velocity; it's also that there's a complex mission, and two different user communities don't want to have two different systems. My last question for you, Sid. There's a lot of talk these days about who contributes the most to open source. Have you, or will you, contribute to the whole open source Hadoop movement? Not to Hadoop, but we have contributed to other projects, patches and things like that. And we've funded development of Apache POI, which is used by another Apache project, Tika, because we think the world needs great format extraction. So we'd love to be part of the ecosystem. You know, we're a commercial company, but our approach is that at the end of the day, customers are going to want all of the best elements in their solution, whether they're open source or commercial or something else, and we knit it together for them and add innovation on top so they don't have to. And there's definitely a group of buyers out there for whom that's the solution they're much more comfortable with. In time, I think we're going to see every model, all of them linked together, all working together: free, open source, freemium, commercial, hybrid models, versions where there's gamification involved, right, to get points and promote the company. It's a great time to be in software, and it's certainly a great time to be in information. Excellent. Sid Probstein, CTO at Attivio, thanks very much for coming on theCUBE. Always a pleasure.
My pleasure, thank you so much. Great energy, good story. Great to see you guys too. Thanks, all right. Keep up the good work. Thank you.