Okay, we're back. This is Dave Vellante, and I'm with Wikibon.org. This is SiliconAngle.tv's continuous coverage of IBM Information On Demand, IOD. And this is theCUBE, where we bring you the best guests we can find at tech events within the enterprise. We try to extract the signal from the noise and deliver it to you, our audience. We're here at IOD for two days. We'll be at Strata on Wednesday and Thursday. We've got two CUBEs going on this week, teams flying all over the country, equipment flying all over the country. We're here live at the Mandalay Bay in Las Vegas, and I'm here with David Inbar, who's the Senior Director of Big Data Products and Solutions at Pervasive Software, an IBM partner. We're going to talk about big data, about data and integration, and some of the things that you guys are doing. David, welcome to theCUBE. Great, good to see you. Good to see you. So first of all, IOD, big event. I don't know if you saw the keynotes this morning. Yes, big event. Jason Silva was blasting us with big data concepts. It was like drinking from a fire hose, but it was good. And we love the idea of big fire hoses of data. That's really one of the areas we're specializing in. We've got a whole business unit devoted to big data and big data analytics tools, and we're very excited about it. We started researching and preparing for this quite a few years ago. So talk about the company. It's a publicly traded company. You've got your earnings call today, probably happening as we're on this call. As we're talking, right. So we'll keep an eye out for that. The market's closed, so you're probably just about to make that release, so obviously we can't talk about that. But talk about the company, and then we'll get into the big data specifics, because you guys have been around for a while. Okay, yes. We've been around for a long time by software company standards, over 25 years as an independent company, which is pretty unusual. We're a reasonable size, about $50 million a year, relentlessly profitable. And most importantly, and most interestingly, we have literally tens of thousands of customers around the world who leverage our software or embed it in interesting solutions that they're delivering to their customers. So give us some examples of some of your customers and some of the things that they're doing. Sure. On the database side, we're embedded inside a lot of accounting applications. We have a database product that's embedded inside accounting applications, project management applications, project tracking, all kinds of, let's call them traditional transaction-centric applications. Not the kinds of things you'd think of with big data, but they are applications that run businesses all around the world. And on the integration side, we have data integration products that allow companies and partners to shuttle data backwards and forwards and synchronize it between lots of different applications, both on premises and in the cloud. So we have a lot of experience with cloud-based deployments of things like integration, as well as on premises. So in a way you guys have somewhat of an independent perspective on all this. You're seeing the worlds of traditional data analysis and structured data collide with big data, which is largely unstructured. What's your angle on what's going on there, and how does Pervasive add value in that integration? It's a great question.
Are you really going to let me loose on this? Yeah, go. So we see big data as being both structured and unstructured, because part of the big data avalanche, of course, is what people call unstructured. Some of that is indeed text-based: sentiment analysis, analyzing emails, analyzing Twitter feeds, those kinds of things. A lot of it is machine-generated data or sensor-generated data, and those volumes are climbing extremely quickly as well. So the big data challenge is not just an unstructured data challenge, it's an all-around data challenge. And how do you, as you mentioned at the beginning, extract the signals out of a massive avalanche of noise? We started to look at that problem, not entirely knowingly, seven years ago or more, because we had integration software and our customers were coming to us and saying, well, we've got this much more data every year and a much smaller time window in which to process it, whether we're integrating it, enriching it, validating it, whatever it is we're doing. Are you going to be able to be with us as we go on that trajectory? We looked at some of it and we thought, well, we're pretty hot, we know software, sure. But we started coming across instances where the growth rate was really getting interesting, including some of the early web search engine companies, I can't give a name, but we were talking to some of them about their log file processing requirements, and we started to see this is a whole new order of magnitude, maybe several orders of magnitude different. So we put a separate research team together to build what we call the next-generation integration platform. And they came back to us with something that was optimized for leveraging multi-core processors, which we could see coming as commodity hardware, though that wasn't obvious then. And they built what we call a parallel data flow engine, which is now called DataRush. And that platform, which we've now had embedded in some of our products for three or four years, so it's robust technology, we know it works, it's out with hundreds of customers, is now the core of our big data analytics push, because it's capable of handling literally hundreds of millions, even billions, of rows of data, both structured and unstructured, on very modest hardware. So we're now starting to deliver big data tools for really, really tough problems that also take advantage of the new architectures, we'll talk about Hadoop and BigInsights, and that deliver new economics to customers, because that's really the issue here. It's not that traditional data warehousing and relational databases and everything else aren't delivering huge value; they are and they'll continue to do so. But they're not going to scale to a hundred X and a thousand X, which is what we're starting to see. So John, this is music to our audience's ears, running all this on commodity hardware, new infrastructures. Yeah, so the thing I want to get at is, people are always talking about ETL this and high performance that, and it's usually higher-end systems, but commodity or industry-standard hardware, depending on who you talk to, is important to be part of that. But the question I have is, talk about the software side of it. So let's talk about Hadoop, for example, BigInsights, all the rage. Connectors sound like a really easy solution. What's your view on connectors with Hadoop? Some are saying, hey, connectors are bad unless you actually know what you're doing. Connectors are not the silver bullet.
A connector to Hadoop is not the silver bullet. Well, connectors are part of the story, but they're not the biggest part. Is it helpful to have connectors to other systems so you can pull more data into Hadoop? Sure. Is it useful to be able to read some data back out of Hadoop? Sure. But the significance of Hadoop is that it's a massive, general-purpose platform. Moving data around is good, connecting systems together is good, and it's for running analytics. So if you want to run machine learning algorithms, if you want to do predictive analytics on combinations of different data sets that have correlations or don't, or rules hidden in them that you don't know about, Hadoop lets you do that. The traditional predictive analytics approach was to say, okay, I'm going to build a model. In order to build that model, I'm going to take a sample set of my data, because I can't process all of it, and I'm going to figure out a model with some degree of certainty that it works, that it's this percentage accurate. With Hadoop, and the right software running on Hadoop, and we'll come back to that, you can just run your analytics on all your data. So you've got a high degree of certainty, and you'll start noticing the anomalies. Everyone wants all the data. Everyone wants to do it on all their data. Kill sampling, right? Like Abhi Mehta says. Sampling's dead. Sampling's dead. All the data. And if you want to look at all the inbound network traffic into your environment and start detecting what they call advanced persistent threats, long-running, really sophisticated intrusions, you can't just do it on the last five minutes. You'd better be able to go back and do it over the last five months and five years. Different databases. We talked about HBase, time series, relational. But I want to drill back down to the connector conversation, because really there are two modes that people think about. You store the data and then do batch, do all kinds of analytics on it. And then there's real time. They seem to be converging. There's an overlap between real time and near real time. What does real time mean? We'll just call it real time, near real time, versus just analytics on stored data. Can you talk about that and how it's evolving from your perspective? Absolutely. You're absolutely right, they've traditionally been thought of as two separate arenas with separate sets of tools to address each of them. You've got batch data, where you've stored it, and it could be in a database, it could be in DB2, it could be in HBase, it could be in Cassandra, whatever it is. You query it and you keep adding to it, and you preferably want reasonably real-time additions and updates to it. But then you've had this world of streaming data and what they call CEPs, complex event processing engines, closer to the world of Wall Street trading; that's extreme real time. But in most businesses, decision timelines are probably 30 seconds, or 10 minutes, or a couple of hours, depending on the interaction that's going on. That division, from our point of view, is artificial. Certainly from our perspective. Why is that? Because if you're using the right approach, and we built DataRush on what we call a data flow principle, everything to us is a flow of data, whether you're pulling it off a disk or inhaling it off a feed of some kind or another. I like that term, inhaling it. It's data. A lot of it's a big choke of data, as they say. It's inhaling the exhaust.
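Going back to the sampling point for a moment, here is a minimal sketch of it in Python. Everything in it is hypothetical and purely for illustration, not anything Pervasive, IBM, or Opera ships: a model built from a small sample only gives you an estimate of the bulk of the data, while a single pass over all of it is what actually surfaces the handful of anomalies the sample will almost certainly miss.

```python
import random
import statistics

# Hypothetical "transactions": mostly routine amounts, with a couple of
# extreme outliers buried deep in the full data set.
random.seed(42)
transactions = [random.gauss(100.0, 15.0) for _ in range(1_000_000)]
transactions[123_456] = 25_000.0   # anomalies a small sample will likely miss
transactions[654_321] = 31_000.0

# Traditional approach: characterize the data from a small sample.
sample = random.sample(transactions, 1_000)
sample_mean = statistics.mean(sample)
sample_stdev = statistics.stdev(sample)
print(f"sampled model: mean={sample_mean:.2f}, stdev={sample_stdev:.2f}")

# "All the data" approach: one pass over everything, so the anomalies
# themselves show up, not just an approximate model of the bulk.
threshold = sample_mean + 10 * sample_stdev
anomalies = [x for x in transactions if x > threshold]
print(f"anomalies found in full scan: {len(anomalies)}")   # surfaces both outliers
```

On a single machine that full pass is trivial; the reason Hadoop, BigInsights, and dataflow engines matter is that the same scan-everything idea has to keep working when the data no longer fits on one box.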
It's data, and you want to run analysis on it over the last five seconds or the last two seconds. But you also want to run analysis on it over the last week and the last month, joining in all kinds of other data. In principle, if you break it apart, the processing you need to do is the same. And yet, traditionally, we've had vendors focusing on and delivering different tools for each of those things. A lot of it could actually be common, and we're probably going to see a lot more of that going forward. So we're going to be at Strata and Hadoop World next week, O'Reilly's big event. This week. I mean this week. This week. Tomorrow. I'll be there. I've got colleagues who are going to be there as well. I'll be there setting up tomorrow and processing Wednesday, and then we'll be able to fly in. We're physically at both events. Yeah, I'm sorry to hear that. I'll say it: it's big data week. I was saying it's bigger than Shark Week; if we were on cable we'd be pulling in the ratings. No, we're sorry, we love this. We love this. But Hadoop and that emerging ecosystem is where all the coolness is, right? It's where the action is: the sandboxes, the emerging new startups, and the tech. Here at IBM IOD, Pauline Nissen from Intel called it the adult supervision of big data. That's IBM. Everyone's growing up into the big leagues, right? What are the requirements? What are the table stakes to play at that level? I want you to talk about that, and then talk about some of the barriers that need to be taken down to get there. Right, right. So table stakes, minimal table stakes. You talked about connectors, and a lot of vendors have come out and said, we connect to Hadoop, which is good. That's fine. You can pull and push it down. Yeah, you're on, you know. All right, yeah, I got a connector. So what? But the really interesting thing about Hadoop and BigInsights here at IOD is that it's extremely powerful, but it can be a pain, a real bear, to get it set up and running, and then you start writing code to make use of it, using MapReduce, and then you've got to have people with special skills. So I think as an industry, the real challenge is: can you make it as easy to use as the technologies that are 10 years old, 20 years old? Can you get it to that level of maturity where it doesn't matter to the business analyst? The business analyst has data; they want to understand it. They shouldn't have to care: is it Hadoop running in the background? Is it IBM DB2? Is it MySQL? Whatever it is, or is it a set of files? That should be transparent. So that's what I think the real table stakes are now: is there software out there that can do that? And then the other big topic, I'm just going to throw it out there, and we may or may not want to go down that track, is the energy consumption of all of this activity. There have been articles and conversations about that, and data centers are proliferating and absorbing more and more electricity every day. The people who are building those data centers are doing a great job of making them more efficient. The hardware vendors and the chip builders are doing a great job every year of making their equipment more efficient. The software writers and application writers are actually the big obstacle here, because they're still largely delivering software that is serial in nature and doesn't leverage multi-core hardware. And if you look inside most data centers and most computer systems and you look at CPU utilization, you'll be lucky if you're above 20%.
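One way to picture the data flow principle being described here is a plain-Python sketch, not Pervasive's actual DataRush API; the record format, threshold, and file name are all invented for illustration. The idea is to write the analysis once as a pipeline over an iterator of records, so the same pipeline runs over a stored file or a live feed.

```python
import io
from typing import Iterable, Iterator, List

def parse(lines: Iterable[str]) -> Iterator[float]:
    """Turn raw lines into numeric readings, skipping malformed records."""
    for line in lines:
        try:
            yield float(line.strip())
        except ValueError:
            continue

def over_threshold(values: Iterable[float], threshold: float) -> Iterator[float]:
    """Emit any reading above the threshold -- identical logic for batch or stream."""
    for value in values:
        if value > threshold:
            yield value

def analyze(source: Iterable[str], threshold: float = 100.0) -> List[float]:
    # The pipeline neither knows nor cares where the lines came from.
    return list(over_threshold(parse(source), threshold))

# Batch: the source is a stored file (hypothetical path).
# with open("sensor_log.txt") as stored:
#     print(analyze(stored))

# Streaming: the source is a live feed, simulated here with an in-memory buffer.
live_feed = io.StringIO("98.5\n101.2\nbad record\n250.0\n")
print(analyze(live_feed))   # -> [101.2, 250.0]
```

Whether the window is the last five seconds or the last five months then becomes a question of what you feed the pipeline, not of which tool you bought.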
In fact, in most situations you're going to be closer to 12%, even in a highly virtualized environment. Most of the cores are not being used. That's true, right? Not being used. People talk about 70, 80% utilization; it's not the case in most situations. Except with Pervasive. So. Why is that? Well, that's because the RushAnalyzer and DataRush technology we built is designed to automatically, at runtime, look end to end at what you're trying to do, look at how many cores and spindles you've got and where your data is distributed, and optimize. Then you'll see all the cores light up accordingly and run at 70 or 80%. It saves energy, it gets hugely faster, and it's why we have partners and customers like Opera Solutions that are standardizing on DataRush, because they can build their analytics once and their people don't have to care about what hardware it's running on. They can deploy it on small systems, on big systems, and it'll run incredibly efficiently on all of them. So that's my plug. So Opera, since you brought them up, is a very interesting company. We first found them, geez, I can't remember now, John. We had them on at OpenWorld. SAP Sapphire, that's right. Not a company that does a ton of marketing, but when we did the industry's first big data market sizing, Jeff Kelly, the top big data analyst at Wikibon, did the market sizing report, and Opera came up, very surprisingly. It was up there at number one, number two in the marketplace, just behind Vertica in terms of revenues, among the pure plays. And Opera uses your solution. Opera's a very interesting company. It has a key building block, something they call Signal Hubs, which is a great name. And Opera's this company with a couple hundred data scientists, solving real, serious big data problems and driving productivity. So talk about how they use your solution a little bit, if you will. I will, and they've been using us for less than a year, but they've started deploying now to multiple customers in multiple industries. They're very big in financial services and capital markets, in federal and government spaces, as well as retail. Their Signal Hubs concept is this whole idea of extracting useful signals out of the mass of data that's flowing through an organization's systems, and they do that with all kinds of vertical market solutions. One of the early ones they've delivered is a risk management solution, which is used by some of the largest financial institutions in the world for managing their portfolios and their clients' portfolios. They had a situation, just to give you a case in point of where speed and efficiency really matter. They had a customer where it was taking about 20 hours to update their portfolio with this risk management solution, which uses advanced algorithms of one kind or another, because of the massive new data and the accumulated historical data. Twenty hours is okay, but it wasn't very exciting. They got it down to about 15 minutes using our software, on the same hardware. And that's nice, because now you can do things faster, but I think the more interesting thing, the significance to that financial institution, is that they can start innovating in how they manage their risk. And they can probably start offering a wider range of services to their clients, because they know they can give them either lower risk or broader opportunities to go handle their assets.
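To put the core-utilization point in concrete terms, here is a small sketch using ordinary Python multiprocessing, not DataRush itself; the scoring function and row counts are invented stand-ins. The serial version keeps one core busy while the rest of the machine idles, and the parallel version spreads the same work across every available core, the way a dataflow engine would do automatically at runtime.

```python
import multiprocessing as mp
import os
import time

def score(row: int) -> float:
    """Stand-in for a per-record analytic, e.g. one step of a risk calculation."""
    total = 0.0
    for i in range(1_000):
        total += (row * i) % 7
    return total

def run_serial(rows) -> list:
    # One core does all the work; the others sit idle.
    return [score(r) for r in rows]

def run_parallel(rows) -> list:
    # Spread the same work across every available core.
    with mp.Pool(os.cpu_count()) as pool:
        return pool.map(score, rows, chunksize=1_000)

if __name__ == "__main__":
    rows = range(50_000)

    start = time.perf_counter()
    run_serial(rows)
    mid = time.perf_counter()
    run_parallel(rows)
    end = time.perf_counter()

    print(f"serial:   {mid - start:.1f}s on 1 core")
    print(f"parallel: {end - mid:.1f}s across {os.cpu_count()} cores")
```

The 20-hours-to-15-minutes Opera example is the same idea at much larger scale: the speedup comes from keeping all the cores and spindles busy on the same hardware, not from buying bigger iron.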
I love these stories, because it's companies that most people haven't heard of solving real problems, driving productivity, solving problems that a lot of the big whales can't necessarily solve. So they look to smaller companies. Or they can, but it comes with a different price tag. It comes with a different price tag, maybe it takes longer, it comes with other baggage. Right. David, my final question before we break, the big question, the big data insight from you: what do you think will be different in the next five to ten years with big data? All this stuff is going to happen. I see databases with more computation, commodity hardware, scale-out open source all the way to full-on IBM big iron, developers. It's craziness right now. What's your vision for the next ten years? Well, I think what's most exciting is, in a way, the things we can't yet imagine today that we're all going to be able to do with big data. Because in the next few years, all of us will have very powerful machines sitting on our desks, as well as connections to the cloud with effectively infinite resources. And we'll be able to dream up applications that we're not using today, that we can't imagine. We're carrying around our phones, our PDAs of one kind or another, and I imagine I can really start using them to understand what's going on with my health, or to monitor a mixture of macroeconomic and very industry-specific data and start detecting patterns of one kind or another. That's all very exciting. So it's the new businesses that can be built around that. Arguably you could say Google and Yahoo were generation-one data-centric businesses, but there are now, what, a thousand startups? And it's early too; over the next few years, probably another 5,000 startups. Vertical markets are wide open. Yeah, so the barriers that were there to really doing something with data are going away. Yeah, the entrepreneurship angle is just phenomenal. I think you could really start a company at very low cost and go innovate in a vertical. I mean, you could literally take one feature of a vertical that's disruptive, that no one's thought of yet, and literally, as a green field, completely disrupt that market. There's a relatively trivial example, but it may be a good one to end on to start thinking a little bit about the potential. A jet engine on a commercial transport plane has approximately 1,500 sensors in it. During a single trip across the Atlantic, it accumulates about three terabytes of data. At the end of the trip, that data gets thrown away. It doesn't have to be that way. Today it's really not that expensive to start keeping that data and then start running analytics and saying, okay, now we're going to start learning a whole lot more about not just fuel consumption, but about engineering stress, about safety on the flights, about weather conditions. Yeah, I was talking to a friend last night. We had a little geek dads' night out dinner, basically a bunch of guys get together, we drink wine and have food brought in and we just talk tech, and we talked about the space jump, which was an amazing publicity stunt. I mean, you can't get any better than that. Go up 25 miles and jump out. I mean, come on, that's pretty cool. But they instrumented him all the way down. He went into a flat spin. Right, he's an amazing data collector.
He collected data, and the insight from that was that they now know what to do in a flat spin, as well as knowing how to take someone up to the edge of the atmosphere, which could be a portal to space travel. I mean, this is kind of the wacky science that people are talking about. So this is just the beginning, because it's so inexpensive. Right, so think about that. If it's inexpensive to get people there, that's the first stop on the way to Mars. So these are the kinds of things that, to me, are just so intoxicating about the possibilities of big data. So we need you guys to keep on banging out your technology, and IBM to keep doing their thing, and we'll be at Strata this week to get more. David, congratulations on all your success. Great to have you on theCUBE. Thank you. Appreciate it. This is siliconangle.com and wikibon.org's theCUBE, our flagship telecast covering these events. We'll be right back with our next guest right after this short break.