Big Data SV 2014 is brought to you by headline sponsors WANdisco, we make Hadoop invincible, and Actian, accelerating Big Data 2.0. Hey, welcome back everyone to Silicon Valley's big data event, Big Data SV, hashtag BigDataSV. Search on Twitter or go to crowdchat.net/BigDataSV and join the conversation. This is theCUBE. I'm John Furrier, the founder of SiliconANGLE, joined by my co-host Dave Vellante, the co-founder of Wikibon.org, and we are covering all the action here in Silicon Valley and the Strata conference just across the street. Our next guest is Mike Hoskins, CTO of Actian. Welcome back to theCUBE. We had him on at Big Data NYC, at the end of the event, and we had a really engaging conversation, and now they have news we want to dig into. First, Mike, welcome back to theCUBE. Great to see you. Thank you, John. Thank you, Dave. It's good to be here. So I really appreciate you coming on, and also talking about the news that you have. So tell us first the news. You guys released some news today. Where are you guys going technically, and what does that mean for today's market around proving the value of Hadoop? So, 2014 is a really big year for Actian. The news today, the announcement around Strata, is really a follow-on from two weeks ago when we announced the Actian Analytics Platform. You know, we've acquired several technologies over the last couple of years. What we've done, and this is a steady evolution of developing the products and platforms, is we've decided that, having spent the $150 million to acquire really next-generation, advanced technologies around data flow, around the world's fastest columnar analytic databases, instead of doing what pretty much every other vendor does and delivering those to the marketplace as point products, and then leaving the customer to struggle with how it all comes together, we've assembled all of those next-generation IP assets into a single platform.
We call that the Actian Analytics Platform. And so no matter how you want to connect to your data, you capture that data at scale, drop it someplace and land it, maybe HDFS, push it through your analytic and data processing pipelines with data flow, and then on into a next-generation, incredibly fast columnar analytic database, or maybe just native analytics in Hadoop, maybe a graph database with the semantic web and triple stores. And all of that has now been pulled together by Actian into a single, coherent Actian Analytics Platform. So can you talk about those pieces? I mean, obviously ParAccel and Pervasive were in there. Can you talk about the IP on which you've built that platform, how you sort of turned point products into that platform? Just refresh our memory there. Sure. The technology is really game-changing in pure performance, but also massively in price-performance kinds of ways. Think of our first major acquisition, Vector, for vectorizing your software stacks and your databases. We now hold the title of world's fastest database at one terabyte; you can go to tpc.org, and in second place and third place are Microsoft and Oracle, having been squashed. We're 100% faster than them, and probably on one-twentieth the hardware. So you have to ask yourself, how does a fairly new, young database obliterate the giants of the database industry in price-performance? And the answer is, it's modern software. It's software that understands that the gift from the hardware gods is incredible. A modern CPU architecture has pipelined processing, 15 instructions queued up; it has an L3 cache of 30 megabytes. The whole RAM a few decades ago was only 30 megabytes. Vector processing is a first-class citizen in the x86 instruction set.
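The column-at-a-time execution idea behind this can be sketched in a few lines. This is an illustrative Python/NumPy example, not Actian's code: a columnar engine keeps each column as a contiguous array, so a filter-and-aggregate query becomes a handful of whole-column operations the CPU can pipeline and vectorize, instead of a per-row interpreted loop. The function names and data are invented for illustration.

```python
import numpy as np

# Two columns of a hypothetical sales table, stored column-wise.
prices = np.array([9.99, 24.50, 3.25, 18.00, 42.10])
quantities = np.array([3, 1, 7, 2, 1])

def revenue_rowwise(prices, quantities, threshold):
    """Row-at-a-time style: one interpreted iteration per tuple."""
    total = 0.0
    for p, q in zip(prices, quantities):
        if p > threshold:
            total += p * q
    return total

def revenue_columnar(prices, quantities, threshold):
    """Column-at-a-time style: whole-column mask, multiply, and sum,
    which maps naturally onto SIMD vector instructions."""
    mask = prices > threshold
    return float(np.sum(prices[mask] * quantities[mask]))

# Both styles compute the same answer; the columnar form is what a
# vectorized engine can execute orders of magnitude faster at scale.
assert abs(revenue_rowwise(prices, quantities, 10.0)
           - revenue_columnar(prices, quantities, 10.0)) < 1e-9
```

The same contiguous-column layout is also what makes the next-generation compression and compiled queries mentioned below effective, since runs of similar values sit next to each other in memory.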
So what would a database look like if you wrote it from scratch now, with next-generation columnar technology, software-only, scaling out on commodity hardware, next-generation compression, using vector technologies, using compiled queries, using advanced techniques? Well, the net result is you'd have technology that would be, on a price-performance basis, devastating to traditional players in the industry. Well, that education said, let's go out and find the other IP assets like that in the marketplace. We went to Pervasive, found their integration plumbing, which allows us to fan in data from any endpoints, and found the DataFlow technology, which is the next-generation pipelining technology that scales up perfectly for your server, scales out automatically on Hadoop, and has visual tooling. And then we also acquired ParAccel, which is the world's fastest columnar analytic database, but scales out on commodity hardware as well, like Vector. So, you know, you ask the question: these are IP assets, very modern, each of which had tens of millions of dollars spent on it, that are now under a single corporate umbrella and a single platform. The number you threw out was, you said $150 million you guys invested, is that the number? That's the total acquisition cost of all the companies, but those stacks, each of those stacks, had tens of millions of dollars invested. But that's really not a lot of money to spend on those assets. I mean, you think about companies today, you're seeing companies getting funded, John, I mean, you know this better than I, $50, $60, $70 million raises through A through C rounds. So I mean, $150 million is a big chunk of change, obviously, but for those three or four assets that you just mentioned, it's actually, you know, a pretty productive use of capital. Oh, it's awesome. Our board of directors is very clever, and they were very astute to recognize that those assets were out there, and they've now been assembled under a single corporate roof.
How's the word getting out there? So give us a little taste of some of the success. You put the pieces together, you can actually build an engine, we like to use metaphors, sports or cars, you got a new engine. What does that render, and what are some of the successes? Can you share some of the things that you guys have done in the marketplace around your portfolio, which is now the engine? Sure, the best way to do that, I mean, I'm a technologist so I'd love to share more of the technical virtues, go ahead, is to look at our customers. We have over a hundred customers already deployed on these next-generation assets. I just had dinner a couple of weeks ago at an event with the CTO of Evernote. We all know Evernote; it's a classic internet-scale business. They wanted to find a way to convince more of their free users to become paid users. That's a classic data science problem. How do you do that? You mine through infinite log files to find the patterns and find the premonitions and find the willingness to upgrade. And they did that, and they tried to do it with classic Hadoop and Hive, and of course got extremely low-performing design time and runtime, and ended up adopting our Matrix, what used to be called ParAccel and is now called Matrix in the Actian Analytics Platform. And that gave them sort of massive scalability, and they were able to run their iterative algorithms and science, and the impact on revenue, directly on revenue growth, directly attributable to that deployment on our Matrix database, was huge for them. That's just one example. We've got customers like Autometrics, who were able to use the integration plumbing and develop new business models around next-generation analytics in the automotive industry. OfficeMax was able to save a huge amount of money by crushing and collapsing down the amount of time they spent on their analytic workloads. So I wonder if I could come back to that.
We talked about $150 million, really not that much to get a platform. Are you comfortable that, I mean, there may be some tuck-ins that you need to do, but are you pretty comfortable the platform is where you need it to be? You don't have to make any major acquisitions, or are there still missing pieces in your view, from a CTO perspective? That's a good question. And I think one of the blessings of the transactions was that they're highly complementary. And so I've had a chance, inside and outside, to look at this, and there are always opportunities to acquire interesting add-on pieces of technology, and I don't think we're going to rest on that. But I got asked that question this morning in a different venue, and here's how I answered it. The challenge for customers now in a young space like this big data and analytics space is that technologies are very immature and people are going best-of-breed. There's not one big single place. We're not a ring-fenced platform where everything is us and us only, and it's us or nobody else. We're a platform that plays easily with other corporate assets, and especially plays wonderfully with open source. You look at the juggernaut that is Hadoop and some of the cool technologies coming in the open source space. So I think it's not so much acquisition and tuck-in; it's finding partner technologies, both commercial and open source, that we can play friendly with. For example, our Hadoop stack already interfaces with high degrees of parallelism with Flume; we just wrote Kafka interfaces. We've got the world's fastest loader-unloader for HBase, for example. So we're big believers in Hadoop and the open source movement, and I think we're going to see the Actian Analytics Platform coexist nicely with other modern, advanced technologies to deliver the analytic goodness to the customer. So Mike, as you know, the best technology doesn't always win.
So when you talked earlier about some of the benchmark data and smoking some of the more traditional guys, those traditional guys, they have a way, they have that magic dust where they hypnotize their customers, and the customers will keep sort of buying from them. So what gives you confidence that you can unseat them, or how will you sort of migrate customers? I wonder if you can talk about that a little bit. Sure. If you look at some of our customers again and the success they've had, it isn't just about the technology, and I'll come back to that. But I can think of a top-10 worldwide bank that benchmarked us against Netezza, for example, and we were hundreds of times more price-performant. I mean, the cost-takeout opportunity in migrating a workload to a modern, next-generation technology like ours is just stunning. So whether it's revenue optimization or cost takeout or risk mitigation, it doesn't matter. If you get to a point where your platform enjoys high levels of visual tooling, native execution against the data on Hadoop (we've got one of the world's first YARN-certified engines executing post-MapReduce in the Hadoop platform), when you get to that level, it is hugely compelling to customers. But you're right, it's not about the benchmarks. What do they want to do? We have a retail partner in India, one of the largest retail companies in India, 7,000 locations. They were using competitor technology, in that case I think it was Greenplum. Queries were taking forever. They moved to the Actian Analytics Platform, collapsed the runtime, and they're now doing marketing optimization and merchandising optimization almost to the minute, because they can. And what a difference that makes to their business. So whether you're doing retail analytics like OfficeMax and Future Group or healthcare analytics, this time compression means you can drive better business decisions.
So some of the fundamentals that you'd look for: a big market, you got that, check; we just quantified that with Wikibon today, headed toward $50 billion, $18 billion today. So that's exciting. 10x better, you got that, it looks like. The other is experience and execution prowess and ethos. So I wonder if you could talk about your ability to retain talent from some of these assets that you've brought in. Some of the innovators, some of the disruptors, have you been able to keep those guys? Maybe give an example, talk about that a little bit. So it is a challenge always in this space; it's intensely competitive. We've done very well. I just heard from an engineering team where the product manager couldn't find a place to sit in a cube today, because there's been so much recruiting lately as our staff is growing in one of our database technologies. So that's exciting. I think, I mean, there's something that people may not know about Actian. In addition to being sort of this young tiger with big data analytics, where we think we have, especially with the analytic platform, a multi-year lead, we're a very well-established company. We're about $140 million in sales. We're one of the largest private software companies in the world. We have 10,000-plus customers historically. We have product lines that have been around for 30-plus years. We answer the phone in every major geography in the world. We're not a lumbering behemoth in the billions that's lurching and getting in its own way, but we're also not one of the startups, 95% of whom might go poof. And so we feel we've got a great balance there of heft, seriousness, a strong history of data expertise in the marketplace, and an investment in a portfolio that is gonna be game-changing in, I think, the coming years. Mike, I gotta ask you about the platform, or architectural, vision. What is your vision, and what are folks out there that might have different visions around the preferred future?
Is it gonna be more horizontal, encapsulating other elements? Dave and I talked about the data center being the API of the future. How are services gonna work with all these subsystems? And you have things like HANA out there (the HANA database is proprietary with SAP), and you got open-source stuff out there. You got all these different elements. Is there going to be a layer on top as an architecture, or something else? What do you see as that preferred architecture? So I get the luxury as CTO at Actian to run something called the Innovation Lab, where I have spent many years contemplating that exact issue. I now have a new toy, which is the Actian Analytics Platform, which as I said is an end-to-end platform for capturing, analyzing, and acting on data at scale. So I've actually said, what would a solution architecture look like if I had the perfect platform that had massive parallelism inherently in it and scaled sort of effortlessly for any data volumes? And it looks something like this. And I think data flow, and the notion that data is flowing, is important. We're instrumenting the universe. We're about to hit the back-breaking wave of machine-generated data. The digital data tap has been turned on, and it will never be turned off again in the rest of time. Which means almost every architecture we have right now, the classic BI architecture, is too static, too rigid, too brittle; it's gonna be overwhelmed with this river of data. So architectures must be more streaming-oriented. They must be able to capture data at any scale. So think fan-in collection frameworks. They have to then land that data someplace, HDFS perhaps being the economical place. And then I think we're gonna see what I call analytic pipelines. There's been 30 years of focus on operational workloads. I hear this every time I talk to IT and CIOs: they're thirsty for a new generation of analytic applications. They wanna change their business processes.
They wanna crush fraud, and they wanna get revenue optimization going. They wanna find the bad guys and reward the good guys and stop customer churn. These are the business outcomes that they want, and they know that analytic gold is in that mountain of data. But how do you get from the mountain of raw data to the analytic outcomes? Well, you have to be able to capture that data at scale, land it, and you have to be able to push it through your Hadoop pipelines. We have that DataFlow technology that, with visual tooling, allows us to do any kind of data preparation at scale, any kind of data cleansing and enrichment. We don't use MapReduce; we're much higher in the stack than that. So, kind of like Google, we left MapReduce long ago. YARN allows other execution engines to come in. And I think, and this is the big idea, we will see a fan-out of new analytic techniques. So clearly we will load columnar analytic databases, but we may wanna leave the data in HDFS and do deep, iterative data mining and machine learning algorithms on it. Barry Zane, the founder of Netezza and ParAccel, is with us. He's now got a new triple store, a semantic web database. You put the data in a graph database and the data can talk to you. The graph clusters and affinity relationships emerge spontaneously. So yeah. We love that; affinity rank is an algorithm that we have with our CrowdChat. But I gotta ask you, let's take HANA for instance, from SAP. That's like a database looking for a problem. And it does solve a lot of problems; the use cases are pretty specific. They've got some well-documented use cases. But what you just described is outcome-specific solutions that need different architectures or technologies based upon the problem. So is the world shifting completely? Does outcome drive technology, or is technology available to solve an outcome? Well, sadly, we live in an industry where technology has driven too much of the equation, and I think we're gonna see a shift.
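The capture, land, and fan-out pipeline described above can be sketched in miniature. This is a hedged, illustrative Python toy, not any actual Actian, Hadoop, or DataFlow API: a list stands in for the HDFS landing zone, a list of tuples stands in for a columnar database load, and a dict of sets stands in for a graph store, with every function name invented for the sketch.

```python
from collections import defaultdict

landing_zone = []                   # stands in for raw files landed in HDFS
columnar_rows = []                  # stands in for a columnar analytic DB load
affinity_graph = defaultdict(set)   # stands in for a graph / triple store

def land_event(event):
    """Capture stage: append each raw event to the landing zone."""
    landing_zone.append(event)
    return event

def fan_out(event):
    """Fan-out stage: one landed event feeds several analytic engines."""
    columnar_rows.append((event["user"], event["action"], event["ts"]))
    affinity_graph[event["user"]].add(event["action"])

# A tiny stream of machine-generated events flowing through the pipeline.
for raw in [
    {"user": "alice", "action": "login", "ts": 1},
    {"user": "alice", "action": "upgrade", "ts": 2},
    {"user": "bob", "action": "login", "ts": 3},
]:
    fan_out(land_event(raw))

assert len(landing_zone) == 3
assert affinity_graph["alice"] == {"login", "upgrade"}
```

The design point is that the raw stream is landed once and then fed to multiple specialized engines in parallel, which is the fan-out Hoskins contrasts with a single static BI store.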
I mean, it used to be just big data; now it's big data and analytics. I predict that it'll be all about the analytics. Customers are looking for this holy grail of closed-loop analytics: how do I generate new insights and new patterns and new predictive models out of my data, and then promote those to analytically enable my business processes, so that I can make ever more timely and accurate decisions, regardless of my domain? And that takes this notion of a continuous pipeline of raw data coming in at incredible scale, being machined and refined into high-quality, optimized analytic methods and outcomes. And then you feed those analytic outcomes continuously into your business processes. So you asked about HANA. I really think that's a more traditional view, where data goes to die. You just work, work, work, and then it just sits there. And my vision is much more of a constant flow of data, where you have to be ready to eat lots of scale and complexity, and then you need to use optimized analytics to enable and infuse your business processes. And that's not one-size-fits-all; that's why I see an explosion of analytic applications. And I think our platform is geared very well for building those pipelines and analytic applications. Dave and I have been talking about this concept of data fusion that's been kicked around in the industry, fusing data together. Is that down the road where you're enabling those kinds of analytics? Is it capture, then fuse? I mean, what does data fusion mean? So it's a great question. It's actually front and center in our Innovation Lab right now. So it's earlier in the pipe, where you're capturing. If you know the sensor data revolution and machine-generated data, there's a popular phrase there, sensor data fusion, where it's actually stickier and used more regularly than in our traditional structured data world. But it means the same thing.
It means when you're pulling data from multiple sources, there are opportunities to correlate that data across what I consider immutable dimensions. So who, what, when, where? What was the timestamp on earth when that digital data event was created? What was the geolocation where it was created? What was the app that created it? What was the log file type or format that created it? And if I know these core metadata elements, what I call immutable dimensions, then I can begin to fuse the data into a consistent, single view of that digital data event, so that it can travel down the analytic pipeline. And if you really want to get interested, think ontologies, and think of the opportunity, as data becomes more standardized, for us to fuse the data based on standard metadata dictionaries and ontologies. And yeah, I think it's earlier in the pipe, and it's very important in the next generation of analytic applications, I think, data fusion. Is this a telegraph of where it's gonna go, the telltale sign for machine learning and automation, a lot of the AI kind of personalization? Is that kind of where this is leading? I think it is. I think that's at the far end of the pipe. It's the deepest science. But yeah, the challenge, if you want to build a new generation of analytic applications, and analytic gold is the outcome: I need these new methods, these outcomes. I need to drive my business processes ever more efficiently. Then I have to capture the data, fuse it, push it through, and then apply advanced analytics to it. Not your hindsight, your backwards-looking stuff; dashboards are good and OLAP is good and cubes are good, but there's predictive power. One of our partners, Opera Solutions, you may know; they're one of the really good companies. They build a lot of their stuff on our stuff. We're partnering with them all over the world. It's very exciting. They talk about finding the predictive signal in your data. And I love that phrase, because it's in there.
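The fusion on immutable dimensions that Hoskins describes (who, what, when, where) can be illustrated with a small sketch. This is a hypothetical Python example; the field names and schema are invented for illustration and are not a real Actian design: records from two feeds that share the same timestamp, geolocation, and source app are merged into one view of a single digital data event.

```python
def fusion_key(record):
    """The immutable dimensions: when, where, and which app created the event."""
    return (record["ts"], record["geo"], record["app"])

def fuse(*sources):
    """Correlate events from multiple feeds into one view per digital event."""
    fused = {}
    for source in sources:
        for record in source:
            # Records sharing the same immutable dimensions describe the
            # same event, so their fields merge into a single view.
            fused.setdefault(fusion_key(record), {}).update(record)
    return fused

# Two independent feeds observing the same event from different angles.
clickstream = [{"ts": 100, "geo": "SFO", "app": "web", "page": "/pricing"}]
app_log     = [{"ts": 100, "geo": "SFO", "app": "web", "latency_ms": 42}]

view = fuse(clickstream, app_log)
event = view[(100, "SFO", "web")]
assert event["page"] == "/pricing" and event["latency_ms"] == 42
```

In a real pipeline the same keying idea would be driven by shared metadata dictionaries or ontologies rather than hard-coded field names, which is the standardization opportunity mentioned above.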
Which means, kind of, it's wheat and chaff: get rid of the chaff, find the wheat. Find the essence. Find the analytic drivers in that data, the insights and predictive models that you can then stick into your business processes and defeat your competition. I know we gotta go, John, but you called it master data ubiquity, and you hear people talk about the single version of the truth, and a lot of people say, well, we're gonna be further away from the single version of the truth than ever with big data. But what you're talking about, this notion, this concept of data fusion, is really a new type of single version of the truth. It's not reporting; it's not, like you say, looking back at what my quarter looked like and what the financial data looks like. It's predicting based upon data that you can trust, and then driving outcomes that are dramatically more productive than your competitors'. That's to me the difference here. Everybody can, I mean, IBM and Oracle and all the established vendors, they're gonna be able to say, hey, we can drive business outcomes too. Your premise is that you need new architectures to drive business outcomes that are gonna drive competitive advantage greater than the guys doing it sort of the old way. That's your premise. Yeah, I'll assert it categorically: the scale and complexity of big data and machine-generated data has only just begun, and it will break the back of the legacy software industry. And yes, we need new plumbing. We need new IP stacks. We need new solution architectures and new platforms. And let's let these platforms be driven by the high-value end of the pipe, the analytic outcome. Let's talk to businesses about whether they're doing revenue cycle optimization in healthcare, or seat pricing optimization or preventive maintenance in the logistics industry, or trying to do disease prediction in healthcare, or fraud detection, or risk mitigation. And all of these are, you know, and these are the early days.
Right now, you look at the ROI our customers get when they implement these analytically enabled business processes, and it's off the charts. Ten years from now, it might be harder to get that monster gain, but right now it's there for the taking. You do have to look at a new generation, exactly like you said, of architectures, of analytic workloads and applications. And I think a lot of the IP stacks that we've acquired, and a lot of the thinking around the Actian Analytics Platform, is spot on for a new generation of analytic applications. Mike Hoskins, great conversation. I remember New York, a very unforgettable conversation, very riveting; love the vision. We're in a technology confluence with a lot of things happening. I think the methods and some of the outcomes that we're going to drive, huge innovations, data fusion, we love that as well. Great to see you doing some great work. Always great to have you on theCUBE. That's some tech-athlete tech talk right here on theCUBE, which we love. This is Big Data SV. We're here in Silicon Valley covering the Strata conference and all the activities in the Big Data world. I'm John Furrier with Dave Vellante. We'll be right back. This is theCUBE.