Live from Midtown Manhattan, theCUBE's live coverage of Big Data NYC, a SiliconANGLE Wikibon production, made possible by Hortonworks, We Do Hadoop, and WANdisco, Hadoop made invincible. And now your co-hosts, John Furrier and Dave Vellante. Okay, welcome back, everyone, live from New York City for Big Data NYC. This is where all the action is happening. Part of Hadoop World and Strata Conference, this is our exclusive coverage. This is SiliconANGLE and Wikibon's theCUBE, our flagship program. We go out to the events and extract the signal from the noise, and we're wrapping up towards the back end of day three. I'm joined by my co-host Dave Vellante. Dave, third day. We're grinding it out, but it's a lot of fun, a lot of great stories, great personalities, great technologists, tech athletes, as we say. And our next guest is Mike Hoskins, CTO of Actian. Welcome to theCUBE. Thank you very much. Thanks for coming on to theCUBE. theCUBE is where we talk about just tech and things going on in the news and what you guys are doing. So talk about your company and tell us about what's going on in the past year since the last Strata. Sure. Glad to. So Actian is a private software company, about 140 million in sales. We're one of the largest private software companies in the world. I'm the CTO of Actian. It's been an incredible last 12 months for Actian. A series of acquisitions, three altogether, that we've strung together, started with an acquisition of Pervasive Software, a public software company, about 50 million. That was my business. So I got to Actian recently, only in the last year, via that transaction. I was CTO of Pervasive. Right. That's where we met. That's right. Last time we met, I was wearing the CTO of Pervasive hat. We had an effort around data integration and big data and analytics there. And that was the interesting technology for Actian. Then in flight, while we were acquiring Pervasive, we acquired Versant, an object database.
So data management is young again, object databases are interesting, NoSQL is interesting. And then finally, to wrap it off, kind of the signal achievement, we acquired ParAccel, which is, in my opinion, the world's fastest columnar analytic database. And of course, analytic databases are hot. And so there we had that series of assets stitched together now. And we made two big announcements in July, a few months ago. One was around the Actian DataCloud, which is a secure elastic cloud for doing data integration. How do you integrate the Internet? How do you tie together all the new APIs born on the Internet? And we announced, and that's what we're talking about at the conference here, the ParAccel big data and analytics platform. We adopted the ParAccel brand name. And so ParAccel is a brand name now that spreads across the entire end-to-end portfolio of big data and analytic IP that we bought and product offerings that we have. A lot of buzz on Twitter around that product. So tell us, what's the talk of the show? What are you learning out there? So the show is really interesting. What have you observed? Maybe not learned; as CTO, I'm sure you've taught a few things to folks out there. But what's the buzz? I presented yesterday, and we can talk about that a little bit. But first, just stepping back and looking at the show, I've been coming to Strata, geez, year after year for four or five years, it seems now. And it's bigger and it's better every year. I really think they've lifted their game this year. I think the buzz was higher. It's certainly packed. I just read that they're going to move to the Javits Center next year, which I applaud. I think that's a good idea. But the vibrancy in the ecosystem is just stunning. I mean, without mentioning any names, you walk in, the sponsor pavilions are just overflowing with booths and with people.
And tucked way back in the corner are these two obscure little booths with fancy names that I won't mention, but they're the sort of multi-billion dollar, you know, legacy data management players that just seem to be overwhelmed in what I call the age of data. And so that's one takeaway, that the boom is on. The other one, and this was just odd, it hit me today when I was sitting there listening to conversations, was how international the software industry has become. So those of us who grew up in software know it as a space that was dominated by U.S. players. That's just not true anymore. You listen to the languages spoken there in the events and at lunch and in the conversations next to you. Development centers are in Europe, in Eastern Europe, and India and China and Brazil. It's global, and I think that's exciting. So the software industry, since the advent of big data and analytics, has become global, and that really is a positive development for software. So that was my kind of interesting personal takeaway from Strata this year, how global it's become. So I wonder if you could talk about the ParAccel acquisition. So ParAccel was part of an ecosystem, or a competitive ecosystem. It was kind of going after Teradata, right? Intending to disrupt that. There was certainly Vertica, Greenplum, Netezza, which were some of the big takeouts. I used to joke it's like the offensive guards when there's a draft in the NFL: one goes and then they all start to go. But ParAccel went later, obviously. Aster Data's another one, but Teradata actually picked that up. Each of these companies, well, they're approaching their platforms in different ways. You know, you've got IBM doing its BigInsights thing, you've got Greenplum now sort of rolled into Pivotal, and Vertica, they've got some Autonomy assets that they sort of mash in together, but they're all doing some kind of platform play. So what is your platform play? Do you have a platform play?
Can you talk about that a little bit? What does that look like? How does it relate to sort of the Hadoop ecosystem? It's a very good question. Let me turn back the clock a little bit because I think it helps to give context to the answer before I talk in detail about our ParAccel platform. The database industry is kind of interesting. We had sort of traditional OLTP straight databases. Then you had Teradata kind of get in the game when the data got so large it was difficult, and they understood sort of scale-out and clusters. But it was still a row-set traditional relational database that was used. I think the first really giant watershed, just in the last 10, 12 years, was Netezza. Netezza really broke the chain there. Teradata is more of just the biggest of the prior. Netezza said, geez, you can use an open source database, Postgres, and you can use an appliance, and you can change the economics forever, I think. So then you had a string of players. Netezza, Greenplum, DATAllegro, Aster. I kind of lump them together because some of them were geared towards hardware. They were all still geared to row-set relational database technology. There are two vendors in that list you mentioned that I think are unique, and that is Vertica and ParAccel. They're unique because they're each software-only columnar analytic databases. And in my opinion, that is the future. And not Postgres; certainly in the case of Vertica, not Postgres, and same thing with ParAccel. We're from scratch. In fact, the inventor of ParAccel, the founder of our company, Barry Zane, was the sort of key co-inventor and founder of Netezza. He developed that for five years and then left that and, we like to say, sort of figured out where all the bodies are buried, figured out how to do it right, and yeah. Oh, two companies that Stonebraker didn't invent. Yeah, three of them.
So Barry's a great guy. And I think ParAccel, and anybody who understands the space, knows that all of the value in future innovation is going to come from the software stack. And so, a pure software stack that is columnar, and it has to be columnar, and that is analytic in nature, for analytic workloads, because that's the exciting area of IT right now. Operational workloads and systems have kind of been solved. The entire next 30 years is all about how do we build a new generation of analytic applications on analytic platforms. And so, to the specifics of your question, the acquisition of ParAccel is sort of the keystone piece of this end-to-end portfolio. But it's much deeper than that. We acquired a data flow engine from Pervasive. Data flow is a beautiful technology for data and computationally intensive workloads. Our data flow engine, we just announced last week, is available under YARN, so we're a YARN-certified, natively executing data flow engine on Hadoop. But it's next generation, it's post-MapReduce. And so it does fine-grained, thread-level parallelism, enjoys complete, sort of no-touch parallelism, that is, scale-up in the node, and then we use Hadoop to scale out across the nodes. It uses pipeline parallelism. So it avoids all of the terrible performance traps of MapReduce. In fact, I think a hidden story here is YARN is going to liberate the world from the bane of MapReduce on productivity. Yeah, yeah, yeah. You know, MapReduce is well acknowledged to be terrible at design time. But the truth is it's pretty terrible at run time also for a tremendous number of workloads. It's completely inefficient. And yet it's still what pretty much 100% of the people use. I think the arrival of YARN and platforms like our ParAccel platform and the data flow engine are really gonna change the game and allow people to move beyond MapReduce to a vastly more price-performant and efficient platform. So we have data flow as kind of a backend engine tying together the entire platform.
It allows us to capture data at scale. So raw data comes in the front of the pipe. We can move the data through the pipe. We have advanced ETL technology and data quality technology. And then we can move it to its optimal storage tier, and that's where the databases come in. We also own a product that used to be called Vectorwise. That's the world's fastest single-node columnar analytic database. We have kind of a stranglehold on the columnar analytic database market here at Actian. And so we can move the data at scale across your Hadoop cluster or any hardware and then deposit that data in whatever the optimal tier is for the downstream analytics that you might wanna do. And so that's why we call the platform the ParAccel big data and analytics platform. It's really for building a new generation of end-to-end data and computationally intensive analytic applications. So, going back in, that was a great description of your platform. Thank you for that, but I still struggle sometimes, and I'd love your CTO perspective. There's an eclectic mix of platforms in this space. I mean, you've all got, you know, some kind of key modern database. But then there's all kinds of other pieces. I mean, you look at what Pivotal has, with Cloud Foundry and Spring. Are those white spaces for everybody else? Are those just adjacencies that are distractions? I mean, what's this ultimately gonna look like in terms of what customers are gonna adopt? I wish I could pour the data in my predictive analytic machine and tell you. Yeah, it's hard to tell right now. Here's the truth. We are in a highly disruptive period between the old and the new. I call this the interregnum, the period between the kings. It's kind of a Wild West. It is a discontinuous break from the past. This is not your father's data warehouse on steroids. This is literally something new.
And so we're in year one or two or three or four of a 30-year span, and we don't know how it's all gonna play out. The princes are contesting for who's gonna be the new king. There'll be multiple kings, but. Plenty of land, certainly, that's for sure. What's that? Plenty of land to be king on. There is, and this is why there are 150 startups in big data. Why are people pouring tens of millions of dollars into this? This is the gold rush, but it is so early. And platforms like Hadoop, as good as they are, are so immature still and need things like our data flow engine to sort of bring them along. They need things like YARN. They need things like Zettaset. They need things that help them mature and become more business ready and enterprise ready. So I can answer the question, we're gonna be the winner. There's gonna be multiple new names. But the truth is, it's a very interesting time. It does present a challenge for customers, though, because that Wild West makes it very hard for them to go say, oh, this is the obvious answer. Here's a single vendor. Yeah, it's very fuzzy for the customers. It is, and I think you see a lot of best-of-breed stuff out there now, where customers are being forced to mix and match and adopt a little bit of open source, a little bit of commercial, and kind of brew it up in their own cocktail. So it's an exciting time, but it is not obvious. You know, it's interesting too. In a lot of the legacy, I call it legacy, the traditional IT spaces, it's best of breed or integration of a suite, and it seems to be the latter that's winning. In this space, because it's so immature, it seems like best of breed is gonna win for a while. It's always like that in young spaces. Look at cloud computing. The ERP leaders, the CRM leaders, the HR leaders, they're all autonomous ISVs out there that therefore require integration. Hence the Actian DataCloud, which is the glue that lets you integrate the Internet.
But yeah, anytime you have a young, immature, emerging space, you always have a lot of best-of-breed activity, and that's what's going on right now. And you know, we think at Actian, one of the differences we can make is we can offer that platform. So we've already stitched together some of the core pieces of technology. The acquisitions we made were very modern software stacks written in the last four, five years, understanding chips, cores and clusters, modern patterns of parallelism, so AC John's gonna roll his eyes again after cutting. No, I like it, it's clever. I noticed you guys are branding the hashtag age of data. Yeah. You guys, okay, so talk about what that means. Start with the vision behind age of data. You mentioned some of the forward-looking view, that we're in a new world. That's right. A new industry being built from scratch with all the players now here on the field. What does age of data mean? It's interesting. I talk about the massive revolution that is modern computing. I divide it into the age of hardware, sort of 1945 to '75. The age of software, '75 to 2005. Microsoft was founded in 1975. It's a pretty old company, really. And Google went public in 2004. And you look at companies like Google and Facebook and Twitter, they're not software companies. They use software, but they also use chairs and electricity. Yeah. What they really are is data companies. They're basically collecting data, harvesting the value out of that data, and then renting or selling the value back to the marketplace. And so in the age of data, this is the kind of business that's gonna have high value. We're basically, in our lifetime, living through a transition from an analog to a digital universe. We're instrumenting the universe. Every dumb object is becoming a smart object. And the net result is an almost constant river of data that is flowing at us. You think of a modern transaction. You go to Amazon, you click, you order something.
That's a classic business transaction. It took Walmart 60 years to get to a terabyte of those business transactions. Facebook generates that in a week. I was with a large bank the other day. They generate that in a day, just in their firewalls and routers and network logs. Between classic business transactions and then social interactions and clickstreams and then machine-generated observations, the digital data tap has been turned on and it will never be turned off. The flow of data is complex. It's completely continuous, forever. Which is why you see a lot of things in Hadoop around real-time frameworks and streaming frameworks. Which is going to, by the way, obsolete large swaths of the software industry. People who have built their models on static, rigid architectures, where you can kind of stop at night and ETL your data and wait eight hours and then stick it in your warehouse so you can give people information that's two hours or two days old. It's over. That's not where things are gonna go in the next step. So do you see the data warehouse and business intelligence markets being flipped around, where business intelligence is just the standard analytics? We were joking. Dave and I were joking. The Internet created the killer app, email. Which we all use. But now the young kids, my sons, like what, they don't even turn on their voicemail on their phones. They're texting and they use social networks. Analytics is the killer app for the data business. In some way or another, there's a lot of things going on under the covers. What's your view on that in terms of the apps that are coming? Are they mostly under the covers? Is it gonna be just a simple interface, an app interface? Well, I wish I knew. Let's give the BI crowd, the business intelligence crowd, credit for sort of the history of the last 10 years, where they've established this idea that I can turn raw data into information in some meaningful way. But it's still kind of backwards looking.
How many widgets did I sell yesterday? And so the key to analytics is really advanced analytics, kind of forward looking, discovering patterns in your data, making predictions. It is very early, but yes, that is where all the value is going to come from. I think businesses are gonna start looking at ways to optimize this kind of analytic flow. I think there's really two analytic flows. There's the large discovery pipes. That's what a lot of what you see here at the Hadoop event is about. You know, the last guy who was here, from Sqrrl, you know, it's really about collecting enormous swaths of data and storing it and investigating and finding the patterns and the meaning in that data so you can make predictions about who's a good guy and who's a bad guy. But the second wave of that revolution will be how do we operationalize those analytics? How do we make analytics pervasive in an enterprise? And that's gonna come with little applets and full-blown apps and predictive models that tell you how to price the seat on your airplane, or when the maintenance valve is gonna fail in your AC, or when you're gonna have a heart attack, three hours before you have it. This is the exciting future. So it's very early days, but yeah, I think you're exactly right. Think of business intelligence as now a much larger landscape that includes advanced analytics, and those analytics are going to be moved into our personal lives as well as our business lives, and we will slowly learn how to operationalize and carry those analytics to optimize our business processes. Awesome. Age of data, I love it. Final question for you. Put a bumper sticker on this year's Hadoop World, Strata Conference, Big Data NYC. What's it gonna read this year on the car as people drive away from the event? So I'll give you the CTO perspective, because it's what I'm carrying away from the show.
In my presentation yesterday, I presented our new YARN-certified data flow engine, and I invited the CTO of Hortonworks, who's one of our partners; we partner with Cloudera and Hortonworks. Hortonworks has really been a driver behind YARN, and we had a great dialogue, Ari Zilka and I, around how important YARN is, and both of us agreed that YARN is one of the most significant milestones in the entire history of Hadoop. With really immature frameworks, it's sort of the Wild West, and MapReduce jobs are contesting for space and bumping into each other, and so bringing adult supervision, bringing a resource management framework into the game, of course is enormously valuable, but that's not even the real power of YARN. YARN is not just so MapReduce jobs sort of behave themselves and negotiate for resources; it really allows a thousand new flowers to bloom. You can move beyond MapReduce into the post-MapReduce world, like our data flow engine, because now you can bring alternate computation and execution frameworks and engines to bear where the data lives. We both agreed that you have to bring process to the data and where the data lives, and the data lives in your Hadoop cluster increasingly, but does that mean you should bring sort of extremely inefficient MapReduce, which takes forever to design, everybody hates it, and it executes with poor performance? No. What if you could bring something that was a fine-grained, thread-level, highly parallelized data flow engine that did pipeline parallelism, never touched the disk, didn't do MapReduce, MapReduce, MapReduce? You know, the 100x-type performance gains you could get would be off the charts, but people like us who had those engines used to sort of live outside the Hadoop cluster and go, oh, I wish we could get in there, and of course now with YARN, anybody can get in there, and anybody who registers themselves as YARN certified becomes a first-class citizen inside Hadoop. This is awesome, Mike, great chat.
We definitely want to have you on the roster. Are you based out of the East Coast, or where are you out of? So Actian's headquartered in Redwood City; we're all over the world. I'm based in Austin, Texas. Okay, so you're living easy down in Austin. It's a great town. Thanks for coming on theCUBE. I want to keep in touch with you, obviously, to get more CUBE action. We love this topic. This is really relevant. Totally agree with you, the age of data is here. Just beginning, the dawn is here, and what's really exciting, I find personally, is that it's a blend of the older systems guys with the new blood, and it's a whole new generation. This industry is being built and it's really exciting. It's not just jumping into an old market. It's a complete, radical new industry. Yes it is. Like the computer industry back in the PC days. Young guys, people were doing deals at our party last night, handshake deals, doing some biz dev. No lawyers, just, hey, why don't we just team up? A lot of beachhead, a lot of real estate to be king on. So we want to get you on some of our crowd chats as well. Get some of this data out there. A lot of great technical innovations, a lot of business model innovations, a lot of people innovations with the connected Internet, as you said. Perfect storm for innovation. So thanks for coming on. You guys have got a great strategy, you're geared up, got the acquisitions under your belt, all geared up to go cause some damage. We are, we're ready. The market's growing. This is theCUBE, we'll be right back after this short break. The CTO of Actian here inside theCUBE, talking columnar databases, talking about the future, the age of big data. We'll be right back.