Live, from Midtown Manhattan, it's theCUBE. Covering Big Data New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors.

Okay, welcome back everyone. Live here in New York City, this is theCUBE's special annual presentation of Big Data NYC. This is our annual event in New York City where we talk to all the thought leaders, experts, CEOs, entrepreneurs, and anyone making the news and shaping the agenda, with theCUBE. It's in conjunction with Strata Data, which was formerly called Strata Hadoop and Hadoop World; Big Data NYC is theCUBE's own NYC event, separate from that. We're here with Jeff Veis, who's the Chief Marketing Officer of Actian. CUBE alumni, formerly with HPE, been on many times. Good to see you.

Good to see you.

Well, you're a marketing genius. We've talked before at HPE. You've got so much experience in data and analytics. You've seen the whole spectrum, from what I call classic enterprise, to cutting edge, to now full-on cloud, AI, machine learning, IoT. There's a lot going on, and on-premise still seems to be hot. There's so much going on for the large enterprises dealing with how to better use their analytics. At Actian, you're heading up marketing. What's the positioning? What are you doing there?

Well, the shift that we see, and what's unique about Actian, which has a very differentiated and robust portfolio, is the shift to what we refer to as hybrid data. And it's a shift that people aren't talking about. Most of the competition here have that next best mousetrap, that one thing. So it's either move your database to the cloud, or buy their solution, or move to this piece of open source. And it's not that they don't have interesting technologies, but I think they're missing the key point, which is that never before have we seen the creation side of data and the consumption side of data become more diverse, more dynamic, and more in demand. People want both sides.
Before we go any deeper, I just want you to take a minute to define what hybrid data actually means. What does that term mean for the people that want to understand it more deeply?

Well, it's understanding that it's not just about the location of the data. There is, of course, hybrid computing, which is on-premise and cloud, and that's an important part of it. But it's also about where and how is that data created? In what time domain is that data going to be consumed and used? And that's so important. A lot of the analytics, a lot of the guys across the street, are kind of thinking about reporting and analytics in that old-world way of: we collect lots of data and then we deliver analytics. But increasingly, analytics is being used in real time or near real time, because people are doing things with the data in the moment. Then another dimension of it is ad hoc discovery, where you can have not one or two or three data scientists but dozens if not hundreds of people, all with copies of Tableau and Qlik, attacking and hitting that data. And of course it's not one data source but multiple, as they find adjacencies with data. A lot of the data may be outside of the four walls. So when you look at consumption and creation of data, the net-net is you need not one solution but a collection of best-fit solutions.

So hybrid between consumption and creation. Those are the two hybrids. I mean, hybrid implies a little bit of this, a little bit of that.

That's the bridge that you need to be able to cross, which is: where do I get that data, and where is that data going?

Great. So let's get into Actian; give us the update. Obviously Actian's got a huge portfolio. We've covered you guys in the past, you've been on theCUBE many times. You've cobbled together all these solutions that can be very effective for customers. Take us through the value proposition that this hybrid data enables with Actian.
Well, if you decompose it, from our viewpoint there are three pillars that have, in one sense, stood the test of time. They're critical. The first is the ability to manage the data. Then the ability to connect the data; in the old days we said integrate, but now basically all apps, all kinds of data sources, are connected in some sense, sometimes very temporally. And then finally the analytics. So you need those three pillars, and you need to be able to orchestrate across them, and what we have is a collection of solutions that span that. They can do transactional data. They can do graph data and object-oriented data. Today we're announcing a new generation of our analytics, specifically on Hadoop.

Is that Vector H?

That's Vector H, with Spark. I'd love to talk to that today, with the native Spark integration.

Let's get into the news. So the hard news here at Big Data NYC is you guys announced the latest support for Apache Spark with Vector H. So, Actian Vector in Hadoop, hence the H: Vector H. What is it? Is Spark a glue for hybrid data environments, or is it something you layer over different underlying databases?

Well, I think it's fair to say it is becoming the glue. In fact, we had a previous technology that did a yeoman's job at some of the work that we now see Spark and that community doing. The thing, though, is that if you wanted to take advantage of Spark, it was kind of like the old days of Hadoop: assembly was required, and that is increasingly not what organizations are looking for. They want to adopt the technology, but they want to use it and get on with their day job. What we have done is take that assembly out, whether it's machine learning, putting algorithms in place, or managing software. It could be very exciting things such as predictive machine learning and next-generation AI. But for every one of those, there's an easy dozen if not a hundred uses of being able to reach in and extract data in its native formats.
Being able to grab a Parquet file and, without any transformation, analyze it. Or being able to talk to an application and interface with it, doing reads and writes with zero penalty. So the ACID-compliance component of databases is critical, and a lot of the traditional Hadoop approaches are pretty much read-only vehicles. And that meant they were limited in the use cases they could address.

Tell us about the hard news. What specifically was announced?

Well, we have a technology called Vector. Vector, just to establish the baseline here, runs single-node on Windows and Linux, and there's a community edition. So your users can download and use that right now. We have Vector H, which was designed to scale out for Hadoop. It takes advantage of YARN and allows you to scale out across your Hadoop cluster, petabytes of data if you'd like. What we've added to that solution is native Spark integration. And that native Spark integration gives you three key things.

Number one, zero penalty for real-time updates. We're the only ones, to the best of our knowledge, that can do that. In other words, you can update the data and you will not slow down your analytics performance. Every other Hadoop-based analytic tool has to, if you will, stop the clock and flush out the new data to be able to do updates. Because of our architecture and our deep knowledge of transactional processing, you don't slow down. That means you can always be assured you have fresh data running.

The second thing is Spark-powered direct query access. So we can get at not just Vector formats; we have an optimized data format, which is as fast as you will find in analytic databases. But what's so important is you can hit ORC, Parquet, and other data file formats through Spark and, without any transformation, ingest and analyze that information.
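As a rough illustration of the direct-query pattern described here, reading a Parquet file in place and querying it with no ETL step, a generic PySpark sketch might look like this. This uses stock Spark rather than Actian's Vector H connector, and the path and view name are hypothetical:

```python
# Hypothetical sketch using stock PySpark (not Actian's connector):
# query a Parquet file in its native format, with no transformation step.

def query_parquet_in_place(spark, path):
    """Read Parquet where it sits and run SQL over it directly."""
    df = spark.read.parquet(path)          # no ETL: Spark reads the native format
    df.createOrReplaceTempView("events")   # expose it to Spark SQL
    return spark.sql("SELECT COUNT(*) AS n FROM events")

# Usage (assumes a running SparkSession named `spark`):
#   result = query_parquet_in_place(spark, "hdfs:///data/events.parquet")
#   result.show()
```

The point of the pattern is that the file never leaves its native format; Spark's reader handles the schema, so there is no separate load-and-transform pipeline to build.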
The third one, and certainly not the least, is something that I think you're going to be talking a lot more about, which is native Spark DataFrame support.

DataFrames are... what's the impact of that?

Well, DataFrames will allow you to talk to Spark SQL and Spark R-based applications. So now you're not just going to the data, you're going to other applications. And that means you're able to interface directly with the system-of-record applications that are running, using this lingua franca of DataFrames, which has now hit a maturity point where you're seeing pretty broad adoption. And by doing native integration with that, we've just simplified the ability to connect directly to dozens of enterprise applications and get the information you need.

So would you, Jeff, would you describe what you're offering now as a form of data virtualization layer that sits in front of all these backend databases but uses DataFrames from Spark, or am I misconstruing?

Well, it's less a virtualization layer than maybe a superhighway. In the old days it was one of two things. Either you had to do a formal, traditional integration and transform that data, right? You had to go from French to German, and once it was in German you could read it. Or you could query and bring in that information, but you had to accept slower performance because that transformation had not occurred. Now what we're able to do is use this Spark native connector so that you can have the best of both worlds. And if you will, it is creating an abstraction layer, but it's really for connectivity, as opposed to an overall one. What we're not doing is virtualizing the data. That's the key point. There are some people pushing data cataloging and cleansing products that abstract the entire data away from you.
You're still aware of where the native format is, you're still able to write to it with zero penalty, and that's critical for performance. When you start to build lots of abstraction layers, truly traditional ones, you simplify some things, but usually you pay a performance penalty. And just to make a point: in the benchmarks we're running, compared to Hive and Impala, for example, use cases that take them nearly two hours, Vector H can do in less than two minutes. And we've been able to uphold that for over a year. That is because Vector, in its core technology, has columnar capabilities and, this is a mouthful, multi-level in-memory capability. And what does that mean, you ask?

I was going to ask, but keep going. I can imagine the performance latency is probably great. I mean, you have in-memory; everyone kind of wants to get to that level.

A lot of in-memory, where it is used, is just held at the RAM level. It's the ability to read data in RAM and take advantage of it, and we do that, and of course that's a positive. But we go down to the cache level. We get down much, much lower, because we would rather that data be in the CPU if at all possible. And with these high-performance cores, it's quite possible. So we have some tricks that are special and unique to Vector so that we can really optimize the in-memory capability. The last thing we do: you know, Hadoop and HDFS are not particularly smart about where they place the data. And the last thing you want is your data rolling across lots of different data nodes; that just kills performance. What we're able to do is think about the co-location of the data. We look at the jobs, we look at the performance, and we're able to squeeze optimization out of there. And that's how we're able to be sometimes 500 times faster than some of the other well-known SQL-on-Hadoop offerings.
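To make the columnar point concrete: in a column store, each attribute is laid out contiguously, so scanning one column reads sequential memory and stays cache-friendly, instead of striding through whole rows. A toy pure-Python sketch of the layout idea (illustrative only, not Actian code):

```python
# Toy illustration of row vs. columnar layout (not Actian code).
# A row store keeps one tuple per record; a column store keeps one
# contiguous sequence per attribute, so a single-column scan never
# touches the other attributes.

rows = [(i, i * 2.5, "label") for i in range(1000)]   # row layout

# Columnar layout: one sequence per attribute
ids    = [r[0] for r in rows]
prices = [r[1] for r in rows]

def column_sum(column):
    """Sequential scan over one contiguous column."""
    total = 0.0
    for value in column:
        total += value
    return total

print(column_sum(prices))   # scans only 'prices'; ids and labels are untouched
```

In a real engine like the one described above, the same layout is what lets hot column values sit in CPU cache and be processed in vectorized batches, rather than pulling entire rows through memory for every query.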
So that, combined now with this native Spark integration, means people don't have to do the plumbing. They can get out of the basement and up to the first floor. They can take advantage of open-source innovation and yet get what we are claiming is the fastest analytics database on Hadoop.

So I've got to ask you, and I mentioned it in the intro, you're an industry veteran, a CMO, chief marketing officer. It's challenging with Actian because there are so many things to focus on. How are you attacking the marketing of Actian, given that you have a broad portfolio? The hybrid data is a good position, I like that, bringing it to the forefront to give it a simple positioning. But as you look at Actian's value proposition and engage your customer base and potentially prospective customers, how are you iterating the marketing message, the positioning, and the engagement with clients?

Well, it's a fair question, and it is daunting when you have multiple products, right? And you've got to have a simple, compelling message. Less is more to get signal above noise today; at least that's how I feel. So we are hanging our hats on hybrid data, you know? And we're going to take it to the moon or go down with the ship on that. But we've been getting some pretty good feedback.

What's been the number one feedback on hybrid data? Because I'm a big fan of hybrid cloud, but I've been saying it's a methodology, it's not a product. On-premise private cloud keeps growing and so does public. So hybrid hangs together in the cloud context. With data, you're bridging two worlds: consumption and creation.

Well, what's interesting is that when you say hybrid data, people put their own definitions around it in an unaided way, and they say, you know, with all the technology and all the trends, that actually, at the end of the day, nets out my situation.
I do have data that's hybrid data, and it's becoming increasingly more hybrid, and goodness knows the people demanding, wanting, and using it are becoming just as diverse. And the last thing I need, and I'm really convinced of this: a lot of people talk about platforms. We love to use the P word. Nobody buys a platform, because people are trying to address their use cases, but what they don't want to do is address them in this siloed, brick-wall kind of way, where I address one use case but it won't function elsewhere. What they are looking for is a collection of best-fit solutions that can cooperate together. The secret sauce for us is we have a cloud control plane. All our technologies, whether on-premise or in the cloud, touch that, and it allows us to orchestrate and do things together. Sometimes it's very intimate and sometimes it's broader.

Is it a job scheduler, or what exactly is the control plane?

It does everything from administration, it goes down to billing, and it can also be scheduling transactional performance. On one extreme, we use it for backup and recovery for our transactional database: we have a cloud-based backup and recovery service, and it all gets administered through the control plane. So it knows exactly when it's appropriate to back up, because it understands that database, and it takes care of it. It was relatively simple for us to create. In the more intimate sense, we were the first company, and it was called ActianX, which I know we were talking about before. We named a product after X before our friends at Apple did. So I'd like to think we were pioneers.

And Cisco had the iPhone name first. Don't get confused there, remember.

Oh yeah, I've got to give credit where credit's due. But what ActianX is, and we announced it back in April, is it takes the same Vector technology I just talked about, so it's material, and we combined it with our Ingres transactional database, which has over 10,000 users around the world.
And what we did is we dropped in this high-performance columnar database for free. I'm going to say that again: for free, in our transactional processing system. So every one of our customers, as soon as they upgraded to what is now ActianX, got a rocket ship of a columnar, high-performance database inside their transactional database. The data is fresh. It moves over into the columnar format and the reporting takes off.

Jeff, to end this segment, I'll give you the last word. A lot of people look at Actian and they'll see the products I mentioned earlier. Is it product leadership that's winning? Is it the value for the customer? Where is Actian winning, for the folks that aren't yet customers that you'd like to talk to? What is the Actian success formula? What's the differentiation? Where does it jump off the page? Is it the product? Is it the delivery? Where's the action? Is it innovation?

Well, let me answer with two phrases. First is our tagline. Our tagline is "activate your data." And that resonates with a lot of people. A lot of people have a lot of data, and we've been in this big-data era where people talked about the size of their data. Literally: I have five petabytes, you have six petabytes. I think people realize that kind of missed the entire picture. Sometimes smaller data, God forbid one terabyte, can be amazingly powerful depending on the use case. So it's obviously about more than size. What it is about is activating it. Are you actually using that data so it's making a meaningful difference? And not putting it in a data pond, puddle, or lake to be used some day, like you're storing it in an attic. There's a lot of data getting dusty in attics today because it is not being activated. And that brings me to, not the tagline, but what I think is driving us and why customers are considering us. They see we are about the technology of the future, but we're very much about innovation that actually works.
Because of our heritage, because we have companies that have understood for over 20 years how to run on data, we get what ACID compliance is. We get what transactional systems are. We get that you need to be able to not just read but write data. And we bring that methodology to our innovation. And so that's for people, companies, animals, any form of life that is interested in data that's going to do things.

It's the product platform that activates, and then the result is how you guys roll with customers.

In the real world today, where you can have real concurrency and real enterprise-grade performance, along with the innovation.

And the hybrid gives them some flexibility. That's the new tagline. If I understand you correctly, hybrid data basically means flexibility for the customer.

Yeah, it's: use the data you need for what you use it for, and have the systems work for you, rather than you working for those systems.

Okay, check it out: Actian. Jeff Veis, friend of theCUBE, alumni now, the CMO at Actian. We'll follow your progress. Congratulations on the new opportunity. More CUBE coverage after this short break. I'm John Furrier, with Jim Kobielus, here inside theCUBE in New York City for our Big Data NYC event, all week, in conjunction with Strata Data right next door. We'll be right back.