Big Data SV 2014 is brought to you by headline sponsors WANdisco — we make Hadoop invincible — and Actian — accelerating Big Data 2.0.

Welcome back, everyone. We're here live in Silicon Valley for the Big Data SV event. I'm John Furrier, the founder of SiliconANGLE, and we are covering all the action in big data: the Strata Conference, innovation, entrepreneurs, startups, growing companies, VCs. We have them all here inside theCUBE. theCUBE is our flagship program; we go out to the events and extract the signal from the noise. I'm John Furrier, with Dave Vellante, co-founder of Wikibon.org, and John Santaferraro is here from product marketing at Actian. Welcome to theCUBE.

Thanks, glad to be here.

So big data is growing like crazy, valuations are good, and we've been talking about the emerging companies and the upstarts trying to get a position here — a lot of new names in the Strata Conference pavilion. My first question: you're in product marketing, so you have to watch the market and also understand what's going on on the engineering side, and you guys have a nice platform play that you're putting together and growing on. So I've got to ask you: what is this big data movement from 1.0 to 2.0? Let's discuss this next chapter we're transitioning to. Big data is growing up.

Yeah, it's definitely growing up, and I think there are a number of things in the global marketplace that are driving it. We all know the world has changed and everything is now digital — everything is either a pixel or a byte of some kind. There are trillions of events being captured every single day, there's social data being poured into the world, so everything is digital. What's striking is that the world is going to continue to change at an ever-increasing pace. It used to be that you could transform yourself as a company every five or ten years; now you have to transform continuously, you have to innovate constantly to keep up with the speed of change. The only thing we know to be constant is change itself. As a result of all that speed, time becomes the new gold standard — time is the one thing you can't get more of; you can only use it more wisely or speed things up. So there's this need to innovate constantly, to transform yourself over and over again, and to do it at ever-increasing speed. That's what's driving the shift from big data 1.0, where things sort of sat in a big data reservoir, to big data 2.0, where there's this need to operationalize big data and push it toward real-time interaction.

Time is money — time is the new scarce resource, right? That's what you're saying. So is that big data 2.0? I remember back in the day asking about Web 2.0 — we were asking about it earlier in an interview on CrowdChat — and Web 2.0 was always the elusive "what does that mean?" So, in your definition, what is big data 2.0?

Yeah, it rolls out of big data 1.0. If we go back and look at what's happened to date, big data 1.0 has had some great accomplishments. We've proven that you can store massive amounts of data affordably and scale that to no end. As a result, all of these companies have captured massive amounts of data, regardless of whether they believe it has value or not.
It used to be that you had to establish value before you could keep the data; now you just keep it all. And there's a consensus, I think, that as all of this data has poured into the data reservoir, companies have naturally begun to do data discovery, trying to figure out what's in there. It's commonplace to be doing data discovery, and it's commonplace for companies to start doing data services and data provisioning out of that reservoir: where does the data need to go, what needs to happen to it, what does it look like? The consensus around the whole big data 1.0 movement was that we have a lot of data, and analytics is going to be the key to unlocking the value in that data.

Big data 2.0 is about making the shift to the point where we take these big data projects that are stuck in the lab and move them into production. It's about shortening the time to value — figuring out what value is there and getting to it faster — and then being able to operationalize the big data: using analytics to take the understanding of what you have, embed it in business processes, and move it toward real-time operational engines, taking these very sophisticated and complex analytic models and using them to provide intelligence to streaming analytics systems. I look at it this way: there's a lot of talk about streaming analytics, but because of the nature of physics, streaming analytics will never be smarter than about third grade. What that third-grade streaming analytics needs is the PhD — the heavy-lifting analytic models being crunched by those relational analytical systems — so you can feed that PhD-level intelligence into the third-grade system.
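To make that "PhD feeding the third grader" pattern concrete, here is a minimal sketch: an expensive model is trained offline over the full history, then a lightweight stream scorer applies it per event. The use of scikit-learn, the file path, and all function names are illustrative assumptions, not Actian's actual components.

```python
# Sketch: batch-trained "PhD" model feeding a "third-grade" stream scorer.
import pickle
from sklearn.linear_model import LogisticRegression

def train_phd_model(history_X, history_y, path="model.pkl"):
    """Batch side: heavy lifting over historical data, run offline."""
    model = LogisticRegression(max_iter=1000).fit(history_X, history_y)
    with open(path, "wb") as f:
        pickle.dump(model, f)            # publish the learned intelligence

def score_stream(events, path="model.pkl"):
    """Streaming side: no learning in-stream, just fast per-event scoring."""
    with open(path, "rb") as f:
        model = pickle.load(f)           # load once, score many
    for features in events:
        yield features, model.predict([features])[0]
```

The design point is the split itself: the stream never trains, it only applies intelligence that the heavy batch system periodically republishes.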
So, John, what was the good of 1.0, and what was the not so good?

The good I just described: massive scale at affordable cost, capturing all the data, learning to do data discovery and data provisioning, and the realization that analytics is going to unlock it. The challenge is that people hit the wall in certain areas, and there are challenges in big data 1.0 that keep those projects locked in the lab. For one, there's the complexity of putting together a big data system: a lot of it is still open source, the open source is always changing, and there really are pieces and parts you have to put together to build a big data system today. The second is the complexity of all the new data: what do I do with these new kinds of data, how do I bring them in, how do I parse them, what do I need to do to run analytics against them? Then there are skillset shortages as a result — a shortage of people who can program in Java and Python and understand how to use MapReduce, and a shortage of the data science skills needed to extract the value out of that data. And there are enterprise challenges around high availability, security, things like that. Those challenges combined are what have kept those big data projects in the lab, kind of stuck in big data 1.0.

Yeah, and the other byproduct of that is unclear ROI for a lot of companies. Now, having said that, I was listening to John's interview with John Schroeder from MapR, and it was interesting to hear him talk about the number of companies that are really doing substantive work. When you talk to the broad audience, I would still say most organizations don't know where they're going to get ROI from big data broadly yet, but there are pockets that are crushing it. So now 2.0 comes along, and presumably what that's going to do is drive the diffusion and adoption of the technology. So what are the new technical parameters, requirements, and architectural innovations of 2.0 that we should be paying attention to?

In 2.0, the thing that is going to unlock the value of the projects stuck in the lab is the new technologies that are emerging now, and they do a number of things. One, they really need to abstract away the complexity of what lies underneath the system: where is the data stored, how do you access it, what kind of programs do you need to write? Second, it's got to be accessible to the masses. Not everybody is going to grow up to be a data scientist — there's going to be a shortage — so these new technologies have to provide ways for the business analyst to get at that massive amount of data without having to be a programmer, ways to use analytic functions without having to be a data scientist, and accessibility not just to the big data platform and the massive data reservoir itself but to the kinds of platforms that can crunch these analytics and drive business value. Third, it's got to be performant. The other challenge I didn't mention is the performance limit of a batch-oriented file system that can't crunch through massive algorithms at the speed companies need. So you need high-performance access that can process that data at increasing speed. And the last one, I think, is the ability to combine relational and non-relational.

You guys were at Strata in New York a few months back, and you remember Ken Rudin from Facebook got on stage. His first slide said big data does not equal Hadoop — and everybody sort of gasped — and his second slide said big data equals Hadoop plus relational. Here's a company that was on the forefront of creating Hadoop, and they hired Ken Rudin because he's an analytics expert, he understands relational, and he can bring in the high-performance analytic processing they need to do things they couldn't do on the Hadoop platform. So there's this combining of relational and non-relational technologies in a way that abstracts the complexity away from the end user, so it all becomes much more usable. Today, the companies that have figured this out were able to go out and find the resources to write all the programs and find the data scientists to do the heavy-lifting work — but we're running out of those people. In big data 2.0, to make this available to the masses, we have to industrialize it and make it simple for more people to get access to the data.
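As a toy rendering of Ken Rudin's "Hadoop plus relational" point just mentioned, the sketch below joins a curated relational table with schema-on-read event logs. The file names, schema, and the choice of sqlite3 and pandas are illustrative assumptions only, standing in for a warehouse and an HDFS-resident log store.

```python
# Sketch: combining a relational dimension with non-relational event data.
import json
import sqlite3   # stand-in for any relational engine
import pandas as pd

def combined_view(db_path="warehouse.db", log_path="events.jsonl"):
    # Relational side: curated, schema-on-write customer dimension
    with sqlite3.connect(db_path) as conn:
        customers = pd.read_sql("SELECT customer_id, region FROM customers", conn)
    # Non-relational side: raw clickstream events, parsed at read time
    with open(log_path) as f:
        events = pd.DataFrame(json.loads(line) for line in f)
    # The 2.0 goal: one joined view, with the plumbing hidden from the analyst
    return events.merge(customers, on="customer_id")
```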
Let's talk about that, because one thing we always talked about at our Big Data NYC event — in New York City, when Strata was there — was that the big trend is data simplification, because the knowledge worker now is basically an analyst, soon to be a casual user. I think the trend you're describing is vectoring down the road of "stuff just happens under the hood" — magic happens automagically, as they say. We're not there yet, but we're getting there. So talk about that trend and what has to happen to get there.

Well, the good news is we actually are there, in part. We haven't simplified everything, but the Actian analytics platform gives you the ability to choose from hundreds of operators that sit in a visual framework. You can literally be a business analyst and drag and drop the pieces, put them together into a data flow or a workflow, without writing any code and without having to understand MapReduce or how a Hadoop cluster is put together and where the data is distributed. Our platform will figure out where the data is and what resources are available, send the workload down to the HDFS cluster — right on the node where you get the best performance — and bring the results back. No Java, no Python, no MapReduce: you literally just use drag-and-drop operators, and all the work is done on the HDFS node. That's exactly the direction we're going.

Okay, so that's abstracting complexity — you just addressed some of that. I want to come back to "accessible to the masses," because we haven't talked about it much this week, John, but at the other big data events we've done, we've talked about making analytics accessible to the business user. You pointed out that not everybody has a data scientist — there's a big shortage of data science skills. Part of that is visualization. We've done conferences like Tableau's, for instance, where they talk about the viz, and there are a lot of geeks in that audience, but they're not IT technology geeks — they're data geeks and visualization geeks. So visualization is part of it, but I wonder if we could unpack it a little more in terms of analytics for the masses, or big data for the masses. Is that really happening? What has to take place, and how is Actian supporting it?

It's a great question. Part of it is an educational piece that has to take place, and part of it is technology that allows the business user to do that. The drag-and-drop interface I talked about includes not just transformational kinds of things but machine learning algorithms, text analytics, and things of that nature. In our high-performance engine, we embed over 500 analytic functions right in the database, so all you have to do is call one, attach it to some data, and it runs at highly performant speed. Our task as Actian is to go out to our customers, our prospects, and the broad community and start educating them about analytics 101. These business analysts, by the nature of who they are, often have some kind of mathematics background — they have MBAs, they understand statistics — so they don't have to become data scientists who create and write algorithms; they just have to understand what an algorithm is and how to use it against the data.
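To ground the drag-and-drop pipeline idea described above — operators composed without Java or MapReduce, with the platform deciding where the plan runs — here is a toy sketch. Every operator and name in it is invented for illustration; a real platform would ship this plan to the HDFS node holding the data rather than chaining the steps locally.

```python
# Toy sketch of composable operators forming a data flow.
from functools import reduce

def filter_rows(predicate):
    return lambda rows: (r for r in rows if predicate(r))

def project(*columns):
    return lambda rows: ({c: r[c] for c in columns} for r in rows)

def pipeline(*operators):
    # Chain the operators into one data flow (executed locally in this sketch)
    return lambda rows: reduce(lambda acc, op: op(acc), operators, rows)

flow = pipeline(
    filter_rows(lambda r: r["amount"] > 100),   # analyst-chosen operator
    project("customer_id", "amount"),           # analyst-chosen operator
)

sample = [{"customer_id": 1, "amount": 250, "note": "x"},
          {"customer_id": 2, "amount": 40,  "note": "y"}]
print(list(flow(sample)))   # -> [{'customer_id': 1, 'amount': 250}]
```

The point of the design is that the analyst only picks and wires operators; where and how each step executes is the framework's problem.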
So we're looking at rolling out an education program that says: we could take them deep into k-means clustering algorithms, but why not give them a simple pattern-matching algorithm, let them apply it to the data, and tell them how to use it? We did this — we went and sat with a customer, a retail customer. It was a business analyst who had never done any kind of significant analytics. We plugged in this very simple pattern-matching algorithm that runs in our database, sat there for 30 minutes looking at real data, and within that 30 minutes we started to uncover fraud in their stores. In a 30-minute period, that business analyst started to learn how to use the analytic function. He'll probably never write a matching algorithm himself, but analytics 101 is about helping people understand how to use the algorithms, not requiring them to create them.

Then let's talk about performance. You see innovations like Spark and YARN — is that what you're referring to, or are there other uber-trends facilitating better performance on data?

I think Spark and YARN are absolutely on the horizon, but memory is where it's at. I divide the market into two categories when you look at memory. There are a number of old vendors who are taking their old technology and trying to shove it into memory, and it runs a little bit faster. And then there are the next-generation vendors who understand that it's all about putting things in memory and have actually created software that takes full advantage of that memory — that is optimized to run things in memory. One of our customers on the Actian analytics platform is running 50 terabytes in memory. They've got 50,000 customer attributes that they track and 10,000 customer segments, so they're doing hyper-segmentation, and they're recalculating that massive analytic 20 times a day and feeding it into their ad optimization engine. Again, the PhD is the heavy-lifting hyper-segmentation, and they're giving that PhD understanding to their third-grade operational ad optimization system and doing amazing personalization of ads. So in-memory is here. We could wait, but there's a lot of in-memory capability with analytic engines that is already in use and ready to go today.
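A toy version of that hyper-segmentation loop might look like the sketch below: recompute customer segments in memory on a schedule and hand the fresh segment map to a downstream ad-optimization step. The scale described (50 TB, 50,000 attributes, 10,000 segments) is far beyond this sketch, and every name in it is an illustrative assumption.

```python
# Sketch: periodic in-memory re-segmentation feeding an operational system.
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def resegment(customer_attributes: np.ndarray, n_segments: int = 100):
    """One pass of the 'heavy lifting': cluster customers into segments."""
    model = MiniBatchKMeans(n_clusters=n_segments, n_init=3)
    return model.fit_predict(customer_attributes)   # segment id per customer

def feed_ad_engine(segments):
    ...  # hand the fresh segment map to the "third-grade" operational system

# The production system described reruns this ~20 times a day over in-memory data
customers = np.random.rand(10_000, 50)   # stand-in for real customer attributes
feed_ad_engine(resegment(customers))
```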
I've got to follow up on the old vendors trying to stuff things into memory. Let's talk about some older legacy companies — three come to mind: SAP, Oracle, and IBM. IBM has announced BLU Acceleration and is pushing in-memory; SAP is going HANA-crazy, really jamming it down the throat of everybody who will listen, or even who won't listen; and you've got Oracle responding to SAP, saying, "hey, we smoke SAP, we've got our in-memory edition." Are those three examples — you know, Vishal Sikka would tell me it's a new architecture, you hear Hasso Plattner talking about that, and certainly the IBM engineers talk about BLU Acceleration — are you talking about those types of vendors or different vendors, and are they overstating their capabilities relative to what the 2.0 guys can do? Give us some insight.

I think there is some overstating. I can give you an example: we were in a bake-off against Oracle, and they had a query that was running 46 hours on their regular database. We put it into our memory-optimized platform, and it ran in 30 seconds — 46 hours down to 30 seconds. A guy from Oracle stood up in a presentation and said, "well, we could have brought our consultants in and put it into our in-memory system," and our customer looked at him and said, "yeah, you probably could have done that and got it down to about four hours — but we need the 30 seconds."

So those are real bake-offs?

Yeah, this was a bake-off.

Okay, so one has to wonder why they didn't bring in their consultants and put it in-memory, and at a certain point the customer realizes it's not worth it. So let's go to the start of it: why wouldn't Oracle do that? Maybe it wasn't ready.

Yeah, it probably wasn't.

Okay, so let me translate what I'm hearing. Your premise is that the old-line guys are basically making incremental improvements — essentially bolting on in-memory architectures and charging a premium — and HANA is kind of this new animal, but it's from an old dog, if you will, no offense. And you're saying the new-line guys are taking advantage of in-memory in different ways — natively, born in memory, if you will — and that's an order-of-magnitude delta.

Right — optimized for memory. Even look at vector processing: being able to look at the registers within the chip and apply the code so that when you compile it, it runs against that. And then looking at parallelism in every measure of parallelism — parallelism across cores or processors or whatever it is — and then taking even node-level processing and breaking that down further. This concept of parallelization is the next generation of being able to fully accelerate everything. It's a new way of approaching technology.
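A rough illustration of the vectorized-execution point: the same aggregation expressed value-at-a-time versus as one operation over a whole column, which compiled, SIMD-friendly code can chew through far faster. This is a generic sketch of the principle, not Actian's engine; the data is random filler.

```python
# Sketch: scalar (row-at-a-time) vs. vectorized (column-at-a-time) execution.
import numpy as np

values = np.random.rand(10_000_000)

def scalar_sum(xs):
    total = 0.0
    for x in xs:          # one value per iteration: interpreter overhead,
        total += x        # no use of the CPU's vector registers
    return total

vector_sum = values.sum()  # one call over the whole column; NumPy dispatches
                           # to compiled code that exploits SIMD and caches
```

Columnar engines such as Vectorwise apply a similar idea inside the database: operators process blocks of a column at a time instead of one row at a time, which is what lets the compiled code line up with the chip's registers as described above.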
I want to ask your perspective on something. A lot of times when we talk about 1.0 and 2.0 in this world — because it helps us mentally put things in a box — 1.0 is a pejorative, and the companies associated with 1.0 end up being the losers while the 2.0s are the winners; think Friendster versus Facebook. But it looks like the purveyors of 1.0 Hadoop are making the transition to 2.0. Is that a fair statement?

I think it's a fair statement, and there are a couple of approaches there. I heard a hallway conversation here at Strata — I was walking by, and one engineer said to another, "it seems like we're creating databases all over again from scratch," and I thought, wow, yeah, that's exactly what you're doing. So there are a couple of approaches. One is to have this great Hadoop technology, realize you need SQL access to it, and design and build a brand-new database to put on the HDFS cluster. Having done this a couple of times now, it takes about 80 to 100 million dollars and about four or five years to bring a real relational database to market and have it mature, hardened, and ready to go. The other is to take a very mature analytic database — a columnar MPP database — and rearchitect it to work right on the HDFS cluster. I think there's a maturity advantage you get in bringing those two technologies together, and that's the track we're looking at: how do we get our analytic platform closer and closer to the HDFS cluster? We're doing it already with the concept I described — the data preparation and machine learning that run on the HDFS cluster — and the next phase is going to be figuring out how the database runs right where the data lives on Hadoop.

So one last challenge question for Actian. By the very nature of this discussion, the inference is that Actian is 2.0 — of course, right? We've got a thought leader here talking about 1.0 and 2.0, associated with Actian. You guys made some acquisitions — ParAccel, Pervasive, the Vectorwise database — and you're bringing those together. What makes Actian 2.0?

I think what makes Actian 2.0 is the fact that all of the technologies we're bringing together into the Actian analytics platform are next-generation technologies. They're optimized, they're fully parallelized — world-class optimization on every component within the platform to get increased performance. The products and the entire platform are architected to abstract away the complexity and make it simpler for people to interact with analytics and data, and it's all highly performant. I work with the engineers in Campbell, and every time I look at them they're finding one more way to optimize. They're like supply chain optimization analysts from a UPS or a FedEx: they look at every little movement of the data, from the point it comes into the system to the point the result spits out, and ask how they can make it faster. They get excited when their little piece of that data supply chain suddenly gets faster — that's their job, to make that piece even faster. It's fascinating.

Okay, final word as we come to the end of the segment. Summarize for the folks watching: what's it all about this year at Big Data SV, with the Strata Conference going on and all the innovation?

I think it's about driving value out of the investments that have been made. There's a lot of money that has been spent to date, and everybody I'm talking to is very concerned about finding a way to quickly get to value: how do I not only figure out what I have in the data but also move it into operational systems where it's going to create value for my business? That is the big push. I heard a statistic from one of the analyst firms that 60% of everybody implementing big data today is still trying to figure out what the business case is. They've made the investments; they need quick time to value, and that's how we built our platform — to help people get to value more quickly.

Quick time to value — this is theCUBE, we're all about value, fast, extracting the signal from the noise. I'm John Furrier with Dave Vellante. We'll be right back with our next guest as we get down to the end of day two here at Big Data SV.