Live from Boston, Massachusetts, it's The Cube at the HP Vertica Big Data Conference 2014. Brought to you by HP with your hosts, John Furrier and Dave Vellante. Okay, welcome back everyone. You are watching SiliconANGLE and Wikibon's The Cube. This is our flagship program where we go out to the events and extract the signal from the noise. I'm John Furrier with Dave Vellante, your hosts. And our next guest is Raj Yakali, Director of Data Infrastructure at adMarketplace, here live in Boston, Massachusetts for the HP Big Data Conference where all the experts are gathered and talking about what's going on in the industry from a technology standpoint, but also how it's changing the marketplace in terms of creating new products, new value. Raj, welcome to The Cube. Thank you. So I got to ask you, what's the biggest thing that you're seeing in the marketplace right now from infrastructure, software, cloud, mobile, social, big data certainly becoming a big conversation. Anyone who uses Facebook or any social network or mobile knows that their data is being used in some capacity to create a better product experience. In some cases they get a good discount on stuff. What is driving all this? Is it really big data? What's the big technology? I think the two things are coming together. One, the technology space is converging in itself: the constraints that we used to have in the past are not there anymore, or new constraints are coming up. Storage and memory, with respect to the technology, are converging together, and that is letting us do more than what we used to do with less. And at the same time, we're able to go beyond the past user requirements, and new user requirements are coming in. For example, we were doing analytics for a while, now we are doing predictive analytics, and very soon we are thinking about prescriptive analytics, which goes beyond what the user is asking for.
Prescriptive or predictive? Prescriptive, next level of the predictive, that's what. Yeah, break the difference down, that's really good. This is a nuance but for the data geeks it's kind of like just roll off their tongue. Prescriptive, describe the difference between predictive analytics and prescriptive analytics because these are two hot areas right now where everyone's looking at it from across the board. Customer service, manufacturing, sales, both are relevant. Talk about the difference between the two. So when you talk about the predictive analytics you're actually predicting something based upon what you have, currently based upon previous patterns and what not. And when it comes to the prescriptive analytics you're actually following, you're utilizing all the current content that you have on the user base or the requirements but at the same time you're actually anticipating what is that the user might need at a later stage. Even before they ask or even before they even come to you. So does one use more collective intelligence than the other? Or is it more of a data set discussion? It's a layer above the current level of intelligence. I was talking to a data scientist when we were talking about Costco and you walk into front door Costco and I'm in their demographic so when I go in all I see is big screen TVs and I say to myself I want one because when I walk in all I see is really low priced big screen TVs I want one. So is that prescriptive or predictive analyst where they kind of know my, they create product placement for me without even knowing what I might want? So if you go for a fine grain line between, the thin line between the predictive and prescriptive is like the factors that you know currently about the user. The factors that you do not know currently about that user but you can actually anticipate those new factors that might be created. Can you give an example? 
Like for example on the predictive analytics today, if we take the ad marketplace, which is where I actually work. In the ad marketplace the user is searching for a certain keyword or saying okay I want to do this. I want to buy like a cheap wine in New York. He's telling us very clearly that this is what I want. Now I'm predicting okay he wants to buy a red wine. What kind of red wine is he trying to buy? So I have certain factors already with me and using that I'm actually trying to think about what he is asking for and I'm predicting his user intent. The user intent I'm predicting on. Based on what is told. Is it a corkscrew or some other accessory kind of thing? Do I need to get him the wine from Napa Valley? The red wine, which kind of red wine? Is it Napa Valley red wine or is it like Chilean Cabernet Sauvignon or something like that? That's where I'm actually predicting. But very closely related. Now again based upon the wine consumption that he is having and depending on the kinds of searches that he's doing I'm thinking about something totally unrelated. Like okay, is it the time for him to buy insurance? Or is it the time for him to actually check whether his wine consumption might be going up or down? Is it time to go for a doctor checkup? Or is it like, oh his health is going to be much better than it used to be because he's consuming the red wine. So is it the time that he needs to buy a new pair of sneakers so that he can actually go jog around? That's prescriptive. That's prescriptive. So are they both prescriptive? So being able to prescribe the type of wine okay as part of it and then there are ancillary outcomes.
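Editor's note: the distinction Raj draws above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how adMarketplace implements it. The lookup tables, the function names `predict` and `prescribe`, and the counts are all hypothetical. Prediction refines the need the user stated (which red wine), while prescription proposes adjacent needs the user has not asked about yet (a checkup, sneakers, insurance).

```python
from collections import Counter
from typing import List, Optional, Set

# Hypothetical lookup tables, invented for illustration only.
PAST_REFINEMENTS = {
    "wine": Counter({"napa valley red wine": 5, "chilean cabernet sauvignon": 3}),
}
ADJACENT_NEEDS = {
    "wine": ["doctor checkup", "running sneakers", "insurance"],
}


def category_of(query: str) -> Optional[str]:
    """Return the first known category word mentioned in the query, if any."""
    for word in query.lower().split():
        if word in PAST_REFINEMENTS:
            return word
    return None


def predict(query: str) -> Optional[str]:
    """Predictive: refine the stated need using previously observed patterns."""
    category = category_of(query)
    if category is None:
        return None
    return PAST_REFINEMENTS[category].most_common(1)[0][0]


def prescribe(query: str, already_searched: Set[str]) -> List[str]:
    """Prescriptive: suggest related needs the user has not searched for yet."""
    category = category_of(query)
    if category is None:
        return []
    return [n for n in ADJACENT_NEEDS.get(category, []) if n not in already_searched]


print(predict("cheap wine in New York"))
print(prescribe("cheap wine in New York", {"insurance"}))
```

In this sketch the predictive answer stays inside the wine category, while the prescriptive answer deliberately leaves it, which matches the "totally unrelated" suggestions in the interview.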
So I'm interested in the technology behind that because John my wife will go out and she'll always say I want this red wine and I want a Pinot, I want it to be berry and she'll describe it and some bartenders or waiters can't do it, and then this one waiter gets it every time. Oh I get the wine for you. So how are you able to replicate that in technology? One, we actually capture a lot of information with respect to the keywords or the user content and we create dictionaries out of all this when it comes to the user searches in our marketplace. We have keyword based searches, keyword based ad serving, and at the same time we have category based ad serving, and we also try to figure it out based upon geospatial analysis and also the kind of traffic source that they're coming from when they ask us okay provide me this kind of service. So we're actually analyzing all these factors that come together, and we combine all these factors and try to provide them the best thing that would cater to their needs. So okay, so I understand that. And how the technology comes together, that is the big data. We actually get almost half a billion requests per day and along with that there's like two terabytes of data per hour that is being generated as a result of these requests and with all this data coming together now we have to perform the analytics on top of this. So Vertica is one of the technologies that provides us the analytical capabilities to mine these data sets, and along with that HDFS, Hadoop and all these ecosystems put together, and AWS for the processing and computing power. Everything comes together and this technology plays a big role in analyzing all these data sets and providing the customer with the right set of user intent. And you've got to do that. I'll invoke the real time phrase but you've got to do it in real time before the customer goes away. That's what I would consider real time. Yes.
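Editor's note: to put the two figures quoted above in perspective, here is a quick back-of-the-envelope calculation in Python. Only the two quoted numbers come from the interview. The rest is derived arithmetic, and it assumes traffic spread evenly over the day and decimal terabytes (1 TB = 10^12 bytes), neither of which the guest states.

```python
# Figures quoted in the interview.
REQUESTS_PER_DAY = 500_000_000   # "almost half a billion requests per day" (upper bound)
TERABYTES_PER_HOUR = 2           # "two terabytes of data per hour"

SECONDS_PER_DAY = 24 * 60 * 60
BYTES_PER_TERABYTE = 1e12        # assumption: decimal terabytes

# Average request rate, assuming traffic is uniform across the day.
requests_per_second = REQUESTS_PER_DAY / SECONDS_PER_DAY

# Daily data volume and the implied average data generated per request.
terabytes_per_day = TERABYTES_PER_HOUR * 24
bytes_per_request = terabytes_per_day * BYTES_PER_TERABYTE / REQUESTS_PER_DAY

print(f"{requests_per_second:,.0f} requests/second on average")
print(f"{terabytes_per_day} TB/day")
print(f"{bytes_per_request / 1000:,.0f} KB generated per request on average")
```

Under those assumptions the workload averages several thousand requests per second and tens of terabytes per day, which is the scale that motivates the analytical database and Hadoop stack described in the conversation.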
Fair enough? Yes. So are you bringing in analytical and transaction systems to be able to do that and allow machines to make decisions before humans can because you can't do it in human time. That's too slow. That's right. So we have two products that we have recently innovated and created in our system. One is BidSmart. What it does is it performs the price matching, and at the same time it's a predictive algorithm system that actually crunches through all these data sets and takes into account all the variables that are in place to make the decision quickly and suitably to user intent, at the speed of light, so that we can cater to the needs of the customer in real time. And on top of that there is an analytic piece of it that actually supports marketers. We have these heavily data savvy marketers, the largest data savvy marketers, in our user base and they want to have fine grained visibility into how they are spending their ad money and the ROI on the ad spend. So we provide all that through Advertiser 3D and BidSmart, and it works in real time: whenever the click data comes into our system, our network, that click data is being directly streamed into Vertica. That provides us the real time capabilities of analyzing all the data so that our marketers are able to see in real time what is happening with the clicks and the requests that are coming in from the user base. So the database is Vertica, not an in memory database. The pricing and matching is supported by our in memory database. We have an in memory database as well that supports the pricing and matching, and then the analytics is being supported by our analytical database which is Vertica. Okay, and the storage is not spinning or is it, right? Or is it flash? Is it, so it's in memory, you got? It's flash and SSD combinations, and for certain data sets we also have the storage as a whole. The cold storage.
Cold storage that we don't need the real time insights from. So we have these layered data sets. There are certain data sets that you want the real time insights from. So you're essentially competing with Google in a way, right? Were they part of your ecosystem, are you part of their ecosystem? We complement Google's ecosystem. Google actually has a search engine at their back. So they take up all that part. And beyond Google and Yahoo Bing, we are the largest search content marketplace. So what we do is we complement their offers: when the ad marketers go on to Google and so on and they are not able to meet their requirements just with Google, they actually reach out to the ad marketplace and they extend their user base and they get a new user base that is not available or supported by Google's platform or the Yahoo Bing platforms. And that gets realized because of the search market that is fractured between all these technologies and the tool sets and everything. And now with the mobile space coming into play, the search is everywhere, even inside the apps. There are like 100 apps in our mobile phone or so and each app provides a search mechanism. So all that fragmented search is not on Google now. It's not on Apple or it's not on Yahoo Bing or anything. So this is the place we actually come into play, where we connect all this fragmented work. This is the new user experience. I've been talking to you, you know I've been blogging about this for a decade. Google's great for web 1.0, maybe 2.0 but now the new user experience is mobile in these new environments. So the query, what they're searching for, is much less about static content and very contextual, app based. So I got to ask you the next question which is the holy grail right now, along the lines of these predictive, prescriptive algorithms, and everyone's talking social sales. Oh funnel the data in and we'll trap them into a funnel.
We'll identify who the right person is at the right time to sell them a product to give them some value. Very hard technical problem. Share your insights on that whole paradigm. Looking at all this data from either Twitter, social data, LinkedIn, Facebook. I see Facebook's doing very well with their business based on social data. But ultimately it's about matching the exact context. Certainly we were talking about how bad Facebook's retargeting is in an earlier segment and how I go to one website just to look at something and now I've seen that ad in Facebook all day long and I can't get rid of it. That's one thing that actually differentiates us from any of the other players on the market: we have the user search intent. The user is asking us to provide them with a certain service. They are telling us what to give them. The only thing is we make it much simpler by actually doing much more analysis on top of that and provide them with the ad. So they know what they want but they don't know all the details that go into what it means, because they have to take into consideration the geo, they have to take into consideration the market space and the industry, the products that are available. So all that analysis and all those dimensions we actually crunch through and provide to the user space. And when we talk about the social space and all those things over there, you have no connection point as to what the user is asking, the user is not asking for anything. You're actually assuming on top of what the user activity is. Based upon the user activity, we are assuming that oh, they might want this, they might want this. So that is a tough problem to know. There's a startup out there that's funded by Sequoia Capital called Mintigo which is doing some really interesting social sales. They use a lot of big data techniques to give customers who use the Salesforces of the world just a better understanding of their prospects.
And it seems like a very hard problem. What's your take on that whole big data application? Because it's very similar to what you're talking about. It's ads in a way, but it's more native to the user. So does the system have the user, do they track and record user activity? No, it's all the user target name and then they aggregate other data source data and try to match the two. So they have maybe your name or Twitter handle or maybe not, just your name and email address and then look at the web and all the social signals and then identify whether you're ready to buy that product. So very interesting paradigm because there's no touch points other than data. So because there are no touch points, it becomes much harder problem. Now they have to create some touch point right there. That creating the touch point is where the big data and all these appliances and everything comes together because it requires a lot of processing power. It requires a lot of time and energy. But with Cloud you can spin up some basically a mini supercomputer. Yes, but that's the complexity that comes with it. It's Cloud provides the processing power and then there is a storage and also your ingrown homegrown mechanics that have to come together along with it. It's going to come right back down to the database problem. So I'm going to go right there. So let's go back to the database. So we were just talking earlier with the VP of engineering here at Vertica. You know, does the database actually go away? Does it get abstracted away at some point in the future where the decision from the application and or the data, if the thesis is put smarts in the data then you should essentially not have to worry about where to store it technically. Where in the past it's been, what's my data schema look like database that will project the syntax into what I can do. 
So the question is I want to get your take on this kind of interesting time to have this conversation in our industry because you can actually abstract away the database. It shouldn't have to be your only limitation is based on the database. What's your take on that conversation? Is it a good time to have that? Are people actually doing that? Yeah, people are already doing it. Almost all the social space is about no schema, schema less and all that, which means that if you are trying to conform that into the database or schema and all that, which becomes harder to conform, conform all the data sets into the database. And so they were trying to actually get the extract value out of that system, out of the data set even before you insert it into the data set. That's where the whole HP flex zone and everything comes together. You know the interesting thing about things like Spark for instance, David and I were just talking about Spark which came out of Berkeley. I think Databricks, one of those companies up there is that in memory has become a huge deal. Open source obviously is driving it. But the notion of real time has changed the game, right? So we're talking about retargeting. How much does real time make it more difficult and complicated for these kinds of channels to provide real time contextual information to the user at any given time that would match them for. It's the business context. Now if the business has to survive the competition or so the real time really makes the impact. And being able to serve the needs of the customer in the real time and if you are to be in the competition you have to be aggressive and forefront and technology really plays a role at that point and the Spark in memory and even the SSDs and the convergence of the storage and the memory it's all coming together towards providing the real time insights into this piece. It's all about the feedback loop. 
So you have this feedback loop now you have to tighten it and closer that loop as well because your insights has to be much faster as well because it spurs the innovation later on. If you want the feedback loop is longer then the innovation takes much longer. If the feedback loop is shorter and tightened out your innovation is right on spot on and much faster. So you're talking about timetable now on it's a syncing up timetable to actual data acquisition and analysis, right? Yes. So you don't want to mismatch the cycles. No you don't. You want to really try to be able to match it up together. Can you give an example of how you've done that one time like what that means in context if you didn't do it? Give an example of the benefits of matching up these cycles. Would you mind replacing the cycles? So you mentioned that, okay, time to get the data and then time to analyze it and turn that around if it's longer versus that, what's that? Give a concrete, can you give a concrete example of how that would play out? For example at the ad market place when the user searches for the data set and when the user searches for a certain phrase saying flowers in Chelsea. Now I have to provide him the ad within like some milliseconds if there is, if we don't provide that service with an ad to the user within some milliseconds he's actually getting away from our site right there. He won't be staying there much longer. So now the time cycles that are required for the data that is coming in and you are crunching your data dictionaries that you are captured and you are going around for all the feeds and everything together that has to be much faster. And in order to provide it in the sub-milli-seconds the rest of the pieces in the chain have to be much faster as well. And now that's where the whole in-memory and the storage comes into play which helps us to churn these data sets and get that inside. 
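Editor's note: the point about every link in the chain having to be fast can be sketched as a simple latency budget check in Python. The stage names, the per-stage timings, and the 10 ms budget are hypothetical values chosen for illustration. The interview only says the ad must be served within "some milliseconds" and does not describe the actual pipeline.

```python
from typing import Dict

# Hypothetical budget; the interview gives no exact figure.
BUDGET_MS = 10.0

# Hypothetical per-stage latencies for serving one ad, in milliseconds.
IN_MEMORY_PIPELINE = {
    "parse_query": 1.0,
    "dictionary_lookup": 2.0,   # keyword dictionaries held in memory
    "price_and_match": 3.0,
    "render_ad": 1.0,
}
DISK_BACKED_PIPELINE = dict(IN_MEMORY_PIPELINE, dictionary_lookup=40.0)


def total_latency(stage_ms: Dict[str, float]) -> float:
    """End-to-end latency is the sum of the sequential stages."""
    return sum(stage_ms.values())


def fits_budget(stage_ms: Dict[str, float], budget_ms: float = BUDGET_MS) -> bool:
    """True if the whole chain finishes before the user would leave."""
    return total_latency(stage_ms) <= budget_ms


def slowest_stage(stage_ms: Dict[str, float]) -> str:
    """Name of the stage that contributes the most latency."""
    return max(stage_ms, key=stage_ms.get)


for name, pipeline in [("in-memory", IN_MEMORY_PIPELINE), ("disk-backed", DISK_BACKED_PIPELINE)]:
    print(name, total_latency(pipeline), fits_budget(pipeline), slowest_stage(pipeline))
```

With these made-up numbers a single slow stage is enough to blow the whole budget, which is the argument made above for moving the lookups into memory and faster storage.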
I want you to share with the folks out there the difference between born in the cloud, DevOps, versus not born in the cloud, legacy technology. And this is more of a broader question. We're seeing successes out there like Snapchat, Uber. These have been great examples of companies that have come out of nowhere and just created billions of dollars of wealth because of doing things differently. Fast, agile, I mean Dropcam was an example. We interviewed the founder at AWS Summit and it sold for half a billion dollars to Google with the Nest team. It was a webcam connected to the cloud. Big deal, half a billion, but what he did was he used the cloud to innovate on the storage piece of the value chain which creates significant value for him. This is a whole nother mindset. I call that born in the cloud. So explain to the folks out there who are watching, there's a born in the cloud mentality and a non-born in the cloud mentality. Born in the cloud mentality is be ready for the failure from the system point of view, but at the same time now they're trying to utilize these resources. Even though it seems like there are vast amounts of resources, there is a cost consideration that comes along with all these resources that are available. So they have to really think about how to do more with less, even though they are born in the cloud. So that is one mentality. When it comes to the legacy environment also, yeah, you take those resources for granted, but at the same time while this agile mentality is coming in, it's changing both sides of the world. The born in the cloud, they already come with the agile mentality and the DevOps mode and the same thing. And the legacy pieces are actually very quickly catching up with the same methodologies. So now even if they have the hardware and the infrastructure on premise. You mentioned the converging infrastructure. You know, the other thing that was talked about this week was BuzzFeed's $50 million financing at an $850 million valuation.
And what that really means is that's basically a social media list post company, top 10 reasons to do something. It has generated so much traffic because of the channels that they distribute the content in. Now making a serious run at the New York Times, these big publications, that's an example of this new model of full stack kind of company. So this notion of integrating silos together is an interesting DevOps problem. Dave, we were talking about this earlier. It's like the way people rethink these new business models is they say, hey, you know what, I'm just going to do it differently. I can optimize in a vertical or full stack basis and create value. Well, a lot of people are trying to figure out, okay, you know, there's a share shift going on, right? Amazon's growing at 60, 70% a year. It's going to be two, five billion in revenue this year. You know, there aren't a lot of enterprise IT companies doing that. So people are saying, okay, is that the new model? Many people believe yes, it is. But at the same time, you got HP, IBM, EMC, VMware, Cisco saying, well, we can replicate born in the cloud on premise. And we have a more complex problem because we have application portfolios that are very diverse. The average age of an enterprise application is 19 years, et cetera, et cetera, et cetera. So people are trying to figure out, us included, how that's going to shake out what's your opinion? You know, as a practitioner in that world. It's a cycle of innovation. There are two factors that are coming together. The one is user requirements and user base that you have with your product. And the other side of the piece is there is that fragmentation that is happening as new companies are born and the new methodologies. There is also the repetition of that same work happening at all these companies that are out there. 
So what AWS is doing is, AWS and the whole Amazon and the whole thing is, they're capitalizing on that repetition of the process that is occurring across all the ecosystem. What they're saying is, come to me, you will be, we will provide you with all the API and you will be able to use it exactly alike and we can provide you the platform. So that is one side of the picture where the convergence is coming together and all these best practices and the different companies are coming together. And the other side of the picture is, you have a customer base and you have a user base. By going to the AWS, you either have to go with a specific infrastructure, specific instances and specific memory or so, which is almost equivalent to your hardware internally because it's a reserved instances versus not. So you have to cater to your user base requirements. There are certain user base requirements that within your use cases that may not be able to cater when you are in the public space. So you are going for average performance. There is a possibility of that. So that, in which and what cycle of the growth phase you are in that may dictate or that may guide towards whether you want to be in the public space or even in your own space or not. And what is that you want to get out of that? But now I think you hit on it, repetition of the process. Amazon essentially is able to package that process in their marginal economics of deploying infrastructure incrementally goes to zero compared to doing it every time. That's right. Whether you call out a third party or whomever, you would think conceptually that over the next 10 years the Amazon way overwhelms the traditional way. Why would it not? If we want, that is a very good question, but if you take a step back and try to analyze that same aspect, the user search as it was before. So you have this created content on your publisher space and now it is fragmented out into the web space again. 
But now it is, usually before it was fragmented in this you created content, now it is fragmented within the mobile space. So you have a totally different ecosystem created with it. But it has followed the same cycle. Now it comes to this technology and the proliferation as well as the consolidation. Yes, it is evolving. Now there is a new set of challenges that are going to come in together, which is yeah, now that AWS is a solution for the repetition that is occurring in the system and now we are getting in there. Now there is a new level of user requirements that are going to come into the picture where there is again that fragmentation. You provide the average kind of service when you are in the public space because you may get similar service as your neighbor is getting. Now that may be okay, but that may not be okay. When it is not okay, now you are again coming back. It goes to a certain extent and then beyond that certain extent, if you are growing and growing and growing and beyond, there are internal inherent challenges that need to be taken care of either by Amazon or even by the customer who is on Amazon. So those incremental challenges, are they able to align those incremental challenges with the solutions that are provided by the provider? If they can provide it, yes, but then again as the scale of that system goes larger and larger, they won't be able to cater to each and every personal need or each and every organizational need on this big ecosystem. So that's where it again comes back. But there is an innovation space that gets created at that point where you are not really tackling the similar problem but you are tackling a different set of problems. Right, right, and you've got the scale to do it. Yes. Raj, thanks so much for coming on The Cube. Raj Yakali, Data Architect. I think I just followed you on Twitter. You're Raj underscore architect on Twitter. Okay, I just followed you.
Love to get you involved more in the conversations. This is really something that we're passionate about if you're available. We'll ping you on Twitter. We'll do some crowd chats if you're cool. We're going to do a predictive analytics chat this Friday. This is an area where all the innovation is happening. Certainly Data Science is well documented. Wall Street Journal had a great article this past week on Data Science, but where the action is really going is really more programming, computer science and really good stuff happening around data. I think what you're doing is really the center of it. Congratulations, great to have you here on theCUBE. To hear from the experts, the tech athletes, this is theCUBE. I'm John Furrier, Dave Vellante. Go to crowdchat.net slash hpbigdata2014 and join the conversation. We're having a crowd chat right now. That's our new engagement container. Watch that. We'll be right back after the short break and we'll take questions from Twitter on our next guest. Take care.