Hi, welcome back. I'm Jeff Kelly with Wikibon. For those of you that have been tuned in, you've been watching our coverage of the HBase conference from San Francisco. John Furrier, our colleague from SiliconANGLE, is there covering all the action. So this ties in very well with what we're about to talk about, kicking off our big data and data science focus for the day. So I'm joined today by Annika Jimenez, who is the Senior Director of Analytics Solutions at Greenplum, and Kaushik Das, Principal Data Scientist with Greenplum. Welcome.
Thank you.
First time on theCUBE.
Thank you.
As Dave, our CEO, always says, the first time, we'll go very gently on you.
Yes, thank you.
Thanks for being here. As we get started, why don't you tell the audience a little bit about yourselves and kind of your role at Greenplum?
Sure, yeah. So at Greenplum, I started about a year ago, actually coming off of a fairly extended tenure at Yahoo, and started at Greenplum really to help kind of build out data science as a service to our customers. And that meant starting to work with really great, existing people on the team, like Kaushik, our principal data scientist, who've been spending a lot of time thinking about how to really build predictive models on top of the Greenplum stack. And we have done a lot of work to scale up the team, productize services, think about solutions delivery, all around much more advanced forms of analytics than what you would typically see in BI, right?
Right. So I've been with Greenplum for a couple of years. That's when Greenplum realized that they have a really nice tool for dealing with big data. But you can't just give someone a nice power tool that they're not used to working with and expect them to suddenly start using it, right?
It could be a little dangerous in that, exactly.
So that's where the data scientists like me come in.
So, because using Greenplum with big data is not just about answering your questions faster and better, but actually being able to ask new questions.
Right.
And that's where we come in, because we see this happening across different customers and different verticals. So we are able to work with the customers and actually help them ask the right questions and solve those and give them the right answers.
Right. So you mentioned, yeah, it's about asking questions versus kind of, you mentioned the more traditional BI approach. Contrast that a little bit. From my perspective, the more traditional approach to BI was much more looking back. Why did something happen over the last several months or several weeks or however long? And there's been talk for years in the BI industry about moving to predictive analytics, but it hasn't really happened in the traditional world. So how does the mindset differ? And how is the approach different from your traditional BI environment?
So I'll take a first crack and then you can jump in. And I think what we find is customers have, obviously like you said, been thinking about BI for a very long period of time. And they've primarily manifested BI through some sort of reporting layer, usually with a set of analysts that can then take some of the more custom queries against the data warehouse to go a little bit deeper against some core questions. And I think a lot of our customers are realizing that they could take and leverage the data assets that they have within the company in a much more profound way than through simple reporting solutions.
And that starts introducing into the equation a completely different paradigm of compute and distributed computing requirements to support that, especially when you're entering into the bigger data realms, when you have terabyte and petabyte scale data stores and you're actually trying to do very meaningful predictive analytics on top of that data. That then introduces a need to start thinking about a completely different paradigm when it comes to actual execution. So it's a very different kind of workload than what you would see in a more operationalized process around supporting BI. It's development-like, it's exploratory, we're going in and working with the data, asking and answering questions and maybe going further, all while trying to yield what ultimately is a model that can actually be put into production, right, against some top priority business use case. And of course, like Kaushik said, there are just untold numbers of ways that you could apply the data, from a marketing perspective all the way through to risk modeling, IT related analytics, et cetera.
So one way of thinking about it is that it's not just asking what happened at a very deep level of detail. It's actually going beyond that and asking why did it happen. And then once you know that, then you can answer the question, how do I prevent that from happening again? Or how do I make it happen more?
Yeah, yeah.
Depending on the case.
So you mentioned, you know, your role as kind of data science as a service, which was just really intriguing to me because we hear a lot about the kind of dearth of data scientists out there on the market. So walk us through a little bit a typical customer engagement when you go on site. And how does that work when you're, I'm guessing in a lot of cases, engaging with customers that don't have their own data scientists on staff?
Yeah, so it's an interesting, we're at this really just fascinating period of time, right?
Because I think across all industries or verticals, if you want to think about the vertical markets, retail, finance, healthcare, energy, et cetera, we're finding that our customers are just awakening to this understanding that there is opportunity and value in the data that they've been cultivating, maybe over years. And they're realizing that again, with the right harnessing at the compute layer, they can actually extract much more value out of that data than they have in the past. So we very frequently get involved early in the process with the customer. They've come to enough of a realization that they want to at least reach out to Greenplum, but very frequently that quickly turns into a situation where we actually have to walk them through almost like an analytical brainstorming session, right? So we actually bring stakeholders to the table. We come to them with a vision for how they could leverage their data. We talk about use cases across the stakeholder communities that they're working with, and we actively work with them to prioritize and assess the value of those use cases. And so we do all of that in an actual engagement, because customers kind of urgently need that kind of support and help. And then of course, once we've done that, they're then ready to start thinking about execution, and then we're there to help them with that as well.
So is that an onsite process or do you guys do that?
No, yeah, we actually remote in, remote VPN into their Greenplum environment, of course, once it's stood up on-premise in their data center. Then the whole team can kind of do a lot of the work. And of course the paradigm is we're moving from kind of desktop based analytics to in-database analytics, right? And so that's where we're able to leverage the billions of rows of data in this terabyte, petabyte scale warehouse to yield much higher degrees of accuracy and predictive power than ever before.
So Kaushik, when you go into a typical customer engagement, in addition to helping your customers kind of get off the ground with their first implementation, are you also trying to train them to be self-sufficient, or is this more of a long-term engagement where you're looking to build a long-term consulting relationship with the customer?
No, and that's actually a really good question. And as I was thinking a little while ago, our goal is definitely to teach our customers how to fish rather than continue to be their fish catcher, so to speak. So, continuing what Annika said, to add a little bit to that, framing the right questions is really very important, and getting all the stakeholders in the company together is also very important. And once that happens, what we do is actually teach the technology, which is not really the most difficult part because it's actually a very intuitive technology. So to give you an example, analysts do logistic regression all the time, and they're used to doing it on the desktop, and it's something that would take 20 hours maybe before. So you just start the model, you go, and you come back the next day. With our platform that comes down to minutes, like a couple of minutes, and that really expands the amount of things that you can do, right? And so it's a revolution, right? It's not just a matter of degree, and therefore it's very useful for us to come in, and it's more of getting people used to a new way of thinking about their problems and their data.
So we have some customers that are actually asking us to not just come in. So, you know, the short answer is we want to teach them how to fish, right? So we want to do the knowledge transfer that's required to fully kind of leverage this kind of strong horsepower that they're getting in Greenplum.
And very frequently what we're seeing now is we have customers that in essence are saying they're committed to data, they're committed to data science and predictive analytics. They realize that they have gaps internally, which we're beginning to help them solve for, but they want to get to the point where they actually have what they might call their own data science center of excellence. And very frequently they need ongoing help from us to kind of help build and operate and then ultimately, we hope, kind of transfer ownership back over to the customer.
Okay, so let's dig into some real use cases, some examples of some customers you've worked with. What are some of the more interesting kind of engagements you've had that you could share with our audience?
So, like I said, we have use cases across sectors, and they can range from what people who are actually in Silicon Valley are very used to hearing about, data usage in the digital media space, right? So it can be working with digital ad performance data, user level ad interaction with web content, to do profiling of users to inform behavioral targeting. Those are very common use cases. But it goes much further than that. It extends immediately into some of the newer areas, where each of the sectors has its own kind of mini data revolution emerging in terms of some sort of ubiquitous nature of data. So you have healthcare, where you have a regulatory environment that's changing incentives to now start thinking about much more meaningful outcome-based analytics. Looking at treatment of patients through treatment pathways and understanding if their progress through those pathways is anomalous in some way or not. Which of course then starts enabling you to optimize pathway traversal, identify fraudulent activity, et cetera. We were going to talk a little bit about one specifically within the utility sector, which is also really interesting.
So in the utility sector, we are working with several cases. In particular, there is one customer of ours, Silver Spring, who are one of the builders of the smart grid, and therefore they have access to all this data. For every household and business, you know how much power they're using every 30 minutes or every 15 minutes. So what you get essentially is a pulse of economic activity across a city or a region. And that is very exciting, but going beyond that, now you can answer questions like, how do I prevent a blackout? And it's not just how the blackout happened, where the grid started tripping, but it's also what was the sequence of events that happened before the blackout? And then going beyond that, you can now set up predictive warnings. Like, if that sequence of events seems like it's happening again, let's set up a warning, let's take action before it can happen and prevent that. And that sort of thing is very exciting. And there are lots of cases like that. It's not just blackouts, it's about theft prevention. Or if a tree falls, you want to know really quickly what has happened so that you can take action, you can send out the crew. Before, this would take a long time to do, and it would be very costly and inconvenient for people as well. But now all that goes away.
All right, those are some great examples of new use cases that you couldn't do in the traditional BI world, where you'd be looking back and you might say, oh, we recognize that this customer basically stole from us two months ago. Well, guess what? He's long gone.
He's long gone. It usually took a BI analyst weeks to get access to the data. They're running some analysis on their desktop, and maybe a month or six weeks later, they're saying, ah, we found some theft. As opposed to actually getting much earlier in the curve and identifying it, maybe even, in the case of an outage, before it actually happens, right?
So it's very exciting. That kind of pattern recognition and anomaly detection, which has applications to a lot of different kinds of actual use cases like theft detection, outage prediction, treatment optimization, et cetera, is endemic, it's happening across all the verticals. There's an opportunity to apply this notion of pattern recognition across all the different verticals, across different types of data. It's all very exciting.
So, certainly across verticals, but are you seeing a lot of interest from maybe the smaller and mid-sized organizations versus the large enterprise? We hear a lot about large enterprises who have the resources and a lot of data, but SMBs also have a lot of data. They may not have a lot of staff or revenue, but they certainly have a lot of data and could benefit from some of these approaches. So what's the level of interaction or engagement you're seeing from the SMB community, and where do you see that going?
So I guess I would say, you know, even at the large enterprise level, the data science revolution is really just starting. So remember I told you a little bit about how we come in and kind of help them establish a vision. I think we are needed for that support because they're really just awakening to the possibilities that are on the table for them. What we're finding is that there are some companies that play very pivotal roles in support of the SMB community that are themselves thinking about how to build analytics as a service to the SMB community. So you have, for example, credit card transaction firms, firms that are managing the processing of credit card transactions. You can imagine that that data is very valuable because it gives insight down to user level transactions across any sort of retail location, and that can scale in size from very, very small to much larger, right?
Those transactors, those companies that kind of manage the process of approvals and denials, et cetera, of credit card transactions are in a very interesting position to take that data and add analytics on top of it that's very supportive and helpful to the small and mid-sized business community. So I think that's where I would see the next logical step. I don't know that I'm seeing a lot of readiness to think about advanced analytics from the SMB community, per se.
So I think in terms of the SMB community, what needs to happen first is getting just the access to all the data, and it's not just the data that they have, which can often be big data, but also data that's available out there which is accessible to them. For instance, for a small retailer, I mean, even for a medium-sized retailer, not Walmart, there is still a lot of information in just the US Census data, and people are still not using that. But this is what we feel strongly is going to happen very shortly, because technology like ours makes it very easy to access the data. People are still thinking of the situation a few years ago, when it was a lot of work and expense even to get that data and look at it, but that is no longer the case.
Yeah. And so do you find yourself doing a lot of education in that regard?
Yeah.
Explaining some of the possibilities and now the new tools and hardware, et cetera, advances that can now allow you much easier access to storage and compute, et cetera, to start performing some of these big data and older.
Absolutely, and that's why the brainstorming session that Annika referred to, which often starts our engagement, is really important, because people might have heard of big data or they might have one felt need, but they often don't realize the whole universe of information and insight that's awaiting them, right?
So I guess it's really interesting, we actually had a graphic, I don't know if they're able to highlight it right now, but the idea of this graphic is simply to illustrate that with big data analytics, we're at this period of opportunity now in the market where any one enterprise can much more centrally harness big data analytics. And the use cases are, like I said, kind of ubiquitous across stakeholders. So yes, you have the CMO with marketing analytics and related kind of adjacent area opportunities, which is very traditionally understood, but you also now have the CIO thinking about how to mine IT logs. You have the CFO thinking about risk and compliance and much more robust models for risk than they've ever been able to achieve in the past. You have the COO thinking about call center optimization and prediction of the nature of incoming calls and routing them appropriately to minimize resource constraints, right? So I think the ubiquity of use cases against available data has grown to the point where the opportunity for a company is to actually use data as a source of competitive advantage.
And when we're having conversations with companies in this brainstorming session, or even just casually, in the past year, there have been numerous situations where you can almost physically see the light bulbs going off in their heads, you know? When-
Yeah, it's like, it's what makes our job really fun, because we're actually trying to be a little bit of the catalyst to that, but it's definitely happening more and more.
And, you know, the most important ingredient really is the willingness on the part of the organization to believe that they want to make their organizations ready for predictive analytics. And if that willingness is there, then everything else falls into place, because all the other pieces are already available.
Exactly, yeah.
Some great advice, some great insight. Appreciate you guys joining us here on theCUBE. Thank you so much.
Thank you. Our pleasure.
So we'll be right back in just a few moments with more big data and data science coverage here, live from EMC World on theCUBE, and we'll have more coverage also from HBase a little later on in the day.