Okay, we're back. This is the spotlight on data science and big data. This is Dave Vellante of Wikibon.org and I'm here with my co-host Jeff Kelly. We're live at EMC World 2012 and the theme of the show is transformation. We talked a lot yesterday about cloud being the IT transformative piece, but really it's data that is transforming business, and it's all about packaging information, monetizing information, getting value out of information. That's the business transformation. Of course there's also a big transformation of skill sets, and EMC's talking a lot about that, but today we're really talking about the business impact, the data, the value of the transformation. We're here with Steve Hillion, who's the Chief Product Officer of Alpine Data Labs. Steve, welcome to theCUBE.

Dave, nice to meet you.

Great to have you on. So, Chief Product Officer: I asked you off camera, is the product data? And you said yes. Talk about that a little bit.

The product really, as you were saying, is sort of the insights coming out of that data. I think increasingly organizations are either turning their data into value or worrying that they may not be doing that enough. They've got these mountains of data that are piling up: all the traditional data that they've been getting out of their transactional systems, but now increasingly machine-generated data, web behavior, just piling up, and there's the sense of, how do we make the most value out of that? How can we use that to really understand our customers better? And that's really what we're about, sort of turning that into deeper analytics than may have traditionally been done in the past, getting real value.

So what does Alpine actually sell?

So we sell an application that allows you to do predictive analytics on really large quantities of data without having to set up a massive new infrastructure.
In fact, what you can do is download our product literally off the web, point it at the source of your data, which is typically going to be sitting in a relational database like Greenplum, for example, and then just start doing predictive analytics. I think traditionally people have thought about predictive analytics as something that's difficult, hard, expensive. You have to hire a team of PhDs. Build a model. Build the model, refine the model, test the model. You have to go back to the well to get more data, and that's a slow process. It's cumbersome, right? I mean, it typically takes you like six months to produce a churn model or a new product recommendation model. And for us, we want to make that radically simpler, really cut down the amount of time you spend producing those models by going straight to the data source and doing the analytics where the data lives.

Okay, but so do you have the capability to essentially, on the fly, build that model within the database?

Yeah, that's right. That's exactly what we do.

Sounds like magic.

Well, it certainly took a lot of hard work, and in fact, I can't claim all the credit ourselves. This actually came out of a lot of early work that was being done at companies like MySpace and Amazon and Greenplum itself, actually, and also some academic work that was happening at Berkeley under the leadership of Professor Hellerstein there, Joe Hellerstein, who's sort of an expert in really big databases.

Yeah, we've had Joe on theCUBE.

Oh, great, yeah, yeah. Fantastic guy, very bright, obviously. And he really sort of had this insight, based on his interaction with industry and with his team, that you could actually do analytics directly where the data is sitting. In a sense, Hadoop is an inspiration for that, right? Because Hadoop is not just about storing and retrieving data, it's a computational platform.
And likewise the database vendors over time have gotten better at doing more complex calculations. And so they were thinking, can we actually do these more complicated sorts of analyses, where we're building models and scoring them, directly in the database? And so we've built a company around that.

So talk a little bit more about why in-database predictive analytics. I mean, what's the real appeal from a value-prop standpoint?

Well, I think a big thing that really inspired me when I first heard about Alpine, got involved with them at the founding and decided eventually to join the company, was that they just made the whole thing so much easier. I had been involved in many analytics projects over the last decade where the process of getting the data and refining the data, building the models and iterating and so on, wasn't even iterative, right? I mean, it was highly waterfall, highly static, sort of one-shot model development. It's like, I hope this works. I hope this is the right data set. And if we need to go back to the well, it's just too painful. And what Alpine was doing, working with Greenplum, because it's actually an early spin-off from Greenplum, was going into customer sites and saying, just point us at your data and we'll find something interesting, like this afternoon.

Instant ROI.

Instant ROI. I mean, I remember the first time we used it. This is when I was working with my data scientist team, actually using Alpine. I loved it so much, I joined the company. I went into a telco which had no data scientists, right? They had never done churn models or advanced analytics before. And literally by the end of that day, we had just taken the source of that data, done the analytics directly where it sat, and we had churn models. Pretty decent, maybe not production level, but pretty decent churn models that they could actually use.

You could see things immediately, trends that you could act on, you're saying.
When we talk about churn models, you're talking about customer churn.

Yeah, yeah, in telco, that's a big thing, right? Because people are constantly shifting their allegiances in terms of cell phone providers, service providers. And so in predicting whether that's going to happen, obviously things like your account end date, your subscription ending, are going to be a big predictor of that, but there are more subtle predictors, like ratios of text messages to voice, and how usage over the weekend sort of declines, and whether you're talking to people who themselves have recently-

Pricing, device availability. Did you get a spike in your bill?

Yeah, timing of product availability. If you're Sprint and you don't have the iPhone, and all of a sudden you have it. I mean, there's so many factors that-

That's right.

There's a complicated matrix there.

Hugely complicated. Often you'll have hundreds if not thousands of variables going into these sorts of models. And one interesting thing is that you then need to go to the business, right? You need to go to the business and say, what are the variables? What do you think is likely to predict churn? So one of the secrets of Alpine as well is that because it's all exposed through a nice simple UI, with all this power of the data infrastructure behind it, the business can get involved as well and say, what if we try this variable, and actually start clicking around in it themselves. So it becomes a sort of collaborative environment for people to do analytics together, and to maybe get business analysts and people we call aspirational data scientists involved in the process as well. Because it's so easy to get it up and running, it's a little less scary.

So churn models are obviously one use case. Are your algorithms specifically tuned toward churn models, or do you have other use cases where they can be applied more generically?
Yeah, it's definitely a general tool right now. I think for an early-stage startup like us, it can be useful to fix onto certain verticals. We're typically aimed at finance, healthcare, retail, and entertainment is another big one. Places like where we are now, in Vegas, often do a lot of analytics to target their customers better. But we've done stuff all over the map. I mean, we most recently did something in the healthcare industry where we're predicting patient outcomes, like the event of a cardiac arrest happening, based on patient profiles. We went in and just did a very quick, simple proof of concept on that. Building product recommendation engines, too. We've done stuff where we've looked at the electricity grid, actually looking at smart meters and being able to detect fraud or detect the likelihood of outages, or vegetation growing in and interrupting service. Just all over the map. So that's kind of fun, building something very general.

Are the models didactic? Are they self-learning? Do they get smarter over time?

That's a really interesting point. I think one of the problems with the traditional analytics process is that because it takes so long from end to end, by the time you get to the end of it and face the idea that you want to refine this, it's: I'm exhausted, I want to go home. Whereas I think the goal for an analytics project is that you rebuild your models every night. I mean, for anybody out there right now, if you're not building your models every night, there's something wrong with your process.

This is really important. Because this is a completely different mindset, right? It used to be, okay, at the end of every month, I'm going to put in a little bit more data. Maybe one, two, maybe 10% more data. The model is a god model and you don't want to change it, and you add to it and it gets layered. And you're saying, no, no, throw it away.

Absolutely. Start over. It's like the lessons from the Agile movement, right?
From the Agile software movement, which is build early and test often and just constantly refine things. I mean, the number of analytics projects that I've seen that have really gotten stalled because you're aiming for perfection: your R-squared has to be so high and your estimates have to be this good. And so, screw that. You want simple, interpretable, functional models. What do you want? Do you want the perfect model that predicts with absolute precision in six months' time, once all your customers have left anyway, or all your product portfolios have changed, or customer behaviors have changed? Or do you want a model that's pretty damn good right now? And that's what I think people need to be aiming for. Across the board, it's just a different approach to predictive analytics.

Yeah, pretty damn good, or even good enough to act upon. If you can act upon it and extract value, then that's pretty damn good.

Yeah, that's right, that's right.

So what about changing mindsets? I mean, because that is really not the way, as we talked about, traditionally people look at