 Hi, and welcome back to theCUBE. We're up here at Oracle headquarters talking to some customers and Oracle partners about some of the challenges and opportunities associated with the rapidly evolving big data universe. And right now we're talking to Karl Rexer of Rexer Analytics, good Boston fella. And we're going to talk a little bit about some of the new ways that people are using predictive analytics. Absolutely, great. Welcome to theCUBE, Karl. Thank you. So, tell us a little bit about Rexer Analytics. Small company founded in 2002, and we focus on providing predictive analytics solutions to our clients. And those clients can range from small startups to big companies like PwC and ADT. So, lots of companies across all different industry sectors, and we help them to predict their customers' behavior, whether that be what product are they going to buy next or which customers are at risk of leaving. A lot of people know what predictive analytics is, but just for context's sake, tell us a little bit about what it is, and then the next question is going to be: how does it get done? So, what is it? Well, really predictive analytics is using historical data. Many of our clients, like I said, are taking their historical data and they want to predict something about their future. They want to know which of their clients are going to buy that product, or which of their clients might close their account, so that we might be able to take preventative action, reach out and see how we can better serve them and retain that client relationship. So, customer retention is big, but you need to have historical data to see, well, what are the patterns that we see in the past that help to identify who's at risk and who's not at risk, and what differentiates those customers. 
Once we can use that historical data to find the patterns, we can then apply that scoring algorithm to your current set of customers to identify which among them are at risk of leaving, or which are more likely to buy that next product. So, when we think about predictive analytics, it starts with the idea that there's an event that the business is concerned about. Absolutely. And then once the business, working with the various parties, understands what that event is and the value of that event, now they invoke you to take a look at historical data, mine it. Right. Munch it, all the other things. Absolutely. Take us through that process a little bit. Yeah, well, one of the first steps in that process is often trying to identify what is the target variable in your data that's most closely aligned to the business problem. So, if we take the example of customer attrition, we've done a lot of customer attrition work in the banking sector. And you realize that even identifying the right target variable in the data to use to address customer attrition is not a very obvious thing. When customers are leaving a bank, they might have multiple accounts, they might close some of them, they might stop their transactions, leave something open for a long time and only months later actually close the account. So, working with the customer's data to explore and define what's gonna be the best target variable to use to say a customer has left is something that we work on first. Once we've defined what that target variable is gonna be, we then look to see, well, what type of predictor variables are available in the historical data that we might be able to use to help predict that target. So, again, if we think about customer retention, we might be looking at the call center records. 
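The target-definition step described above can be sketched in code. Below is a minimal Python sketch using a hypothetical inactivity rule; the 90-day window and the function name are illustrative assumptions, and a real project would derive the rule by exploring the client's historical data, as described in the interview:

```python
from datetime import date, timedelta

# Hypothetical rule: treat a customer as "attrited" if the account has seen
# no transactions for a full window (e.g., 90 days), even if it is still open.
def label_attrition(last_txn: date, as_of: date, window_days: int = 90) -> int:
    """Return 1 if the customer looks attrited as of `as_of`, else 0."""
    return 1 if (as_of - last_txn) > timedelta(days=window_days) else 0

as_of = date(2024, 6, 30)
print(label_attrition(date(2024, 1, 15), as_of))  # long inactive -> 1
print(label_attrition(date(2024, 6, 1), as_of))   # recently active -> 0
```

Labels produced this way over historical data become the target column that the predictive model is trained against.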
We might be looking at the number of accounts or number of transactions, the size of the transactions, the balances in the accounts; all these sorts of things in the historical data are potential predictor variables. And we'll look in the historical data to say, which of those potential predictor variables actually are related to the target, to help differentiate the loyal customer from the customer that's about to close their account. So, for example, it might be that the customer actually chooses to leave the bank six months before they actually close their account, and you can identify that because they've stopped transactions against that account. For example. Yes, or we see drops in the transactions. They might not have stopped entirely, but they used to be a customer that transacted at one high sort of rate, and now they've stopped, or they might have reduced the number of checks written, the number of ATM transactions, if we're talking about banking. But this also applies across many other industries. We've done similar sorts of models to look at which students are at risk of dropping out of college. And there we could see that, yes, there are many different sorts of factors. It might be their grades, but it also might be how far they live from campus. It might be other factors like, are they a first-generation college student, compared to coming from a family where the parents were also college educated? So all these different things can work to predict a behavior. So it could be behavior like closing a bank account, but it could be deciding to put solar panels on your roof. It could be deciding to drop out of college. All sorts of these different actions can be predicted if we have the right data. And the predictions are not gonna be 100% accurate, remember. What we're doing is trying to say for businesses, who's at risk or who's likely to do that thing? 
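One of the predictor variables mentioned above, a drop-off in transaction activity that hasn't stopped entirely, could be computed along these lines. This is a minimal sketch; the function name and the three-month recent window are assumptions for illustration, not anything from the interview:

```python
def activity_drop(monthly_txns: list, recent_months: int = 3) -> float:
    """Fraction by which recent transaction activity fell versus the earlier
    baseline: 0.0 means no drop, 1.0 means activity stopped entirely."""
    baseline = monthly_txns[:-recent_months]
    recent = monthly_txns[-recent_months:]
    base_rate = sum(baseline) / len(baseline)
    recent_rate = sum(recent) / len(recent)
    if base_rate == 0:
        return 0.0
    return max(0.0, 1.0 - recent_rate / base_rate)

# A customer who averaged ~20 transactions a month, then tailed off sharply:
print(activity_drop([20, 22, 19, 21, 20, 18, 5, 3, 1]))  # 0.85
```

Dozens of engineered features like this (checks written, ATM transactions, balances) would then be tested against the target to see which actually carry predictive signal.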
But we still might be wrong a lot of the time. Yet if we identify a set of customers where we can say 40% of these customers are gonna close their account in the next three months unless you do something, compared to another group of customers where only 2% are gonna close their account, that's of great business value to the company because they know where to focus their retention efforts. So we're looking for the problem. We're looking for the action that is associated with the problem. We're looking for the variables that best predict the action. And now we're doing scoring work to actually construct the model that allows us to turn this into an operational approach to doing things differently. Absolutely, because we will often have hundreds of thousands or millions of rows of customer data, and we'll have dozens or probably hundreds of possible predictor variables. And we look to try to figure out, well, what set of those predictors are working to get the prediction? So it might boil down to about a dozen or so variables that end up being predictive. Now, like you said, we need to put that into a scoring algorithm where it calculates a score. Oftentimes we do this on a 1 to 100 scale. The people who have scores closer to 100 are likely to do the behavior, and those who have lower scores are less likely to. But we need to implement that as a formula that can be run. And this could be in R, it could be in SQL, SAS, or SPSS code, whatever it might be that the company is using to calculate the score, so that it can be done in an automated way every night, every week, every month, however often a business needs to produce a new score. Now, a lot of data-oriented applications don't age gracefully. That's true. 
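The runnable scoring formula described above might look something like the following minimal sketch. The logistic form, the feature names, and the coefficients here are purely illustrative assumptions; in practice the weights come from fitting a model to the historical data:

```python
import math

# Hypothetical coefficients -- a real model would learn these from history.
WEIGHTS = {"activity_drop": 2.5, "complaint_calls": 1.2, "num_accounts": -0.4}
INTERCEPT = -2.0

def churn_probability(features: dict) -> float:
    """Logistic model: estimated probability the customer closes the account."""
    z = INTERCEPT + sum(w * features.get(k, 0.0) for k, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))

def risk_score(features: dict) -> int:
    """Map the probability onto the 1-100 scale described in the interview."""
    return max(1, min(100, round(churn_probability(features) * 100)))

at_risk = {"activity_drop": 0.9, "complaint_calls": 3, "num_accounts": 1}
loyal = {"activity_drop": 0.0, "complaint_calls": 0, "num_accounts": 4}
print(risk_score(at_risk), risk_score(loyal))  # high score vs. low score
```

Because the scoring step is just a formula, it can be translated into whatever the company runs in production (R, SQL, SAS, SPSS) and rescored on a nightly or weekly batch schedule.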
So tell us a little bit about how often you have to go back in and refresh the model, retest the model, rejigger the model to make sure that it stays current relative to the business problem and the events that the business is interested in. Right, and we would call that model management. And we see that many companies don't have good procedures in place for systematically managing their models. By the way, we would call that model management, too. Yes, yes, and many people do, but it's an area that's only now getting the attention it's deserved for a while, as companies start to reach maturity in their predictive modeling capabilities. So they now start to see that, hmm, should we just have a rule of thumb and refresh everything after six months or a year? How often do you need to refresh that predictive model? And sometimes the rules of thumb are okay, but actually I would argue that people need to put into place more systematic ways of checking to see: are the predictive models still predicting? So often maturity turns into practice, and practice can turn into a tool. What are some of the tools that are starting to evolve to do a better job of supporting these model management tasks? Actually, I'm not aware of any particular tools that I see companies using to really do that. So I think there's a good opportunity, a good space there, for some tools that can really help with that. I see companies really doing it in an ad hoc sort of way right now. So it really comes down to the data scientists; companies rely on their data scientists to say, hey, every so often, whether it be every month for rapidly degrading models, or every few months or a year for a model that might be more stable, check that model, see if it's still performing okay, and do we need to refresh it? 
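A systematic check of whether a model is "still predicting" could be as simple as tracking a discrimination metric such as AUC on recent, labeled outcomes and comparing it to the value measured at deployment. This is a minimal stdlib sketch; the 0.05 tolerance is an arbitrary assumption, not a recommendation from the interview:

```python
def auc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) formulation."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def needs_refresh(scores, labels, baseline_auc, tolerance=0.05):
    """Flag the model for retraining once AUC degrades past the tolerance."""
    return auc(scores, labels) < baseline_auc - tolerance

# Perfect separation on recent, labeled outcomes -> no refresh needed:
print(needs_refresh([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0], baseline_auc=0.80))
# False
```

Running a check like this on the schedule the model warrants, monthly for rapidly degrading models, less often for stable ones, replaces the ad hoc "hey, check that model" reminder with an automated alert.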
I don't see companies doing that model management in any systematic way. And that's obviously an opportunity. I think it is an opportunity. Yeah, absolutely. Great, well, thank you very much for joining us on theCUBE today. Happy to do so. That's been Ken Rexer from- Karl Rexer. Karl Rexer, sorry. Yes. Karl Rexer from Rexer Analytics, based in Boston. Once again, you've been watching theCUBE. We've been talking a little bit about how to build predictive models, and we appreciate your joining us from Oracle Headquarters in Redwood City, California.