From Orlando, Florida, extracting a signal from the noise. It's theCUBE, covering Pentaho World 2015. Now your hosts, Dave Vellante and George Gilbert.

Hi everybody, welcome back to theCUBE. I'm Sam Cahane, with George Gilbert, and we're here with Tim Garanto, Senior VP of Infrastructure and Information Systems at Edo. Thank you for joining us today. Thank you for having me.

So Tim, a lot of excitement here. I know Edo has a lot of great things going on. Can you tell us a little bit about your company? Sure, so Edo is a startup. We're about eight years old, so kind of old for a startup, but still in that startup phase. We are in the card-linked offers market, so we help financial services institutions put offers, coupons, and discounts onto their consumers' credit cards. We're headquartered in Nashville, actually dual-headquartered in Nashville and Chicago, with an office in London. In London, very nice.

So were you here at the conference last year? Yes, I was. How have you seen the conference change? You know, for a first- and second-year conference, these have been amazingly well run and well put together. I mentioned the same thing to the Pentaho folks last year, just wonderfully run. This year it's a little bigger, and I think the sessions are more focused, obviously, on the new technology. So a lot of excitement. It's just been a great conference. It has, and I think they've grown by 40% this year, so that's great year-over-year growth. Pretty soon this will be a 20,000-person conference. Right.

So you're the Senior Vice President of Infrastructure and Information Systems. Can you tell us a little bit about your role at Edo? Sure. As Senior Vice President of Infrastructure and Information Systems, I have responsibility over all of our hardware infrastructure; our command center, which manages all of our internal processes and interfaces with the banks when, occasionally, there's an issue and they have to get involved; and then I also have the data services team. Those are the people in our company who work with all of our data sources to bring the data in and get it ready for analysis or reporting. So they spend a lot of time working on the data, getting it ready, as well as running what we call our targeting application, the application that allows us to figure out which person will respond best to which offer.

So let's talk about these offers. Everyone's familiar with the range of offers, all the way from that wonderful experiment called Groupon to, I'm walking into Starbucks and they're like, okay, we know your favorite drink is such and such, we'll offer you a discount or whatever. Can you tell us where in that spectrum you fit in terms of enabling your partners to make these types of offers? Yeah, so I'm not sure how to say exactly where we fit, but maybe that's the right spectrum. We work with merchant partners like a Starbucks or a Red Robin, and we take an offer construct from them. So the company will come to us and say, you know, our average spend in our store is $20, and we'd like that to be $25, because increasing the spend is a common metric. And so we can say, okay, we'll provide people with an offer, say $5 back on a $25 spend at a restaurant. Red Robin's happy because they're getting more spend in their stores. And then we identify within our population. I keep saying it that way, but we have absolutely no idea who anybody is.
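To make the arithmetic of that offer construct concrete, here is a minimal sketch, purely illustrative and not Edo's actual model: the merchant's average ticket is $20, the offer is $5 back on a $25 spend, and, as Tim explains later, the cost is only incurred when the consumer actually redeems. The function name, the dictionary fields, and the $27 example ticket are hypothetical.

```python
# A rough sketch of the offer construct described above, not Edo's actual model.
# The merchant's average ticket is $20, the offer is "$5 back on a $25 spend",
# and the cost is only incurred when the offer is actually redeemed.

def redemption_economics(avg_spend: float, min_spend: float, rebate: float,
                         actual_spend: float) -> dict:
    """Summarize a single transaction against the offer, from the merchant's side."""
    redeemed = actual_spend >= min_spend
    return {
        "redeemed": redeemed,
        "spend_lift": max(actual_spend - avg_spend, 0.0) if redeemed else 0.0,
        "rebate_cost": rebate if redeemed else 0.0,  # nothing is paid if the offer isn't used
    }

# Figures from the interview ($20 average ticket, $5 back on $25), with a hypothetical $27 ticket.
print(redemption_economics(avg_spend=20.0, min_spend=25.0, rebate=5.0, actual_spend=27.0))
# -> {'redeemed': True, 'spend_lift': 7.0, 'rebate_cost': 5.0}
```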
It's all tokenized and anonymized. But we can look in there and say, okay, who's likely to buy at a fast-casual or a quick-serve restaurant?

And how do you come up with that, who's likely to buy? We have a data science team out of our Chicago office that's very, very smart, and they go in and produce models that indicate who's likely to be interested in this offer. We do that based on transaction history, so we don't necessarily know anything about the people other than what their spend has been. So no demographics? No demographics. And the transaction history, I assume, is not at the item level but at the merchant level? Wow. So these scientists must be rocket scientists. They're really, really good.

Okay, so they identify some group, some subset of this population. What happens next? Well, as they go through it, the offers get ready to be pushed out. They really all just get scored against an individual. We call them an individual, whatever that unique stream of data is. So everything comes out with a score, a propensity score: likelihood to take this offer. And then that gets what we call pushed out into our production database, an HBase system, to run on our dashboard services or in a mobile app, for example. So when the consumer logs into their bank's portal, or the portal application that we host, they will see the offers in propensity order, so the one that is most relevant to them is on top. And we don't actually withhold any offer from any individual. That's a fairly recent addition to our platform: we don't stop offers from going to people, because using Pentaho and Cloudera we can really just put every offer out to any person. But the ordering becomes important because our business model is that we only get paid when the consumer does something. It's pay for performance: if they don't do it, we don't get paid and the merchant's not happy. So we have to make sure that they see the most interesting one for them.

Okay, that much is clear, that you would want to promote the most interesting, but it also sounds like there's no cost to presenting one that's a low score, a low probability of being taken. Is that partly why you've chosen to show everyone at least one deal? Well, we show them the deals that we think are most interesting, too. Right, and so yeah, somebody may log in, and that week or whenever there were no super interesting deals, so they'll get one that's maybe not as interesting. But what we also found is that people travel. If I limit offers based on, say, geography... I'm in Nashville, but I travel to Orlando, I travel to Chicago. One of the propensity models says, how close are you to a place where you could use this offer? If you're 500 miles away from a store, you're probably not likely to use it, but if there are 10 of them on your drive to work, you probably are. But people do travel, so if I'm in Chicago, I will see Chicago-based offers because that's where I'm located. Even though that propensity score may be lower because of where my normal transaction spend is, I will still be able to see those offers and take advantage of them in that market.

Okay, so maybe it makes sense to dive a little bit lower into the machinery that makes this work. It sounds like part of your job is to make sure the gears don't grind. Yes.
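To make the serving side concrete, here is a minimal sketch, purely illustrative and not Edo's actual code, of ranking offers for an anonymized card token in propensity order, with a simple distance factor standing in for the geographic model Tim describes. The offer names, scores, and exponential decay are assumptions for the example; in the real system the scores come from the data science models and are served out of HBase.

```python
import math
from dataclasses import dataclass

@dataclass
class Offer:
    offer_id: str
    base_propensity: float      # aggregated model score for this (token, offer) pair
    nearest_store_miles: float  # distance from the user's current location to the closest store

def ranked_offers(offers: list[Offer], decay_miles: float = 50.0) -> list[Offer]:
    """Return offers with the most relevant on top, discounting far-away stores.

    The exponential decay is an illustrative stand-in for the geographic model:
    a store 500 miles away is effectively zeroed out, while ten stores on the
    drive to work keep essentially the full propensity score.
    """
    def adjusted(o: Offer) -> float:
        return o.base_propensity * math.exp(-o.nearest_store_miles / decay_miles)
    return sorted(offers, key=adjusted, reverse=True)

# A Nashville cardholder traveling in Chicago: the Chicago offer wins even though
# its base propensity is lower, because it is actually redeemable nearby.
offers = [
    Offer("nashville_burgers_5_back", base_propensity=0.80, nearest_store_miles=470.0),
    Offer("chicago_coffee_2_back",    base_propensity=0.55, nearest_store_miles=1.5),
]
print([o.offer_id for o in ranked_offers(offers)])
# -> ['chicago_coffee_2_back', 'nashville_burgers_5_back']
```

Serving every offer and letting the ordering do the work mirrors the point above: nothing is withheld from anyone, only the ranking changes per person and per location.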
Tell us some of the pieces of this machinery. Almost all of our data work, our data processing work, is in a Hadoop cluster. We use Cloudera's distribution. We use the Pentaho tool, the PDI tool, for initial ingestion of the incoming data files. When we get files from our different partners, those files all come in different formats, so we use PDI to take that data in, standardize it, and then present it to what we call our transaction matching system. That's the system that says, does this transaction match any known offer, and is it available to this particular person? And if it is, that is what generates what we call a redemption, to send the cash back.

Let's unpack that a little bit. So you bring data in from different information sources. Exactly how do you take those apart to say that a transaction would be appropriate for this offer, or that a transaction in the past matches someone who would be likely to use this offer? How does that part work? After we've processed all of those files, on a very automated basis, all that data goes into our data warehouse, which again is part of that Hadoop cluster. Because the files all come from different partners, we load all of the raw data, and then we have an interim step that takes that raw data and conforms it to a common format; we use PDI for that. And then the final step is a PDI job that takes all of that. And PDI is Pentaho Data Integration? Yes, Pentaho Data Integration. It takes that and transforms the data into a final output for either analysis or reporting. So it's a three-step process to get the data from the raw files we receive into the cluster and ready for our data scientists to do their work, or for reporting.

And then it's the data scientists who really come up with the models and the likelihoods. The way we've structured it is we have a number of different models, almost micro-models if you think about it. And are they being tested against each other to see which one's more accurate? Well, in this case, yes and no. We have a geographic model that says how close are you. We've got branding models that say, is this a good brand, is this a bad brand? There are a lot of different models that run, and then the scores are aggregated to decide the actual propensity score. Interesting. Yeah, it was actually at this conference last year that they were giving a talk on modeling and the best way to do it in Hadoop. It's interesting, but a lot of small models that are then brought together makes a lot of sense because of the parallel-processing nature of Hadoop. As opposed to trying to do one massive model to get everything, you do a lot of small models. Oh, very interesting.

So Tim, we have 20 minutes until the keynotes, so we're running out of time here. If you're going to leave the viewers watching, and everybody here, with one takeaway, what would that takeaway be? No pressure. Yeah. Wow, I don't know where to go, I know we went over a lot. Maybe: does Pentaho make that process easier on top of Hadoop, and if so, how? Yeah, Pentaho has definitely made that process easier for us on top of Hadoop. All their steps, like their MapReduce step... One of our guys got his master's degree in computer science with a concentration in big data, so he was used to writing all of the MapReduce by hand. And when he showed up, he was like, this is way easier.
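To make those steps concrete, here is a compressed, purely illustrative sketch of the flow just described: load raw partner files, conform them to a common format (the interim step), then match conformed transactions against active offers to generate redemptions. In production these are PDI jobs running against the Hadoop cluster, not Python, and the partner layouts, field names, and offer records below are invented for the example.

```python
# A compressed illustration of the three-step flow described above. In production
# these are PDI jobs running against the Hadoop cluster; plain Python stands in here,
# and the partner file layouts, field names, and offer records are invented.

def conform(partner: str, record: dict) -> dict:
    """Interim step: map each partner's raw layout onto one common format."""
    if partner == "bank_a":      # hypothetical partner layout
        return {"token": record["card_token"], "merchant": record["mrch"],
                "amount": float(record["amt"]), "date": record["txn_dt"]}
    if partner == "bank_b":      # another hypothetical layout
        return {"token": record["member_id"], "merchant": record["merchant_name"],
                "amount": float(record["purchase_amount"]), "date": record["posted"]}
    raise ValueError(f"unknown partner format: {partner}")

def match(txn: dict, active_offers: list[dict]) -> list[dict]:
    """Transaction matching: does this transaction satisfy an offer available to this token?"""
    return [
        {"token": txn["token"], "offer_id": o["offer_id"], "rebate": o["rebate"]}
        for o in active_offers
        if txn["merchant"] == o["merchant"]
        and txn["amount"] >= o["min_spend"]
        and txn["token"] in o["eligible_tokens"]
    ]

raw = {"card_token": "t-9f31", "mrch": "Red Robin", "amt": "27.40", "txn_dt": "2015-10-07"}
offers = [{"offer_id": "rr_5_on_25", "merchant": "Red Robin", "min_spend": 25.0,
           "rebate": 5.0, "eligible_tokens": {"t-9f31"}}]
print(match(conform("bank_a", raw), offers))
# -> one redemption record; the $5 goes back to the (anonymous) cardholder
```

Note that everything here is keyed off an opaque token rather than any personal data, mirroring the tokenized, anonymized population Tim describes; the models only ever see transaction history.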
But Pentaho's given us the flexibility to change our business and to change how we're processing everything without having to rework all of the infrastructure, or for that matter the staffing, right? Certain people are better at certain types of things, but with Pentaho we've shielded them from having to learn MapReduce or dive super deep into the technology, so they can be more thoughtful about what they're doing. They don't have to be great coders; they can be great data scientists. Shielding, that concept of shielding people from the low-level plumbing, we hear that as a theme. And yeah, the other takeaway would be: Hadoop is awesome.

You were awesome during this interview. Tim, thank you for joining us. I'm Sam Cahane with George Gilbert. Keep watching. We have one more interview here at Pentaho World. You can watch all the interviews at siliconangle.tv.