Live from Orlando, Florida, extracting a signal from the noise. It's theCUBE, covering Pentaho World 2015. Now your hosts, Dave Vellante and George Gilbert.

Welcome back, everybody. This is theCUBE. We're live here at Pentaho World 2015. Pentaho was one of the very first companies that we ever interviewed when this whole big data meme started out; we were there in the early years at Hadoop World and Strata. Chris Jansen is here, the director of user experience at Black Arrow, a company that services telcos and is doing some cool stuff with data. Chris, welcome to theCUBE. Thanks for coming on.

Thanks for having me.

You're welcome. So, first of all, Pentaho World. This is my first Pentaho World, and it's only the second Pentaho World, so I guess I get a pass. But have you been here before?

No, this is my first as well.

What are you making of the show?

It's been great so far. Some of the keynotes that we had this morning were really nice.

Yeah, I thought it was a good mix: you got the high-level vision, and then you got a hardcore practitioner with FINRA. So tell us the Black Arrow story. What does Black Arrow do? Director of UX, I guess, is self-explanatory, but maybe talk about that a little bit.

Yeah, so Black Arrow provides a platform that enables multi-screen solutions for data and for advertising. We serve operators like Comcast, Charter, Rogers in Canada, and Virgin Media in the UK. Through them, we also serve the content providers, so Disney, AMC, and so forth are all onboarded onto our platform so they can do dynamic ad insertion on the different screens that they serve.

So, as every user knows about dynamic ad insertion, your job is to make sure that it works, that it's effective. It's not just "get this out of here"; it's actually, "oh, that's interesting."

Yeah.

That's a hard job.
Yeah, my job ultimately is to make our platform as easy to use as possible for the operators, so that they can get in and get out: really set up our system, then ultimately set up campaigns, report on them, and really know what's happening in the system.

So, kind of the Google of your world, right? Is that fair?

Yeah, fair enough.

So talk about how you've used the technology and what's changed.

What changed recently is that we added Pentaho as part of our reporting solution, and what that allowed us to do is really free up the reporting so that the user can define what they want. Before, we worked with them to define reports that were very broad, with a lot of rows of data that were ultimately difficult to deal with. They'd have to take the data somewhere else, like Excel, and distill it down from there. What we get with Pentaho is an ad hoc reporting tool: now we can set up the general model, give them all the dimensions and metrics, and they can either use our standard reports that we've defined, because we know the space, modify those and save them as new ones for themselves, or define ones that we haven't thought of.

So talk more about how that works. You have this bog of data. What do you do with that data, and what comes out the other end?

Yeah, so we get a lot of data, and as I said, we support multiple screens, which means we support multiple platforms, so we get data in different formats coming into our system. We end up loading those logs and that return path data. What I mean by return path data is information about what actually happened in the session: did the user watch this particular ad? Did they happen to fast forward and skip some of it? What was their engagement on it, and so forth, based off of the events that we get from the player.

So pretty deep knowledge about what the user does. But then you enable your customers to build their own dashboards, essentially, right?
Right, so we distill that information into a format that they can actually use within our visualization tool and our ad hoc reporting tool. They're able to create visualizations that allow them to see trends and analytics that they wouldn't have been able to see before with our own reporting or their reporting. One of the important things that we enable for them, since we're multi-screen, is that we bring all that data together into one place, which is something that's kind of new to them. Before, they've had these different systems that report in different ways, and it's really hard for them to actually get that data together and be able to report on it.

Okay, so part of your solution is a visualization piece and making that easy. Is that something that you guys developed on your own?

It's actually built on top of Pentaho's Analyzer.

Okay, so you use Pentaho's viz. And then what does the user get? They kind of get this sort of drag-and-drop, customizable environment.

So they get this easy-to-use drag-and-drop environment. They have a list of the dimensions that are available, and the metrics, and as they drag and drop, it builds that out either as a report or as a visualization. It's very guided: you put metrics in this place, you put dimensions over here. How to do a cross-tab report is all explained within the UI. It's simple to use, and that's one of the reasons why we chose it.

So there have been other tools over time that simplify the creation of cubes and reports. What was it that made Pentaho so much more effective in bringing together this multi-screen capability that you're talking about?

It's what we're going to be able to do going forward on top of it. The first thing that we wanted to address was really giving this ad hoc reporting. There's a lot of data in there, and there's a lot of data that needs to be processed in a certain way, and that's where our data hub that's using Spark comes into play.
So in the future we're going to be able to actually open it up for them to do the kind of reporting where we can do custom reports that ultimately they define, and then we run on top of Spark. It may be a much longer running job, because we're doing things like unique counts: if you're doing a true unique count, as opposed to an estimation, you have to go back to the transactional data, the transactions in a single campaign.

Can you talk more about Spark, maybe the existing or pre-Spark infrastructure, and then how you're using Spark and what pieces Spark is complementing, supplementing, replacing, enhancing?

Right, so previously we had your run-of-the-mill kind of MySQL data warehouse, where we would do custom ETL work into the data warehouse and then ultimately write stored procedures to be able to do the reports. What we moved to was having Hadoop ultimately as a data source. We get the raw logs in and load those in the raw format. We then transform that and put it into a Parquet format, still within Hadoop, and then using Spark we do transformations, aggregations, and enrichment of the data based off of what's there. We write that back to Hadoop so it's accessible there, but then we push it to what we call our MySQL hub, which is InfoBright, basically an enhanced kind of database that runs on top of MySQL, and that's what Pentaho ultimately talks to.

So, using Spark in conjunction with Hadoop, will you move certain work to Spark that you normally would have done in Hadoop?

Our entire ETL process has moved onto Spark, so all that work is being done there. We also offer custom reports as needed to our customers, things that are further in depth that you can't just do as a run-of-the-mill kind of report, and all those jobs will be run directly off of Spark.
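The pipeline Chris describes, raw logs loaded in, transformed and aggregated, then pushed to a reporting store, can be sketched in miniature. This is an illustrative sketch only: the log format and field names are hypothetical, and plain Python stands in for the Spark jobs and the Hadoop/InfoBright storage.

```python
# Miniature sketch of the ETL shape described above: raw return-path
# logs are parsed ("transform"), then rolled up into per-campaign
# metrics ("aggregate") before being pushed to the reporting store.
# The log format and field names are hypothetical; in production each
# step would be a Spark job reading from and writing to Hadoop.
from collections import defaultdict

RAW_LOGS = [
    "cmp-1,impression,30",
    "cmp-1,fast_forward,12",
    "cmp-2,impression,15",
    "cmp-1,impression,30",
]

def parse(line):
    """Transform step: raw CSV log line -> structured record."""
    campaign_id, event, seconds = line.split(",")
    return {"campaign_id": campaign_id, "event": event,
            "seconds": int(seconds)}

def aggregate(records):
    """Aggregation step: roll session events up per campaign."""
    stats = defaultdict(lambda: {"impressions": 0, "skips": 0})
    for r in records:
        if r["event"] == "impression":
            stats[r["campaign_id"]]["impressions"] += 1
        elif r["event"] == "fast_forward":
            stats[r["campaign_id"]]["skips"] += 1
    return dict(stats)

campaign_stats = aggregate(parse(line) for line in RAW_LOGS)
print(campaign_stats)
# {'cmp-1': {'impressions': 2, 'skips': 1}, 'cmp-2': {'impressions': 1, 'skips': 0}}
```

Note that this toy version counts exactly, which is cheap at this scale; the "true unique count" jobs Chris mentions are expensive precisely because they must scan all transactional records rather than pre-aggregated rollups.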
So you're replacing major portions of your Hadoop infrastructure with Spark, completely or not?

Well, it's working in conjunction, because Spark is just on top of Hadoop. It's enhancing what we already have there.

Is there work, like the ETL and other pieces you just mentioned, that you would have been able to do previously, or does it require Spark?

At the scale we were at, we were doing things as far as enabling that reporting, but as we run into more and more scale, to continue we need that solution.

So, Chris, I had a question regarding this pipeline. It looks like it has a lot of latency built in, because there are a lot of discrete steps. Are you trying to shrink that? And in the future, what technology might play a role where you're getting in data on one end and making it accessible to the customer to operationalize, either through a human or automatically, on the other end?

We already get the information to them relatively quickly; I think the longest run time we have is an hour, for some of the ETL processes. But different platforms give us data in different ways. In your older QAM environments, which most of TV and all of the video on demand has run off of, even the return path data takes time. It usually takes less than a day, but sometimes we have a window of three days where we'll have to continually try to match that information within the system.

Good. I'm just curious, because the latency, how fast all this information goes through the pipeline, has a bearing on what technology you choose. Would you take the lowest or slowest common denominator and make your decision about what to use based on that, or would you say the really latency-sensitive stuff will use new technology?

It's a balancing act.
Ultimately, I think, as you said, it's going to depend on the platform that we're supporting and how frequently we're getting that feed of data. In the future we may need to enhance what Spark's doing to be able to do more frequent data ingest, depending on the platform.

And help us tie that back to what's going on, what ads are being bid on or presented. Video on demand might be as much as three days; what's the one that's closer to real time?

So we support different platforms. We support digital platforms, where that information comes through VAST. How return path data comes for that is much different: someone hits a pixel, and that says, hey, someone had an impression, or this was a fast-forward event, and we record and log that immediately. We then put that into our ETL process and expose it in the reporting.

You guys won an excellence award for this work. Talk about that a little bit. What does that mean to you, and why do you think you won?

To me it really validated the things that I was trying to get out of Pentaho, really the seamless integration into our application. I think us winning the award speaks to the fact that we did the right things to make it feel like a unified part of our product, and that we were able to do a couple of enhancements on our own, extending Pentaho based off of what it provides to give additional user experience gains.

You know, I was listening to the keynote this morning from the chief product officer, and he laid out this dazzling vision that goes way beyond what we used to think of as business intelligence. How does that roadmap fit in with your plans? What would you build on it that you hadn't thought of before, or what alternative roadmaps are you considering?
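The pixel-based return path flow Chris describes for digital platforms can be sketched roughly as below. The tracking URL and parameter names are made up for illustration; they are not Black Arrow's actual API.

```python
# Hypothetical sketch of the pixel-hit flow described above: the video
# player requests a tracking pixel, and the query parameters on that
# request are recorded immediately as a session event that later feeds
# the ETL process. Parameter names (sid, ev, pos) are illustrative.
from urllib.parse import parse_qs, urlparse

event_log = []

def record_pixel_hit(pixel_url):
    """Parse a tracking-pixel request and log the session event."""
    params = parse_qs(urlparse(pixel_url).query)
    event = {
        "session_id": params["sid"][0],
        "event": params["ev"][0],           # e.g. impression, fast_forward
        "position": int(params["pos"][0]),  # seconds into the ad
    }
    event_log.append(event)
    return event

record_pixel_hit("https://track.example.com/p?sid=abc123&ev=impression&pos=0")
record_pixel_hit("https://track.example.com/p?sid=abc123&ev=fast_forward&pos=12")
print(event_log)
```

This immediacy is what makes the digital path faster than QAM: the event exists the moment the pixel fires, rather than arriving as a batch of return path data up to days later.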
There's a lot of different aspects of the roadmap that we're interested in. Spark, when we started, was really in the labs, so we're not really using it in PDI, and I think ultimately, as that grows up within the PDI infrastructure on their roadmap, it would be very interesting for us to start leaning on it for our ETL process and so forth. That's one potential option we have once Spark becomes a more integrated piece of their platform. I was also excited to see a lot of the enterprise hardening that he was mentioning, and the continued hardening. We want to make Pentaho feel like it's just part of our product, seamless, and as they add features there, that really allows us to give a seamless experience and also allows us to make decisions for the user as they're using the application. One thing they added in one of their recent releases was the ability for us to change models for the user without the user knowing what's happening. What that allows us to do is deploy a new predictive or prescriptive model.

The model that describes what dimensions and metrics exist, and the hierarchies of those metrics?

The cubes, yes. The feature that they gave us allowed us to programmatically switch between those cubes, so now we don't have to ask the user to choose which cube they want to work with today. They can just go into the report and say, I want this dimension, this dimension, and this dimension, and we're able to decide we need to use this cube, this is the most efficient cube for that, and we switch to it seamlessly.

Dynamically?

Dynamically, yes. It's a really nice thing that gives a huge user experience gain. They don't have to think about it.

They don't have to think about it.

They don't have to know what our architecture is to be able to do their reports.

Oh, you mean they just say what dimensions they want, they just drag and drop? You have something that's called campaign name, I just drag
and drop campaign name.

Right. And we have some things that are high-cardinality, like program, and if they drag in program, we have to switch to a special cube that has that high-cardinality item in it. So it allows us to do that quickly, and they don't have to worry about what our architecture is; they just have to know what information they're trying to get to.

That's essentially a cube template that you can invoke on the fly, and they don't think about it. That's critical to getting data into the hands of users. We were talking about citizen analytics today several times.

I mean, the important piece of the analytics and reporting is breaking down the walls of having to know what the data is and understand it. You want them to be able to drill in and seamlessly go down through the data, and that's what they have provided.

So, as a big data analytics practitioner, we've heard a lot in the last couple of years about the complexity of Hadoop, the challenges of Hadoop, how hard it is to get up and running, and how hard it is to get ROI out of it. You guys have been solving that problem; you're actually running your business on it. What do you want to see from the ecosystem, from the community? Do you agree, first of all, that it's still too complex? Does it have to be simplified, or is that a competitive advantage for you, that it's complex? Maybe it is. I wonder if you could talk about that dynamic a little bit.

Yeah, I mean, the good thing is we understand our space and our data, and we're able to curate things for our customers, so it does give us an advantage. But as someone in charge of user experience, I do want to see the continual breaking down of that wall, the difficulty of getting at the data and doing the visualizations. Ultimately we do want to enable our customers to not have to call us and say, how do I do this, or how do I get to this data. We want to make it available for them so that they can get to the data as quickly as
possible, and find things that they didn't know before. They can find audiences that they didn't know existed, that they're underutilizing, and then monetize those audiences; understand what the user experience is of the ads that are going in. If I run this many ads, do I get a better user experience? Am I getting more engagement from my customers? What type of customers am I getting? Are there certain audience segments that perform better with this ad load than with that ad load? There's a lot of possibilities where we can delve into that data and help guide them to see things that help them monetize and optimize their system.

Either guided exploration, or simplified exploration so they don't need the guiding, right?

I think ultimately it's going to be a little bit of both. As I said, we're experts in our space, so I want to present things that are guided, dashboards that really allow them to drill down without having to know that much. But then you're going to have more sophisticated users that need to come in and want to be unguided, so that they can get to other data that we may not have thought of.

How did you guys launch this business? Was it a situation, Chris, where you said, okay, now the technology is available, we've got the skill sets, somebody wrote a business plan, got funded, and said go? Or was it a legacy business that you then sort of supercharged with the analytics?

We've always been focused on dynamic ad insertion, depending on the platform. Early on, the company saw an opportunity within video on demand as an underutilized or underrepresented area, so we focused there, and that's what we're known for. Now that we've established that, we're moving far beyond it: different platforms, different use cases, being able again to provide audience data, decorate it, and really allow them to explore their audience data as well.

And your primary focus is enabling the efficacy of inserting ads, making it easy for your customers to do that.
Well, I guess I should say the ease with which they can insert ads. What about the efficacy of the ad itself? I mean, that's not your role, but...

Because we're the ad router and everything comes through us, ultimately we're getting all that return path data, and that is part of it. After we've enabled this ad insertion and decided who gets what ad and what ad should be served, the work's not done there. That's where the reporting really comes into play: now we can look at what happened as a result, and then allow them to try different things, A/B test what they want to do with ad loads, as I said, or be able to tell agencies how well this ad did versus that ad. So that's all part of the solution that we provide.

And the time in which you can actually provision an ad has obviously been dramatically compressed. How has that affected hit rates and conversions?

Again, it depends on the platform. We get a lot of information about the ads, but ultimately the operator's still in charge of working with those agencies to get the ad onto the platform they need, whether that's a CDN or an on-site pump for video on demand, for example.

Well, the reason I'm asking is, as a consumer, you can tell it's getting better, but it's still not great, right? So there's a lot of upside. Would you agree with that? As an industry, do you feel like there's a ton of upside in terms of the efficacy of the ad placement?

Yeah, I mean, there's a lot of things, as they're exploring these different screens, where they really have to see how they can monetize, and that's something we're helping with, so that they can get different ads on. As opposed to, you know, maybe you're used to seeing the same ad repeated during every single break, and that doesn't end up making a good user experience. All the work that we do to better enable them to run different campaigns, to track that kind of thing or even prevent that kind of action, which our platform allows for,
allows them to increase the user experience and ultimately makes you want to come back and actually watch on that platform.

So I'm like the ad-serving industry's best customer, potentially, because I'm a sucker for an offer, and I buy a ton of stuff online. If I see something that appeals to me, I'm like, oh, I'll stop what I'm doing and grab it. So I just feel like the industry has a huge upside, and that should be good news for you guys.

Yeah, I mean, ultimately I think there's upside for consumers as well, because what we enable, their monetization of this, allows them to offer more content. What I like to say is, don't think of it as I'm forcing you to watch ads; I'm actually getting more content on demand for you.

Right. So I want to go back to sort of a meta question, which is, Pentaho has this end-to-end approach to tools, very different from the data warehouse world where you have kind of the exploration over here and the ETL over there. What is the value of having it all integrated together in one big single-vendor tool chain in the big data world?

I think there's a lot of things they can do if it's a single chain, where one piece can pick up immediately from the other. A lot of the things they talked about on the roadmap, like being able to dynamically do a data extract and then automatically have PDI build that model for you, which then gets published so the user is able to go into Analyzer and seamlessly use it. Understanding that entire workflow and being able to provide it gives us a huge advantage, because it allows us to automate more easily. So that's something that we're definitely looking forward to using in the future.

All right, Chris, we have to leave it there. Congratulations on the excellence award. Thanks very much for coming to theCUBE.

Thanks for having me.

All right, keep right there. We'll be back with our next guest.
This is theCUBE. We're live from Pentaho World in Orlando. Right back.