Extracting the signal from the noise. It's theCUBE, covering Spark Summit East. Brought to you by Spark Summit. Now your hosts, Jeff Frick and George Gilbert.

Hey, welcome back everybody. Jeff Frick here with theCUBE. We are live in Midtown Manhattan at the Hilton for Spark Summit East. It's the second year they've had it here at the Hilton, which is where they have Strata + Hadoop World, or used to; it's kind of the epicenter of big data. Now it's all about Spark, not about Hadoop. So we're really excited to get the doers, as we like to say, the practitioners. Not the people that are building the technology, but the people actually putting it to work. So for this next segment, we're really excited to have Jason Scheller on, and he is the director of Data Decisioning, another new word that we learned today, at Eyeview. Jason, welcome. Thank you.

So before we jump into it, give everybody kind of an overview: what is Eyeview, what are you guys all about? So Eyeview is a digital video ad tech company focused on driving ROI for our brands. We use a combination of data and programmatic decisioning to connect the right ad with the right consumer, to drive direct online and offline sales for our clients. Okay, and how long have you guys been around? Eyeview's been around about four or five years now. Four or five years, okay. So obviously you're here. We're still doing pretty good, yeah.

Yeah, and obviously you use Spark, so let's talk a little bit about your Spark journey. Sure. So we've been using Spark since, I think, February or March of last year, a little bit before the public release actually. Initially it was a POC, to kind of see what this new technology was all about; it's now become one of the major pillars of our data platform, and it serves not only as the backend for ad hoc analytics, but also for productionizing all the modeling that comes out of our analytics team.

So before we jump into the meat and potatoes of what it's doing for you now, I think a lot of your peers out there are curious about how you got engaged. How did you guys decide to get started? How did you find some low-hanging fruit where you could start your journey and really prove out the concept? Sure. I mean, like any tech company, our engineers and our CTO are constantly watching the new technologies that are out in the market, and Spark was something that's been of interest since the very beginning. When Databricks came around, I think we were one of their earlier clients, trying to get into how can we really apply this, what can this do for us? So there was an initial POC; I think we put just a few months of data into the system and started to play with how it compared to Redshift and some of the other warehouse technologies we were using. It was pretty quickly apparent that the performance with which we could do the same thing in Spark versus our older technologies was just tremendous.

So give us an example, now that you've had it, now that it's definitely in production. Yeah, for sure. Part of your core technology stack now. So give us some indicators of the impact. So one of the stats I like to use is: previous to Spark, it took us about 24 hours to model one day's worth of data to set up the optimization for a particular campaign. With Spark, we can model six months of data in about 10 minutes. One more time, so you went from what to what? One day of data in 24 hours. One day of data in 24 hours. To six months of data in 10 minutes. And I don't even know the percentage anymore.
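(An aside for readers: the interview doesn't show code, but a minimal sketch of what "modeling six months of data" can look like in a PySpark notebook, using Spark MLlib, is below. This is not Eyeview's actual pipeline; the table and column names are invented for illustration.)

    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.classification import LogisticRegression

    # Pull six months of impression-level data (hypothetical table/columns).
    impressions = spark.table("impressions").where(
        "event_date >= date_sub(current_date(), 180)")

    # Collapse audience/context attributes (assumed already numeric-encoded)
    # into the single feature vector MLlib expects.
    assembler = VectorAssembler(
        inputCols=["age", "gender_idx", "income_band_idx", "region_idx"],
        outputCol="features")

    # Fit a simple response model: did this impression lead to a conversion?
    model = LogisticRegression(featuresCol="features", labelCol="converted") \
        .fit(assembler.transform(impressions))

(Because a fit like this is distributed across the cluster, the full six-month window can be scanned in parallel, which is the kind of speedup being described.)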
That's too big of a number, we'll break the machines. So what was the output of that? What decisions did that help you make? The output of that is the underlying model that drives our campaigns. So the core technology behind Eyeview is focusing the right ad on the right consumer, and figuring out which ads work on which consumers, for which brands, right? Maybe red cars work better on women and blue cars work better on men, for example. So the model behind it is constantly watching the data coming back from every advertisement, continuing to focus in on the right ads for the right person.

So this is critical: how fast you can iterate the model is how fast you can improve the decisions, based on the feedback you get from real life. Absolutely, absolutely. And being able to look back over six months of data, versus one week, is how we prep the model before that campaign launches in the first place. So once it's live, then it's constantly being updated.

Now, were there any inherent advantages you had in terms of how you built your infrastructure, your data infrastructure, your data processing infrastructure, before you dropped in Spark, that made it possible for you to do this in a way that would be much more difficult for a competitor? Truly seamless. I mean, we're entirely Amazon cloud-based, so the fact that all of our data streams through the Amazon Kinesis framework made it pretty simple to just direct that data stream into Spark versus Redshift or other warehouse technologies. There is some amount of work: when you move from something like Redshift or a MySQL database into Spark, you are going to have to redesign your data architecture to some extent to support the capabilities that Spark is going to give you. But that was honestly pretty seamless. I mean, our data engineers had something up and running within a few weeks, really.

When you were going from Redshift to Spark, the first thing I would think of is, you know, Redshift to Spark SQL. But I assume part of the value of Spark is that you've got the machine learning and graph processing. I mean, I always leave one out of the four. But were you using multiple of these libraries in conjunction to do something that you couldn't do on Amazon by having a sequence of engines operating? By having a combination? Yeah. I think for us, the value is really the fact that we can use a single platform to do each of those. Not necessarily that they're used in conjunction at the same time, but that I have one warehouse, one set of data. My analysts can look at it one way. They can run machine learning against it. They can derive some interesting analytics from it. And then account managers can also look at that same data for just general reporting out to clients or other vendors. The fact that it's coming from one place and I'm not having to maintain two systems is really of core value to us.

So that's shrinking the time from capturing the data to making decisions for the different audiences. Sure. I mean, with Databricks in particular, the fact that we have notebooks available. So an analyst will open a new notebook, start doing some kind of ad hoc analysis, and maybe they start to find some interesting trends in the data. As they find that, their entire thought stream is kind of captured in that notebook. So when somebody gets to a certain point, it's, hey look, I found this particular trend, there's something we can exploit.
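(Stepping back to the Kinesis hookup mentioned a moment ago: with the open-source spark-streaming-kinesis-asl connector of that era, directing a Kinesis stream into Spark could be sketched roughly as follows. The application, stream, and bucket names are hypothetical, and Databricks ships its own Kinesis integration as well; this is the general shape, not Eyeview's pipeline.)

    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kinesis import KinesisUtils, InitialPositionInStream

    # Ten-second micro-batches over the ad-event stream; requires the
    # spark-streaming-kinesis-asl package on the classpath.
    ssc = StreamingContext(sc, 10)
    events = KinesisUtils.createStream(
        ssc, "ad-event-consumer", "ad-events",
        "https://kinesis.us-east-1.amazonaws.com", "us-east-1",
        InitialPositionInStream.LATEST, checkpointInterval=10)

    # Land the raw events where Spark SQL and MLlib jobs can pick them up.
    events.saveAsTextFiles("s3://example-bucket/ad-events")

    ssc.start()
    ssc.awaitTermination()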
Any other analyst can come in and look at that and kind of follow along and say, oh okay, I see what you're thinking. I see where you're going. The work is reproducible, right? It's very easy to share. It makes it very transparent. And then that notebook can be directly ported into, okay, this is now a metric that I want to identify and leverage in a live campaign. That ports directly over to a data engineer who can then put it into the main system.

That's actually a big deal, in the sense that you're preserving the lineage, not just of the data, but of what you did to the data. Of the thought process, absolutely. And you know, in our world, we're constantly looking at new forms of data, right? New information becomes available from different clients or vendors. It's critical for us to be able to explore that in real time, and with somewhat of a group mentality, right? I mean, there might be one person initially looking at it, but once you find something really game-changing, you want two or three other people to look at that same data and say, you know, are we really looking at this right, are there some gotchas in here that we need to consider? So that's a big deal.

And how has their workflow changed, having, A, these new tools, and then B, you know, the speed of the available information? I think the main thing is, it used to be kind of a, hey, let me set up my model or set up my analysis, hit a button, and walk away. Let me go get some coffee, have a few meetings, do an interview, come back and get an answer. There was a lot of, like, let me kick this off as I'm going home for the night and hopefully I'll have an answer in the morning. Whereas with Databricks, they'll have two or three notebooks open side by side, and every one of them's working, crunching different sets of data. There's no walk-away anymore, right? By the time I write my next query, the first one's done; it's a few minutes, a few seconds.

And are you finding, like, epic things that you just didn't see before, or is it just being able to incrementally move faster down lots of the little steps? I think it's both. Again, I mean, there's the fact that one analyst can now look at such a huge volume of data, where it used to take a team of people to accomplish that. As well as, you know, there's different kinds of trends, right? You can see systematic trends where something's constantly kind of up and to the right, but a lot of what we do is trying to find trends in individual pockets among the data and then being able to leverage that.

This is a theme we've been hearing more and more about, where in the old style of building sort of analytic, I don't know if you call them apps, but sort of ad hoc analytics, you'd start with a full population of data in some storage format, massage a subset of it, sample it in the database, further sample that to model it, and then further sample that to visualize it. And now you take the whole population in one integrated stack and integrate it into the operational app. Can you give us some examples of that? Yeah, I mean, that's exactly it. You know, I think for us, with the way we do personalized advertisements, I might have a million individual ads that are playing for the same brand, right? And every ad, every video, has a slight variation, whether it's the product that we show or the map of the closest store location, things like that. So for us, with Spark, we can actually monitor the performance of every individual ad variant across that campaign.
And if there's a few million of those, I can't sample it, right? I have to be able to look at the full performance of every individual ad variant and how that's working against what types of audiences. And as I find what is really working, then I can kind of zoom in, in terms of, let me target more of those kinds of people that are really responding.

So explain again: you're targeting an audience of one, and what are the attributes that you're measuring? Just as an example. So measurements of whether or not an advertisement is working could be click-through rates, completion rates, or what we call PIA, whether somebody went to visit the brand's site within a few weeks of the initial impression. Those are the kinds of metrics that we would look at. It depends on the individual brand which one we're focused on. They're all... And it's the full population, not a sample. Sampling doesn't cut it anymore. Yeah, sampling really doesn't cut it. And to your point, that was kind of the old way of doing things, right? Let me take a small sample of data, let me massage that, work with that, model that. It doesn't work here. I need to be able to slice that entire lake of data.

And what's the impact on your customers, in terms of their yield on their advertising campaigns, their yield on their investment in all these different assets that they're creating? For our clients, it's all about ROI. I think the last case study we published was a four-to-one ROI for one of our clients. So that means for every dollar that they put into that particular campaign, we can show $4 of actual sales, whether it's online or offline in their stores.

But you're a supplier, you're an arms supplier. And the ad... Arms supplier, that's a first. I've never been called that before. Well, you allow the advertiser to compete more effectively. Absolutely, but... It's all about more efficient use of their advertising dollars. But what you're doing is not exclusive to any one client. So I guess my question is, who captures the value of what you're providing? If it's exclusive to one or two clients, they might capture most of it. If it's distributed across all advertising clients, you might capture most of it, depending on... It is actually exclusive to the individual client, because we'll produce a separate model using Spark for every individual client. In fact, for every individual campaign, right? The individual factors that affect the advertiser's effectiveness are on a per-brand basis.

Oh, so it's not... And even per campaign within the brand, right? Some brands are really focused on driving site visitation: I want more people to come to my website. Other brands are directly targeting offline sales: I want somebody to walk into a store and buy some products. So it really depends on what their goals are. So you're not really a software company. You're delivering a service, but it's created on a software platform and customized, essentially by professional services, on a per-client basis. So you're not really a packaged app that's delivered over the wire. No, no, it's not a packaged app. We work in concert with the brand directly, and with their agencies, on what their goals are right now. That's interesting, because that goes back to that IBM conversation we were having. Yeah, and it's also this whole concept of not having to sample, and being able to target to the individual.
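(To make the no-sampling point concrete: a rollup like the one being described, click-through, completion, and post-impression rates computed per ad variant over every impression rather than a sample, might look something like this in a notebook. The schema is again hypothetical.)

    from pyspark.sql import functions as F

    # Score every ad variant across the full population of impressions.
    variant_perf = (
        spark.table("impressions")
             .groupBy("campaign_id", "ad_variant_id")
             .agg(F.count("*").alias("impressions"),
                  F.avg(F.col("clicked").cast("double")).alias("ctr"),
                  F.avg(F.col("completed").cast("double")).alias("completion_rate"),
                  F.avg(F.col("site_visit_28d").cast("double")).alias("pia_rate")))

    # Surface the best performers for one campaign (hypothetical id).
    variant_perf.where("campaign_id = 'spring-launch'") \
                .orderBy(F.desc("ctr")).show(50)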
And then you're also doing that with the assets on the backend as well, to really make the most effective match, as measured by the campaign goals defined by the client around that campaign. I mean, to your point, right? There are a lot of different characteristics that drive the effectiveness of any ad campaign, and it's going to vary per product or per campaign. But from the beginning, you're always kind of looking at, here are the 10 different characteristics, right? Maybe it's age and gender and household income, for example. But as you start running different advertisements, you might find that in the Chicago area, gender really is the more critical factor in driving performance, but in Florida, it's really more the household income. So from that perspective, you have to be able to slice and dice the data in every possible way to figure out what's working and what's not, and then focus on those areas.

It just continues to reinforce that sampling is out. You don't need to sample anymore, and now that you don't need to sample, you really shouldn't sample, because you get the richness of the full data. And sampling, by definition, was always a kind of estimation, right? If the sample is consistent, if the full data set looks just like this and all the ratios align, then my sample will work. But how do you ever possibly know that unless you can really look at the full data set? Yeah, there was an interesting story about the rise of really the first political polling, this is a bit of a tangent: I think it was the Gallup organization that polled, like, 1,500 people, and the Literary Digest polled a million of its readers, and Gallup got it right to within three points, and the Digest was off by, like, 20 points.

But tell us, above Spark, what does your technology stack look like now, so that you can rapidly turn out solutions per client? So right now, I guess it's kind of twofold. Our analysts work directly on Spark at the moment, you know, because of the notebook product that Databricks gives us: they can log in, make their own space, kind of save their data very independently, and there's no need for us to build an additional stack on top of that, right? Whatever the analyst comes up with, that notebook can be inserted directly into the primary production system. Second, we're starting to connect the Looker BI tool into Spark to be able to visualize that data. So now the kinds of analytics that the team comes up with, kind of the brainpower of that analytics department, can be mapped directly into a visualization where an account manager or a salesperson can see it without having to be the technologist and write a query and understand all that, right?

I've just got to say... go ahead, George. Just to clarify, so the notebooks are almost like parameterized reports, or simple... The notebook is, no, it's not a report. It's almost like if you're a developer, right? When you would open up your IDE and start writing code, that's the notebook for them. But a notebook in Databricks can be, here are the SQL queries that I'm running because this is the data that I want to look at; it could be Scala code, it could be Python, right? And the visualizations are built right in. So for example, an analyst starts looking at, hey, which of my ad variants are performing better for this particular campaign? Okay, here are the 50 ad variants of the 500 that are really performing; I want to show this to the account manager, they want to know what's going on.
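(The Chicago-versus-Florida example above maps naturally onto Spark's grouping operators: cube() computes every combination of the grouping columns in one pass, which is exactly that slice-and-dice. A minimal sketch, assuming the same hypothetical impressions table:)

    from pyspark.sql import functions as F

    # Performance across every region/gender/income combination at once.
    slices = (
        spark.table("impressions")
             .cube("region", "gender", "income_band")
             .agg(F.avg(F.col("clicked").cast("double")).alias("ctr"),
                  F.count("*").alias("n")))

    # Keep only slices with enough volume to be meaningful, best first.
    slices.where(F.col("n") > 1000).orderBy(F.desc("ctr")).show()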
With Databricks, I can just click a button and instantly have this pretty visual, here's the graph, and now I can show that to somebody else who understands it. So contrast that with Looker. Contrast that with... That's a little unfair. I mean, Looker is a full-up BI tool, right? The visuals in Databricks are great for a quick, let me turn my data into some simple visuals. Looker's more focused on a non-technical end user being able to build their own visualizations of the data, their own dashboards, without really having to understand the data and data science that's going on behind the scenes.

So Jason, we're running out of time here, and I want to give you the last word. It's very exciting, and it really just demonstrates that if you're not leveraging these tools, if you're not competing at the individual asset level and are still working on generalizations and samples, whoever you are, you're in trouble. So kind of looking down the road, it's February: if we see you again a year from now, what are you looking forward to over the next six months, nine months, 12 months? Oh, the next six months? Look, Databricks is only getting bigger, and a lot of the new technologies that have been coming out from Spark directly and from Databricks are only making things faster. The machine learning libraries are just growing and growing. They're getting to the point where it's going to be just push-button for us to kind of generate a model, learn that data, and just walk away. It's, yep, here's the output. It's taking less and less energy to get to the end result. Excellent.

Well, Jason Scheller from Eyeview, thanks for stopping by and sharing some of your insight. Appreciate it. Appreciate it, guys. All right, so Jeff Frick here. We are live at the Midtown Hilton. Just want to make sure to remind everybody to go to Siliconangle.tv and check out our Women in Tech of the Week. We had a guest from yesterday that I think we have up: Anjul Bhambhri from IBM is this week's guest. We have tremendous women every single Wednesday; we highlight a new Women in Tech of the Week. So check it out at Siliconangle.tv. We'll be back here at Spark Summit East with our next guest after this short break. Thanks for watching.