Live from New York, it's theCUBE, covering Big Data New York City 2016. Brought to you by headline sponsors Cisco, IBM, NVIDIA, and our ecosystem sponsors. Now, here are your hosts, Dave Vellante and George Gilbert.
We're back. Daniel Henderson is here from IBM. He's the Vice President of Integration and Governance, and he's joined by Shiv Segel, who is the Solutions Architect and Product Manager at RSG Media. Gentlemen, welcome to theCUBE.
Thank you.
Thank you for having me.
Shiv, let's start with you. We know what IBM does, Daniel. So what is RSG Media all about?
So a little bit about RSG. We are a software solutions company, we've been around for the past 30 years, and we've been working with the world's leading cable and broadcast networks, for example Viacom, AMC, Discovery; the over-the-top folks; the four sports leagues and the consumer products side, in terms of how jerseys and hats are sold in various distribution channels; and of course the studios. Really anybody in media and entertainment who creates content or sells content, those are the folks who are using our systems today.
Okay, and they use that to increase the productivity of their activities, their merchandising?
Yeah. The media and entertainment industry is actually quite interesting because it's always changing, and over time it's been evolving much more toward understanding who your audience is and where people are actually going to engage with your content. It started off with folks on linear TV; they read the newspapers, they had the radio. But now you have the tablets, the desktops, the PC, mobile, and you could watch real content, long-form content, or you could watch short-form content on YouTube.
So it started off as just being what's available and what can we use, but now it's: who's watching our stuff, on what platform, what are they engaging in, what are the different types of content that are driving our viewership, and how can we better understand the people who are watching our content so we can provide them more personalized content and hyper-target them for advertising and marketing?
Okay, great, and we're going to talk more about that. But Daniel, let's go to you. You guys had a big week this week, you've got some big announcements driving the whole cognitive vision. Governance, which is part of your title, has been a big deal over the last couple of years. You had all these new big data projects spinning up and somebody said, whoa, wait a minute, let's bring those in. So give us the quick update from your standpoint.
So as part of the IBM DataWorks piece of work, we've been rethinking what governance means. Often when our clients think about governance, they think about it from a compliance standpoint: I've got to do it in order to solve my discovery needs, my records and retention needs. But especially on the big data side, when you build a data lake, you've got to understand what's inside of it. Once you understand what's inside of it, you want to make sure it's accessible to the individuals who might want it, but you want to do so in a way that exposes that data only to the individuals who should have it, not necessarily to everyone who might have access to the data but perhaps shouldn't have it. So modern data governance is our take, and ultimately it means infusing metadata in our IBM DataWorks platform so that we understand the data in all of our data products. Once we have that, we can infuse policies and automatically enforce them, so that you can construct things like data lakes and deliver not just the compliance that's necessary but also the value, with the trust that's assured.
And the value comes not only from risk reduction. Is there also a data quality component?
Absolutely. Governance is ultimately also about data quality. Who has access is one dimension, but the data I deliver, whether it's to do scheduling, for instance, for one of my clients such as RSG, is going to depend upon how reliable that data is. Data quality is ultimately a key factor in that reliability.
All right, so Shiv, let's get into your big data story. Where did it start? Take us through the last five or six years.
So I think it all started in the media and entertainment industry when Netflix and Hulu came to town. I'm saying specifically Netflix and Hulu, probably YouTube as well if you want to include it. There was a traditional way to do business, right? People were comfortable, and all of a sudden, oh my God, we have some new entrants. And what impact did that have? Well, folks started to ask: all right, if we have people who are subscribing at $9.99, who are those people? First of all, they're a younger generation, and they typically don't want to pay a cable bill. They really want to watch sports, but they're going to bars. Beforehand it was just: we have a deal with Warner Brothers, we usually get 20 movies a month, 10 of them are comedy titles, perfect, we're good to go. But now networks started to ask what the ROI is of purchasing that one specific title. And nowadays we're getting into the day and age where, for that one title that's being selected, we're starting to forecast and predict the rating on a certain network at any given point in time, across any day part. What's the C3 rating, so advertisers can start to understand the value of the specific time slot in which the movie would be airing? In a nutshell, what's happening is that cable and broadcast networks started to get much more granular.
We want to understand specific viewing behaviors and understand how we better target, ultimately, that one individual eyeball. We all used to fit in one demo, adults 18 to 34, and we probably all fit in the same demo, but we all have varied interests. And that's the story that we're here to tell: in today's day and age, data provides a competitive advantage where you can run your business smarter and you can mitigate risk. Without these types of tools, you really don't have the wherewithal to understand simply what the ROI of your content is, who your audience is, and where they are.
And you provide solutions to expose that ROI, do what-ifs, test different scenarios?
Yeah. It all starts with understanding that in the media and entertainment space, Nielsen is king. So, first of all, just understanding: what were the ratings for my content yesterday versus last week, when it aired across different networks? What was the performance of that one show across different day parts? That was the usual use case. But now, if you have your content on both linear TV and Netflix, we want to make sure we're talking about the same version of that content: the Wizard of Oz with James Franco, not the 1936 Wizard of Oz, and the SD version, not the broadcaster's director's cut. How did that one individual entity perform across platforms, and was there a digital lift? For example, Hulu is supposed to provide you that complementary viewing, where you can watch something live and catch up on the current season on Hulu. So is there actually that complementary viewing pattern, or are people just watching random stuff?
And these are the things that help you engage with your subscribers, with your audience base, take these data points to create a more effective linear schedule, and make sure you allocate the right content across the right platforms. And of course, we can't forget about the advertising and marketing inventories, because that's where the money is.
So I can't wait to hear, George, how they do this. Well, go ahead.
That was exactly what I was going to ask. It sounds like you have more data feeds or sources to deal with, some external, perhaps some you can tease out internally that were difficult before. But then also, what are the governance technologies that have to come into play for you to trust those data sources, internal and external?
You said the key word, George: it's all about trust. And Daniel, you were talking about governance. None of the analytics that we do means anything if you can't trust the approach to how you manage your data. That's step one. The way we try to make our data pipeline as transparent as possible is: let's just get everything under one roof, because right now there's data in Excel, there are APIs, there are watch folders, there are FTP sites. So that's step one. And after step one, well, what are we talking about here? Are we talking about the linear data for Wizard of Oz, the HD version or the SD version? Are we talking about the on-demand version? Are we talking about the clips, the trailers? What is that reporting entity? So then we start to transform the data into a very cleansed view where we understand: all right, this is how that one specific entity performed across linear and non-linear, across these different tablets and devices. We start to break out the data into what I call master tables, but ultimately they're cleansed tables.
And this process is very, very important, and it's very complex. There are matching algos that come into play, with Spark and Python, that say: all right, we have these different assets, we marry them together and clump them together. But because it's machine learning, obviously we need to teach the machine how to do this. So it's a very hands-on approach to train the model and make sure that the matches are accurate across these platforms, and then to bless that as we move forward into building out how that content performed, building out those table structures. So it's a data pipeline, with some ETL magic happening. And ultimately it's managing, whether it's dashDB or Cloudant, the operational databases that allow you to touch, access and massage that data in a very timely fashion. And by the way, this all happens in under eight hours.
So, Daniel, what role is IBM playing here? Shiv mentioned dashDB and Cloudant. What do you guys provide?
So IBM DataWorks is our data analytics platform. What you see is the potential impact that we can deliver through that platform. The data pipelines that are funneling data into his landing zone, which is in dashDB: that's on Spark, a fundamental component of IBM DataWorks. He's applying machine learning for cleansing as well as for some of the scheduling optimization. That's Spark. Our contributions in open source for machine learning and our embedding of Spark in virtually every aspect of our offerings are a key component of that. The operational data that he needs to support his apps is in Cloudant, which was a franchise that we brought into the fold and have integrated inside a lot of our offerings; that's a key part. So IBM DataWorks, the integrated, complete data analytics platform, underpins his solution. And his solution is a great example of the kind of applications you can build rapidly.
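
As a rough illustration of the asset-matching step Shiv describes (marrying records for the same piece of content across linear and digital feeds), here is a minimal sketch. It is not RSG Media's method: the interview mentions trained machine learning models on Spark, while this sketch swaps in plain string similarity from Python's standard library. The record layout, the titles, the IDs, and the 0.85 threshold are all hypothetical.

```python
import re
from difflib import SequenceMatcher


def normalize(title):
    """Lowercase, strip punctuation, and sort the words so word order does not matter."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", title.lower()).split()
    return " ".join(sorted(tokens))


def similarity(a, b):
    """Similarity ratio between 0.0 and 1.0 for two normalized titles."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_assets(linear, digital, threshold=0.85):
    """Map each linear asset ID to its best digital asset ID.

    Candidates must share the release year (so a remake never matches the
    original) and reach the similarity threshold (an assumed cut-off).
    """
    matches = {}
    for lin_id, (lin_title, lin_year) in linear.items():
        best_id, best_score = None, 0.0
        for dig_id, (dig_title, dig_year) in digital.items():
            if dig_year != lin_year:
                continue
            score = similarity(lin_title, dig_title)
            if score > best_score:
                best_id, best_score = dig_id, score
        if best_id is not None and best_score >= threshold:
            matches[lin_id] = best_id
    return matches


# Hypothetical feeds: asset ID -> (title, release year)
linear = {
    "L1": ("The Long Harbor", 1954),
    "L2": ("Evening News", 2016),
}
digital = {
    "D1": ("Long Harbor, The", 1954),
    "D2": ("Long Harbor, The", 2013),  # remake, must not match L1
    "D3": ("Cooking Hour", 2016),
}

matches = match_assets(linear, digital)
print(matches)
```

In a setup like the one described in the interview, pairs that fall near the threshold would presumably go to a person for review, which is the hands-on "blessing" step; the sketch leaves that part out.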
And do you access this on-prem, or do you do this through Bluemix and SoftLayer?
It depends, honestly, on the client and their political preference. We see a lot of clients moving to the cloud, but there are also those traditional folks who like their stuff on-prem. And that's the nice thing about working with IBM: we have the flexibility, so it's a non-issue. You want something on-prem? All right, you've got the HDB local. You want something cloud hosted? All right, you're good to go. So we have no preference. As long as we're flexible, it's all good; it's ultimately what the client wants.
One of the questions we have is that in the big data ecosystem, the open source big data ecosystem, which is pretty much the same thing, there's a lot of innovation at the tool level and then a lot of experimentation at the go-to-market level. To what extent did you choose IBM because they could bring all the pieces to bear and put them together?
You actually put the words together perfectly. It's putting and packaging the right technologies together, and that's what we ultimately care about. Do we have the tools, as a technology company, to manage the data, to transform the data, to create the algorithms, to interact with the data? Can we build applications on a platform and have our Node.js application, or another Java-based application, based on what the specific business need is? IBM's platform, their DataWorks, their Watson DataWorks project, provides a huge set of tools for us to be able to build what we need in a timely fashion. And there's the ability for us to prototype, as you said earlier, George. For example, coming up with an algorithm that schedules content appropriately on linear television: well, first of all, how are we going to build that model? Are we going to use a logistic regression, a random forest model, a time series model?
And in fact, we did a little bit of everything, compared and contrasted the outputs of all those various models, and then melded them together. So the ability to fail but fail quickly, the ability to prototype and do it quickly and then package it all together in a nice seamless app: that's ultimately our business model, and those are ultimately the advantages that we find on this platform.
It's consumption models, linear consumption and non-linear consumption. We talk about it all the time on the web, this non-linear, it's like little cookies all over the place, Hansel and Gretel. But then you see things like binge-watching, which changes that equation. So what are you seeing in terms of the consumption model?
So the saying in the industry is linear dollars and digital dimes. There are a lot of viewers on linear TV, but there are also a lot of viewers on non-linear platforms, and you ultimately have to manage both. Even though TV viewing is down, I believe the number is about 8%, and the number last year was about 6% overall, there are still people and households watching TV. For those folks who are tuning in, we want to make sure you have the most eyeballs watching your content. But for the viewers on the non-linear platforms, we're going to bring it all together and understand, when you're scheduling content on linear TV, what the viewing behaviors are on those non-linear platforms, so you can create a plan that engages those non-linear folks as well. So we're not siloed. We can't think of linear by itself; we have to think about them together. And that's actually one of our applications, cross-platform reporting, which allows you to understand the viewership of content across these different platforms.
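
The "compare and contrast the outputs, then meld them together" step mentioned earlier can be pictured with a small sketch. The interview does not say how the models were combined, so this assumes one common approach: weight each model by the inverse of its error on held-out data, then take the weighted average of the forecasts. The model labels and all numbers below are made up for illustration.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute gap between actual and predicted ratings."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def blend_weights(actual, model_preds):
    """Weight each model by the inverse of its error, normalized to sum to 1."""
    inverse = {
        name: 1.0 / (mean_absolute_error(actual, preds) + 1e-9)  # avoid divide-by-zero
        for name, preds in model_preds.items()
    }
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


def blend(model_preds, weights):
    """Weighted average of the models' forecasts, one value per time slot."""
    slots = len(next(iter(model_preds.values())))
    return [
        sum(weights[name] * preds[i] for name, preds in model_preds.items())
        for i in range(slots)
    ]


# Made-up validation data: actual ratings for three time slots and two
# hypothetical models' forecasts for the same slots.
actual = [1.0, 2.0, 3.0]
validation_preds = {
    "random_forest": [1.5, 2.5, 3.5],  # off by 0.5 in every slot
    "time_series": [0.0, 1.0, 2.0],    # off by 1.0 in every slot
}

weights = blend_weights(actual, validation_preds)
blended = blend(validation_preds, weights)
print(weights)
print(blended)
```

The model with the smaller validation error gets the larger weight, so a poorly performing prototype has limited influence on the blend, which fits the "fail but fail quickly" approach described in the interview.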
I love the saying, linear dollars, digital dimes. The Olympics this year was a good example, where NBC put a big effort into live streaming virtually everything and the consumption patterns were as you described. But the industry seems to be trying to avoid what happened to the print publishing business. Do you have confidence that they can navigate through that, or is it just, damn the torpedoes, we're going full speed?
I think the fact that the Wall Street Journal can charge for a digital subscription as well as a paper subscription is noteworthy by itself: they know that consumers want their product so badly that they will pay for two subscriptions. However, the media and entertainment industry is a little bit different. Maybe your parents have a cable subscription, and then I will be able to watch TV Everywhere using their login and authentication. So the business case is a little bit different, where the MSOs, the service providers, the Comcasts of the world, allow you to share these credentials; there are five different accounts that you can use. So on these concurrency limits, if the industry really wants to look at what happened to the paper industry and take some notes, they should understand that people were able to monetize both a paper subscription and a digital subscription. Right now you just have one subscription that gives you access to this huge locker of content. So until you start to see these business models change, nothing has changed on the MSO side.
Right, but the Journal is kind of the exception to the rule, right?
Exactly, yeah, but they're one of the few who are still around, I believe.
Right, the vast majority. Look at the Boston Globe. I mean, the number of reporters that they fund now is way down. We probably could have bought it if we'd put some funding together, but the other billionaire grabbed it first.
All right, Daniel, we're out of time. We'll give you the last word on this week: Big Data Week, Big Data NYC, Strata + Hadoop World, the IBM announcements.
Watson DataWorks is a big deal for us. It's not just the data and analytics platform that we're delivering; we're taking the best of the experiences that we've created with our partners and our customers and offering it through this method. So stay tuned, pay attention to Bluemix. You'll see more offerings and updates to existing offerings every week. It's amazing.
Great, love it. Daniel, Shiv, thanks very much for coming on. Great story.
Thank you.
All right, keep it right there, we'll be back with our next guest. This is Big Data NYC, we're live from New York City. We'll be right back.