All right, so next up we've got Thomas Barr from NBCUniversal, a chief architect there, and he's going to talk about building next-generation audience targeting and analytics.

Can you hear me? Good. So as mentioned, I'm Tom Barr from NBCUniversal. I'm part of our advanced advertising group, so we're responsible for those dreadful ads you see on TV and everywhere else. But our industry, just like a lot of industries, is facing some severe challenges in this environment. The competition for your attention has never been greater. We have cat videos, which we were making fun of here at DC/OS Days, but when you think about all of the user-generated content that's out there, YouTube, everything else that's vying for your attention, it's no longer an age where the family comes home, has dinner together, sits down, and watches TV at night. We all consume our content in different ways, which creates unique challenges for us but also creates tremendous opportunity. And so the advanced advertising group at NBC has been set up to take advantage of that: to not look at TV as yesterday's media, to take that original content that we have excelled at for so long, allow it to be viewed across all of the different distribution channels in different ways, and be able to monetize that. So we were asked to stand up a cutting-edge platform to house all those data assets and let us do that. I'm going to discuss some of the business challenges in a little more detail first, and then go into what we do with DC/OS.

So, would it surprise you today that
we sell advertising on TV essentially the same way today as we did 50 years ago? We sell 70% of our advertising upfront. It's not data-driven. If you ever saw the show Mad Men, it's very similar to that, and yes, there is a lot of alcohol involved. There is literally no data analytics driving this, or very little. A chief marketing officer will come up during our upfronts, which kick off with this big event in New York (well, everybody does this), and sometimes they'll literally say, "Well, you know, my kids watch The Voice. Let me buy The Voice." Okay, we'll sell you The Voice. It doesn't mean they're necessarily getting the audience that they want, but we'll sell them The Voice.

What's worse is we sell it one selling title at a time. We don't really know; we have an intuitive feeling, but we sell it one selling title at a time. It's the same thing that's been done for 50 years. And when you think about that (I was thinking about this on the plane ride here this week), entire industries have started from nothing, become raging hot, and then disappeared off the planet in those 50 years that we have essentially done the same thing. Blockbuster came to mind. It started up gangbusters once we started putting videos on VHS, it was the darling of Wall Street for so long, and now it's gone. Yet TV has persevered. This year we'll do about ten billion dollars in linear TV, so it's a very good business. But we face the challenges that I described, so how do we take advantage of that? What we want to do is find, follow, and engage our consumers in a personalized way, to navigate this monumental shift.

So that just went nuts. Oh, I see.
This is animation. Silly me, I never made it into viewing mode. By the way, there's a prize (not really, because he'll ask me for it): at a certain point in this presentation you're going to see where I go out of the marketing slides and into the engineering slides. It's really obvious. I'll clue you in.

Most people don't know this. We're familiar with NBC and some of our shows, but look at all of the distribution channels that we have, our content providers, our partnerships. We're called NBCUniversal because we have the Universal parks, we have Universal, the movie studio, and we have all of the networks you see up there. This gives us a huge amount of reach, and every one of these distribution channels and networks that you see is backed up with applications and websites that all generate content. And we're not really taking full advantage of that content because of how we're structured: all of our brands are individual and they operate fairly independently. So part of our challenge is to gather all that data and create a 360-degree view of our consumers, so that we can provide additional benefits and additional lift for our advertisers to target audiences.

So how big is our reach? Here's a fact: NBCUniversal reaches 268 million people every month. That probably surprises everybody, and that's across that whole portfolio. It might also surprise you to know that's 35 million more than YouTube and 25 million more than Facebook, and that's not taking into account that they kind of have a problem counting these days, in case you haven't heard; they've been counting a lot of bots, and not just Russian bots. So we still reach a tremendous footprint, and this is all original content. It's quality content, content that our advertisers want to be associated with. So how do we monetize that?
So as I mentioned, our portfolio reaches 95% of the US population, and we have the expansion of OTT and app deployments, which is increasing our interaction. One thing I want to do here is make this somewhat interactive and at least somewhat entertaining: how many of you are cord cutters, without a cable subscription? Yeah, we don't like you. No, I'm just kidding. You're still consuming our content; you're just doing it in ways that you choose to. But it's hard for us to measure that, as weird as that sounds. We're still measuring linear TV the same way we did, guess what, 50 years ago. It's based on about 40,000 households, or was it 30,000 households, on a panel, extrapolated across all the distribution providers, and that gives us the ratings.

Let's see what else we have here. We have parks, with millions of annual visitors both here in California and in Florida. We have transaction services through GolfNow and Fandango. Yes, we do have Fandango as part of our portfolio. And we have growing engagement with films, DVD sales, games, and much more. You name it. These all have interactions with consumers, and they're all generating data. We have to gather all that data into one central place, create that 360-degree view, and then give increased value to our advertisers.

So Audience Studio was set up about two years ago to do exactly that. I was brought on about a year and a half ago, by this gentleman right here, and I was asked: hey, listen, we have to build a state-of-the-art data asset and analytical engine that can let us build these audience targets and then apply them to any of our distribution means. That means applying them to linear TV,
so that we can generate an optimized media plan that's based on data, not based on somebody's kids watching a show. And if our advertisers want to come to us with their own data, great: we'll plug that in, we'll build the audience based on that data, we'll extend the audience, and then we'll target it. Our whole thing is to do it at scale, guaranteed. It doesn't matter how we're going to see that consumer, and it doesn't matter where we're going to see that consumer. If it's on our app because the person is a cord cutter, fine: we'll target them, we'll count them, we'll do all of it.

So this is what we like to call our ice cream slide, because it looks like an ice cream cone. There are three aspects to this. First, there's our national linear TV, which is what you're mostly familiar with. I define that as: you watch the TV, for the most part, when we want you to. That's linear TV. Now there are aspects of that which are time-shifted or VOD, where you're still going to see the same advertisements, but for the most part linear TV is defined as watching the show when we want you to.

Then there's addressable TV. This is after a certain amount of time: you go to your box, you watch VOD, you watch your OTT apps, but now we can target you with custom advertising based on what your interests are and what we know about you, and by the way, in anonymized ways, so don't worry about privacy; trust me, I have to worry about that every day.

And then we have our digital aspects. And as you look, these are still vast amounts on a scale.
There are 105-plus million US television households. There are 19 million addressable households, because not every cable box has the ability to be addressable. And then we have 172 million uniques per month on our desktop and mobile applications.

We're putting our money where our mouth is. This year Linda Yaccarino, who's the head of all ad sales for NBCUniversal, announced (and this is pretty unique within the industry, because data-analytics-driven sales is still fairly new; there are some players doing it, but we believe we're taking a leadership position here) that out of that ten billion dollars, we allocated one billion for programmatic, or data-based, advertising. That's one-tenth of our entire sales this year, and we're well on track to achieve that goal. In Ad Week it actually says it's taking the training wheels off of Audience Studio. So we're no longer the little toy that was kind of nice; we have to do it at scale. We can't just paper it along.

So we have some technology goals. We've got to unlock and utilize the data. You would think it would be a lot easier. We spend a couple of days here every other month just talking to our brands that are here; try going down to Florida trying to get the data all together. Everybody has individual cost centers and profit centers, so this is not an easy task. We have to take ownership of our linear optimization solution. We have to be nimble. This was part of the challenge when I came here: we didn't really know what was needed.
This hasn't been done before, so the ask was "build us a platform that can take anything and do anything." And as you can see throughout all of your discussions in the last two days, if you're here for DC/OS Days, DC/OS lends itself to that kind of dynamic, nimble, we-don't-know-what-tomorrow's-challenges-will-be situation. I'll let you read the rest of that; we have several technology goals this year.

So what's our value proposition? We're targeting at scale. We'll utilize our advertisers' data, we'll utilize our data, and we'll combine them. We'll do it in a privacy-safe manner, in a manner that's fully multi-tenant, so that we're not going to intermix advertisers' data. And we're going to get better cost efficiency. In the example I started with, where somebody came in and said "I'm going to buy The Voice in prime time because my kids like it": well, what if I can save them money and get them the same audience across some of our cable properties? And you say, why would you want to do that? Well, it frees up that ad slot for somebody else, and we still get them the audience. So there's a win-win for everybody: they're saving money, they're still reaching their audience, we're freeing up the slot, and maybe we can monetize it for more, because there's somebody who needs it, since The Voice happens to hit that demographic really well for them. And then inventory insights: again, this is exactly what I was talking about, shifting across all of our portfolio from some of our hot properties.

So, anybody notice the change in the production quality? These are my slides now. I always say that you don't hire me for my PowerPoint, so they are absolutely not pretty. You're good.
I like that. So essentially what we're talking about is building a data lake. This is how loose the requirements were: we have a whole bunch of data from a whole bunch of different sources, we've got to bring it all together, we've got to analyze it, figure out what we're going to do with it, create a target, and send it out. How do you do that? Those were essentially the requirements, and to a certain degree those are still the requirements we have today.

So we wanted to build a data lake, and as I was looking at it, I wanted to go to the cloud. I didn't want to be on-prem, because of some of the challenges that can be there. In my experience, on-premises data lakes almost fail (they have difficulties; I won't use the term fail) by their mere success, because if you're on-prem you have limited resources. It doesn't matter; at some point you're running out of spindles. So you create this data lake and everybody jumps on it, and because you're usually successful you quickly have these unbounded requirements, and you have these very bounded resources.

So along comes the cloud, right? The cloud solves the problem.
Well, you can't do it cost-effectively. Because if all I do is shift my Hadoop cluster (let's be honest, that's what we're talking about here), at least on-prem HDFS, and move it into the cloud, there's a tipping point in cost. We can debate what that is, 200 terabytes, 300 terabytes, but eventually it's too expensive.

I was always thinking about this, and this is not earth-shattering, but it was a shift we made about a year ago. I thought about it and I said, well, the problem really isn't the cloud. The problem is that we're taking HDFS and a Hadoop infrastructure and moving them into the cloud, which is really where the cost is. Those spinning volumes are expensive, and that's what costs you money. So we quickly shifted over to S3 (as you can tell, we're in the Amazon cloud, but all the major providers have an object store), and we basically moved to running Parquet files on S3. It's a cost-effective storage solution for us, and it's led me to believe (and Greg over here asked me to drop the bomb), honestly, with all due respect to our Hadoop fans, that Hadoop is yesterday's technology, as quickly as it came, all the rage. When I have a DC/OS cluster running Spark at scale, I can go against an S3-backed, Parquet-based system which is infinitely scalable and never runs out, and I can do it in a cost-effective way. And if we're truly honest with ourselves and look at the cost of ownership on-prem, in terms of all the people who maintain it, the DevOps, the amount of disks that have to be replaced, the support contracts, we're pretty much on par in cost, from the back-of-the-envelope analysis I've done. Quite honestly, I don't want to have to do that; I'd rather let the cloud provider worry about it. Same thing with DC/OS, which we'll get into. I have so many challenges within the business.
I don't want to worry about infrastructure. I don't want to worry about those spinning volumes. I don't want to have to have DevOps. I want as many of my engineers as possible concentrating on solving this problem, because quite honestly we have a big challenge ahead of us. Like everybody else, we've got Google and Facebook staring us in the face. So that's how we solved the problem.

Now, there are also other advantages. One of the things I want to go into here is that there are a couple of things within all data solutions and data lakes: I can't stand copies of data. Everybody in an enterprise, especially at NBC, copies data and brings it into their own little toolbox, and they do it for a lot of reasons. Number one, they can't get the data in the right form; they can't get the data at scale in the way they need it, so they take a copy, move it, and then do their own transform. That's horrible. Number two, how many of you have ever seen data lakes that are like the Roach Motel: data goes in, but it never comes out? And I contend that old data is worse than no data. Well, we can set an expiration date on S3 and it just goes away.

And unless you're really savvy with a Hadoop cluster, how many of us have really implemented multi-tenancy and user permissions? It's really "I got access to the cluster and now I can see everything." Within S3, or any object store, I can control, based on policy, who sees what file, at an atomic level. I can control it by the directory. I have all that permissioning built in, and I don't have to do anything other than create the policies. So I get a lot of advantages here.

So, like everything else, even though I've been spending a lot of time within Audience Studio talking about Parquet files in S3, we do have to have other data stores in our lake. We do use Amazon Redshift.
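Both of those S3 features (expiration and per-prefix permissioning) are plain policy documents. As a hedged sketch, with a hypothetical bucket and prefixes, an expiration rule looks like:

```json
{
  "Rules": [
    {
      "ID": "expire-raw-drops",
      "Filter": { "Prefix": "raw/" },
      "Status": "Enabled",
      "Expiration": { "Days": 365 }
    }
  ]
}
```

and an IAM policy scoping an analyst group to one directory of the lake looks like:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::audience-lake",
        "arn:aws:s3:::audience-lake/curated/digital/*"
      ]
    }
  ]
}
```

Nothing to operate, no cluster-level security work: the object store enforces both.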
That's our only cloud lock-in technology, and I contend it's barely that: it's nothing more than a Postgres database with a columnar data engine. I can get that out of Vertica if I want to pay for it; I can get that anywhere. It was just cost-effective to do it this way, and I needed that enterprise data warehouse to put a GUI in front of, for my ops teams.

We also run Postgres, because I still have relational needs. Those things don't go away, guys. If I'm going to build REST services, I'm going to have access control, I'm going to have these things; we still have to relate objects together. So you utilize what's best for the job. Postgres exists in our stack, and yes, we use Cassandra. We use Cassandra to link in our digital data aspects so that we can actually do the targeting for the digital layer. All right, next slide.

Part of what I also wanted to do when I built out this platform was to service all aspects of big data for the enterprise: from custom UI solutions, which is what our operations teams use (this is typically what we all do: we build our REST services, we put a UI in front of them, and we give that to our ops teams), all the way up at the top to engineers like myself and our data scientists, who want to attack the big data at scale, write the code, generate the algorithms, generate the ETLs. And so the complexity goes up, and you move from modeled data at the bottom toward the raw data at the top.
At the top, I'm dealing with the atomic level. What I wanted to do was build out an infrastructure that would allow everybody within this sphere to interact with the atomic data. And by the way, I don't care how you do it; I'm not going to dictate what tools you use. One of the things that led me to believe a lot of the data copying happens is that they all have their favorite tools. At NBCUniversal we have a tremendous number of data analysts, and data scientists, sort of, who are providing huge value to our customers. They're doing custom reports, they're doing analytics, they're doing work for our customers, and they're interacting with our clients. And guess what: they're not engineers. First of all, we're engineers, we shouldn't talk to clients; second of all, you don't want to pay somebody like me to do that. There's no need.

So these people have non-engineering degrees and they have tool sets that they're used to: SQL being one of them, MicroStrategy, Tableau. These things are not open source, and I know they may be bad words around here, but this is the reality of the enterprise. These people fill critical roles that we constantly ignore, but they still have to do their job, so they copy the data. So I wanted to open this all up and allow them, through the power of DC/OS and Spark, to attack that data at scale. Or, if they don't want to use that, if they have their own tools, since I'm on S3, let them go ahead. If they want to run a Snowflake (don't know why), fine, spin it up, I'll give you your keys, have at it. Just utilize the data that's in the lake, and if you augment it, write it back out so somebody else can use it.

So how do we do some of this within DC/OS? That's my little data pool. Raw data comes in; everything that comes into Audience Studio comes into S3. It lands in our data lake.
The first thing that happens is our DC/OS cluster, through our ETL processes kicked off by Chronos and the like, picks it up and immediately transforms it into a Parquet file that is an exact match of the input data. This way I don't want to hear "oh, I need to get at the original data." Nobody needs to do that kind of string processing. I mean, CSV is CSV, for the love of God. We can all just deal with Parquet files and get a much more efficient view of the data; if you really need the original, we can talk. And then we start linking the data together and pulling in all the aspects of it, so that we can get that 360-degree view of the consumer I talked about earlier. We link all that data, and all of the intermediate steps are also written out to S3 and made available to the enterprise to attack.

And so then, as I mentioned: honestly, if you want, I'll give you (you being the enterprise) access to the S3 Parquet files, so you can use whatever you want. If you want to spin up your own stuff in the cloud, go have at it. But I'm also going to offer you data as a service with a REST API, I'm going to offer you audience products, and I'm going to offer you a SQL interface through a Spark Thrift server that we contributed to the community, which you see here. I'm doing okay on time.

So, one of the things we do in our analytic tool sets: we utilize Zeppelin notebooks tremendously. It is truly a very powerful tool. But here's the thing that surprised me the most about Zeppelin (I don't know how many of you have used it, but you're shaking your head, right?): it's a very powerful tool, and people like me, engineers and some of my data scientists, not all of them, are very comfortable with it.
I'm sitting there going, "but wait a minute: three lines of code. I'll give you the three lines of code, and you can now attack it with a SQL interface. More importantly, I can do in-memory outer joins. How cool is that?" But when we go back to that slide about the enterprise: that may be cool, but I discovered that although I consider it simple, it's not simple for people without engineering degrees. That was a real shocker, because I really thought that with notebooks we had the whole solution solved, but we don't. So we give you a gateway, a Spark Thrift server, through DC/OS, and now all of a sudden I don't care what BI tool you have: if it talks SQL, plug it in. What we've done with this is open up the power of Spark, that kind of big data analytical and querying capability, to the entire enterprise, just through that simple enabling technology. Even some of our people like using SQL editors, like DBeaver. I don't care; they can have at it.

And then we have our Spark cluster. It's truly been good for the enterprise. One of my colleagues today said, well, a lot of the data gets copied because it's just inherent: everybody wants their own copy. And my answer was: there are certain things I can't do, but I can eliminate the excuses. One of the main ones is "I have to copy the data because I want to use my tool set." I just eliminated that.

So now, lastly, DC/OS. Greg is finally sitting there going, oh God, finally, right?
We run everything on this cluster: our management, our CI, our log management through ELK, everything. On the compute side: Spark, Zeppelin, Play, Kafka. Right up until they called me and said, "hey, we want to talk to you because you're running our SMACK stack." With the exception of the A (Akka), we're running SMACK, and I was like, oh, we are? I didn't know; it just seemed logical. I think a lot of us came to this conclusion before they named it; it was just natural within the power of the engine here. And then we have our data stores, with S3 sitting off to the side.

Now, interestingly, for the first time (and this is one of my last slides, and then you can happily go to your booth crawl and your drinks), you finally see the dirty little word: Hadoop. You just heard me go on for the last 30 minutes about how I don't run Hadoop, yet I run a very small Hadoop aspect on my cluster. And why is that? Well, I need to get my Spark history. I could write my Spark history to S3; the problem is that doesn't give it to me interactively, it only gets written at the end of the job. So I do need to run a little bit of a Hadoop cluster within DC/OS. When I was initially talking to some folks at Mesosphere, they were like, yeah, that's kind of why we originally put it in there, and then it grew from there. So that's why we do that. And then we have Cassandra. We're not really using a lot of HBase at this point.

So I went through this reasonably fast and landed the plane in time, so you don't get upset with me. Do I have any questions, and did you stay awake? Hopefully you stayed awake. Go ahead.

Oh, yeah, they wanted to know if I wanted to talk to you, and I didn't get back. Well, we can talk; I asked this gentleman. We can talk. You know, it's interesting.
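The history-server wrinkle comes down to where the Spark event log lives. A hedged sketch of the relevant spark-defaults.conf settings (the HDFS host and paths are hypothetical; the keys are standard Spark configuration):

```
# Event logs written to HDFS are flushed while the job runs, so the
# history server can show applications interactively:
spark.eventLog.enabled         true
spark.eventLog.dir             hdfs://hdfs-namenode:9001/history
spark.history.fs.logDirectory  hdfs://hdfs-namenode:9001/history

# Pointing the log at the object store also works, but the object
# only becomes visible once the job finishes, so no live view:
# spark.eventLog.dir           s3a://audience-lake/spark-history/
```

Hence the small HDFS service on DC/OS, even in an otherwise S3-only lake.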
I've never really associated us with the microservices piece, because we're fairly new to NBC, "we" being the Audience Studio engineering team. We're a bunch of digital advertising veterans, so we've been doing this kind of technology for a while; it's just new to TV. But absolutely, in terms of bottom-line sales: when we commit one billion dollars, one-tenth of our linear sales, to being data-driven, and hence microservices-driven, yeah, it's had a direct effect on the bottom line. So, absolutely, on the revenue, and on how we do things. I mean, this is a big change for us in how we sell advertising. I kind of glossed over it, but it's hard for an organization when you have that much top-line revenue (and we increased ad sales this year by 9%; that's been in the paper, so I can say that; I have to be careful sometimes), and it's very easy to rest on our laurels. We're kind of leading the front of this because we see the tsunami coming at us, and if we don't change how we're doing business, instead of that 10 turning into 20, it's going to turn into 8, and that's not good. I really enjoy working for this company. So yeah, it has revolutionized how we do business.

Any other questions? Yes, sir. Excuse me. Okay, we're just starting out.
So we're at about a hundred terabytes or so. We process data both in real time, where all of our OTT and digital assets come at us in real time and get added into that audience graph, and in batch, because a large part of the nature of linear TV is batch, so we get a lot of stuff that comes in at night. That's also, by the way, where utilizing technology like DC/OS really allows us to save money and get some cost effectiveness, because the very nature of what the cluster does changes so much. The real-time side goes along in a kind of constant state; it has its ebbs and flows as people are awake. But the nature of the same stack changes at night: it's running these huge ETL processes, stitching together all the data that was gathered, the viewership data. When you watch something on your cable box, eventually I'll see it, for some of the providers we deal with, and by the way, in a totally anonymized way, so, sir, I will not know that you're watching whatever you're watching. It comes into the cluster at night, and then during the day our data scientists and data analysts are attacking the data, providing value to our clients, so the cluster changes much more into an analytical mode. And I don't have to have several clusters to do this; I have one. So I hope that answers your question. Any other questions?

Oh, yeah, that's the guaranteed part. We have this bad word in the industry; it's called a "make-good." That's when our optimizer and our forecaster go "oops" and we have to make good. It's okay if we over-deliver the audience, but if we under-deliver the audience, we have to make good on it, and we don't like doing that. So yeah, we do have pacing reports and everything else that are made available to the clients. All right, anything else?
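The overnight repurposing he describes is driven by the scheduler. As a hedged sketch, a nightly stitching job submitted to Chronos (mentioned earlier as the ETL trigger) might look like this; the job name, command, and resource numbers are hypothetical:

```json
{
  "name": "nightly-viewership-stitch",
  "schedule": "R/2018-01-01T02:00:00Z/P1D",
  "command": "spark-submit --master mesos://leader.mesos:5050 etl/stitch_viewership.py",
  "cpus": 4,
  "mem": 8192,
  "retries": 2
}
```

During the day the same Mesos resources are free to serve the analysts' interactive Spark and Thrift-server workloads, which is why one cluster covers both modes.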
Well, then I do believe it's time for the booth crawl and drinks. Thank you so much.