Right, so this talk is basically about why data is important to product management, which to some extent you probably all already know. I'm also not going to try to give you the hard sell on any products, but we are going to do a bunch of demos and interactive things that may point you to some free solutions you can use. Try them at your own peril, let's say. Here's a very rough agenda, because I think we're going to get way off it real quick. I want to talk a little bit about why I think showing your work is maybe the most important part of product management, and then we'll go through some case examples: planning a product with data, instrumenting a product with data, analyzing the product and its users, and maybe we'll make a decision about a product. But first you should probably know a little bit about who I am, and then we'll get into the meat of things.
I'm a product manager at Google, and I've been at Google for a few years. I work on two things inside Google that are effectively the same thing. The internal one is called Dremel. It's Google's data warehouse; it's where we store and analyze a lot of data. Suffice it to say, it's a lot of data. We also externalize it as a product called BigQuery, and I'm going to use BigQuery as part of my talk today. If you'd like to use it too, awesome. It's free for a lot of use cases, so go for it, but you don't have to. Before that I worked as a product manager at Oracle. I've done hardware devices, shrink-wrapped software, and cloud services. Before that I was an engineering manager, and before that I was a researcher: I have a PhD in computer science and did a postdoc in what's called complex systems. It's complicated. Suffice it to say, you can look at all of that and decide maybe I'm credible, maybe I'm not; I'm not sure. But I think you all probably already know that product managers are kind of weird, right?
We all came from sort of odd, disparate backgrounds, and I'm no different. So this talk is really going to be about playing Breakout. I'd like you all to take your phones out, go to this URL, and play Breakout. I'm also going to play Breakout. Oh, right, sorry, you need the URL. I'm a terrible person. Hands up when you've got it: big phaser dash demo dot appspot dot com. The motivation here is that I got excited: I'm going to get to talk to these folks who want to be PMs, so I should make something for them. So I made you a thing, and this is where we'll take our case examples from, but I want you to do something with it first. It's going to ask for a player name. It doesn't have to be your real name or your email address; it can be a name you make up. If this works, you should get a little Breakout game. I encourage you to play it, get frustrated, and maybe play it again. Hands up if someone's gotten there yet. All right, great. Can I play Breakout too? All right, I'm going to play too. Oh, I didn't want full screen. Well, we can do full screen. So if you get there, you'll literally see Breakout. Maybe not the most exciting thing. I hate to say it, but this is the rest of the talk: just me doing this. If you wonder what I do at work all day, you're looking at it. All right, one more life to go. All right, game over. I guess I'll stop.
So what is this talk really about? It's about the fact that when I'm asked what I do as a product manager, whether I'm interviewing other product managers or just telling people I'm a product manager, I have a terrible time answering. Sometimes I'm a janitor, sometimes I'm a strategic thinker, and sometimes I'm the person who writes the support email. But the one thing I do as a constant, every day, is analyze data.
That is the thing I do. Maybe this is more common at Google than in some other organizations, but in previous positions as well, the one constant was that I analyzed data. So today we're going to do some demos and talk about some concepts related to them, all in the context of this little Breakout game. If you get bored or think this is uninteresting, you can just play Breakout, so you can't be that bored.
So is this a talk about data or about big data? It can be about big data if you've got a lot of data, but the biggest thing to remember, if you're a product manager for a software service or anything else, is not to worry about the big data thing. Someone will ask you about it. Just remember that you're going to have data; it might be big or small, and it doesn't matter. Even if you get to big data, you'll still just have data, only a lot more of it. Often the most important thing is being able to join data together, particularly data from different contexts. So one of the things we as product people have to strive to think about is whether, as we and our teams collect data, it will be possible to put that data together in a way that creates some value for us later on. You may have to negotiate with different teams and different people to make that happen. So why do I think this is important? Why is this the thing I literally do every day, versus all the other things I do only a little bit?
Credibility. When I got into product management, the first thing someone told me was: look at you, you're going to have to establish credibility. You have a very technical background, so some engineers will find you credible, but the people in sales won't, because you've never sold a thing in your life. True. One of the great challenges in product management is that you lead through influence. It's very rare that everyone reports to the product manager; maybe you'll have a couple of product managers reporting to you, but most of the things you need done, you have to get done through influence. If our work is about influencing people to do the right things (engineering has to build the right thing, marketing has to help you craft the right message, sales has to execute the right play, and in smaller companies you may wear multiple hats and do some of these yourself), you're still going to have to influence somebody. And not all of these groups will find you instantly as credible as others do. So if what I need to do as a product person is establish credibility, I have to figure out how to convince most of the people most of the time; I'm not going to convince everybody all the time. If I'm trying to get these things to happen and my background alone is not enough, there's only one thing I can think of, and it is weird to say this in the era of fake news, that is credible to all of the different stakeholders you have to influence: facts. So where do you get facts? Facts don't just emerge from the ground fully formed. There's actually a really simple formula for getting them. Get some data. Hopefully it's good data; often it will be terrible, messy data, and you'll have to deal with
that as a secondary problem. Analyze it. Then share the results of that analysis. And then the important part, the part that turns it into a fact: show your work. If I say, "Guys, I've analyzed the data and this is the case," what does that mean? How do you know I analyzed it correctly? How do you know I even know how to analyze it? If you show your work, if you let other people see the data and the analysis side by side, we can have a conversation, and what you get is consensus. That is what breeds a fact from data and analysis. That's the point; in my mind, that is why this is what I do every day, and this slide basically says that. It also changes the way you disagree with people when you're trying to influence them, because your disagreements become concrete. You dislike that I used 4.5 instead of 7 for the constant? You dislike how I analyzed this particular thing? Okay, that's something we can actually debate, fix, and come to a new conclusion about. There will always be abstract disagreements: should we build a platform, should we build a box? You'll always have those. But they'll get less abstract, and you'll have fewer of them about the little things, if you can get some facts. And now we have a great recipe for making facts.
All right, back to Breakout. I want to do this as a scenario and talk through some examples. The scenario is: let's make a video game. If you don't like video games, I'm sorry; it's the most approachable thing I could come up with for an audience of people
I've never met. Now, if we were going to make a video game, we could just start making one. But if we're really going to be data-driven product managers, we should think about the problem space, and we're going to have stakeholders we want to convince. How much money could we make? What kind of game should we make? What might our ratings be? How will we position it? These are things we'd like to know, so we're going to go find some data. I got a CSV of video game sales data off the internet, and I've got a web scrape of some ratings from IGN.com. I could just put them in spreadsheets and figure it out, but then I'd have the spreadsheets and you wouldn't, and I'd just tell you I've analyzed the data. That doesn't help; it doesn't grow a base of facts for people. So before you can even show your work, you have to set up a workspace in which you can show it. Organizing data so that it can be found and shared between yourself and your team is really important, and so is organizing it so that it can be analyzed by different tools: if I know how to use a spreadsheet, you know how to use an IPython notebook, and he knows SQL, we need a way to all work together. Now, this part is going to sound a tiny bit like a product pitch, and then we'll move away from it. BigQuery, and Dremel at Google, were to some extent designed to do this. Part of the reason Google relies so heavily on them internally is that a shared ability to analyze broad sets of data, owned not only by me but by me and you and him, and joined together, is important for making our decisions. So the thing I work on, as I
sort of said, BigQuery is the externalization of it. It's managed, so you don't have to worry about any infrastructure. It scales really big. It's a SQL tool, and I will say, if you don't speak SQL, you should probably think about learning it. We're going to look at a little bit of SQL and then move away from it, because you don't have to know it, but you should. It's encrypted, it's highly available, it has all those sorts of things. The biggest thing for this audience is that it's free up to a certain amount of usage, and you literally don't have to manage anything; you can just start putting data in and doing stuff with it. So if you want to go play with it after this, cool, I appreciate it.
So let's actually analyze some data. Questions we might ask: Who is the biggest publisher by revenue? Is it Nintendo? Is it EA? Could our little video game startup ever aspire to be as big as them, or to sell to one of them? If we're really going to do this video game thing, how much money are we going to make? Can we make a million dollars? How many companies make more than a million dollars? How much can we make across the world? These are things we might want to know before we enter the market, along with the kinds of games we might want to make. So here's a SQL query. It's not super exciting. I'm actually just going to get out of that. Oh no, I've got to exit full screen. Oh, I can just do this. Okay. So this is BigQuery's friendly web user interface. It is super tiny.
So we're going to zoom in. I mentioned I have these CSVs I found. Let's say I want to start with them: I've got market data, so I'm going to create a new table. File upload, choose a file. I think I've got video game sales data; it's a CSV. I don't know what's in there, so I'll just have it detect the schema automatically. Maybe it's got a header row. I'll just do that. No, it's not loaded yet. There we go, loaded now. So basically I gave it a CSV, and it tells me a bunch of stuff: ranks, platforms, names of games, things like that. Cool. Now I can actually answer some questions. So let's go back to this one: who's the biggest publisher? If you don't know SQL, maybe we should figure out a way to do some SQL anyway, so let's presume I don't know any. I know I want to query this table with game sales (this is the same table under a different name), so I'm just going to click "Query table," and it writes some SQL for me. That's useful, because I'm trying to learn SQL, and it's nice if something helps me out along the way. Let's say I want to know who the biggest publisher is, so I want the sum of global sales. No, that's not very useful.
I should probably order that. Ah, okay: it turns out it's Nintendo, not Electronic Arts. That's many millions of dollars they've made over the course of this data. That's a good question to answer. But I've also got another table I uploaded, these video game ratings. Oh, apparently this one was a painful game: apparently you shouldn't play Star Wars: Yoda Stories. Helpful advice for folks looking forward to the next movie. I mentioned that joinability is a big deal. It's one thing to say, "I chucked in a CSV, and another CSV, and another CSV." What's the point? Why did I chuck in these CSVs? It's fine to say I totaled up these things, but I could do that in a spreadsheet. What I'd really like to do is take these two somewhat random pieces of data that I found and ask what they have in common. I've got a name column here, which is the name of the game, and the game ratings table has a title. So I whipped this up the other day, called top rated games: I've got the ratings data I scraped and the sales data I scraped, and I'd like to put them together where the title matches the name, and then see the average rating each game got as well as the amount of money it made. There we go, all done. What did we learn? Grand Theft Auto V made a lot of money. But what's interesting for us, as we think about the market space, is that we can start to ask: are there genres that tend to do better in terms of ratings or revenue? Are there smaller publishers in certain genres that do particularly well? Are there people we can emulate in this space if we're going to fast-follow? If not, is there a real hole we could fill?
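For readers who want to reproduce the publisher ranking and the ratings join outside the BigQuery console, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in warehouse. The table and column names (game_sales with name, publisher, global_sales; game_ratings with title, score) are assumptions loosely modeled on the columns mentioned in the talk, and every row is made up for illustration. It also sketches the "how many companies clear a threshold" question from the slide, treating 1.0 as the threshold in whatever unit global_sales happens to use.

```python
import sqlite3

# In-memory SQLite database standing in for the shared warehouse
# (assumption: the talk used BigQuery). All rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE game_sales (name TEXT, publisher TEXT, global_sales REAL);
    CREATE TABLE game_ratings (title TEXT, score REAL);
""")
conn.executemany(
    "INSERT INTO game_sales VALUES (?, ?, ?)",
    [
        ("Game A", "Publisher X", 3.5),
        ("Game B", "Publisher X", 0.4),
        ("Game C", "Publisher Y", 2.0),
        ("Game D", "Publisher Z", 0.3),
    ],
)
conn.executemany(
    "INSERT INTO game_ratings VALUES (?, ?)",
    [("Game A", 9.0), ("Game A", 8.0), ("Game C", 6.5), ("Game D", 7.0)],
)

# Biggest publisher: total sales per publisher, largest first.
top_publishers = conn.execute("""
    SELECT publisher, SUM(global_sales) AS total_sales
    FROM game_sales
    GROUP BY publisher
    ORDER BY total_sales DESC
""").fetchall()

# How many publishers clear a threshold (1.0, in the units of global_sales).
above_threshold = conn.execute("""
    SELECT COUNT(*) FROM (
        SELECT publisher FROM game_sales
        GROUP BY publisher
        HAVING SUM(global_sales) > 1.0
    ) AS big_publishers
""").fetchone()[0]

# Join: match ratings to sales where title = name, then average the ratings.
rated_sales = conn.execute("""
    SELECT s.name, AVG(r.score) AS avg_rating, s.global_sales
    FROM game_sales AS s
    JOIN game_ratings AS r ON r.title = s.name
    GROUP BY s.name, s.global_sales
    ORDER BY s.global_sales DESC
""").fetchall()

print(top_publishers)
print(above_threshold)
print(rated_sales)
```

The join is an exact string match on title = name, so games whose titles differ only in punctuation or capitalization silently drop out (here, the unrated "Game B" disappears the same way). Normalizing titles before joining is usually part of the messy-data work mentioned earlier.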
Is there a need for mobile arcade coin-ops that are terrible ripoffs of Breakout? Maybe, and maybe there's money to be made from it. And because I did ask myself the question on the slide, I'd like to know how many companies make more or less than a million. The over-under on us making a million dollars is actually not that bad, so maybe my Breakout thing is going to be a hit. We'll see. So here's top rated and top selling, which we just looked at. But here's the big thing: I don't know SQL. (We're going to take questions at the end, if you don't mind, just to make sure we stay on track and don't disrupt our camera audience.) With the little tool we were using, I was able to click and assemble some SQL. But maybe you don't know SQL; maybe you're comfortable in Excel, maybe you just need a report to share with people, or you want to play with things visually. That's fine. I know product managers, even at Google, who don't think of themselves as SQL experts or even data analysis experts and who have this need. So there are tools for this that we can use against platforms like BigQuery and Dremel, and out in the world there are others you can find as well. Again, not a hard sell here, but one thing we make that is totally free is a tool called Google Data Studio, and I'm going to show it to you now. Google Data Studio is a free reporting tool; basically it makes charts for you based on data. So let's make a report real quick. It's going to ask me to select a data source. I don't have one yet, so I'm going to make one. There's a whole bunch of options: if you've got stuff in AdWords or YouTube or the Play Store or wherever, it pulls that together. Since I was using BigQuery,
I'm going to say I want this data from BigQuery. I was keeping it in my big phaser project. Whoa, don't need that. All right, let's look at that market data, the game sales table, and give the source a name. It goes through and says: I connected to that, and I've got a bunch of fields for you. I'd like to add that to the report. I'm going to add another one too, because I think it's a good idea to get the ratings data in as well. Yeah, okay: here's score and ratings, and maybe I want genre and global sales. Well, gosh, these score ratings on their own don't help me understand much. So maybe I should add genre here as well, so I can see what kinds of ratings go to different sorts of games. I can see, okay, for some of these, good. These are things people might say about the game we might make, and we can see how those games have done: action games maybe get better praise than, say, shooters or strategy games or puzzle games. And I didn't have to know any SQL. What I think is important about that is this: as you work with your engineering teams, with people in sales and marketing, and with other product managers, if you create a shared repository of data, wherever it happens to live, it's important to be able to say, you bring your tool, I'll bring mine. Because we can both work with the data, we can show each other our work and get to a point where we agree on facts. So if I happen to know a whole bunch of SQL and you don't: here are the queries, here's Data Studio, go play with them, and let's have a conversation and actually build that consensus. All right, so we made a game. Great. You played it; it's Breakout. Is anybody still playing Breakout?
No, you're all with me. Cool. So once we've got an MVP and we've done some market sizing, we have an idea: I've somehow convinced everybody that a tiny free-to-play mobile coin-op is going to be the next big hit. But we'd really like to know what people are doing, so we'd like to instrument our product and get data about it. We want to figure out whether our users are happy, and there are obvious ways to do that. You can always survey users. Methodologies for surveying may vary, but you can always survey them. It's a good way to get at happiness, and often when we want quantitative data at Google about how users feel, we survey them, because sometimes there's no better way. We also care about what users are doing in the game: that event-level data. We could ask our engineering team, "Hey, give me a dump of all the times this happened, or all the times that happened." But the nice thing about choosing good shared repositories is that your collaborators should be able to write to them as well. So rather than asking for a dump of some data, you ought to be able to ask your engineering team: is there any way we could start logging event data to the same place where we keep our market data, our advertising data, and so on? With something like BigQuery we make this really easy. For the game (not that anybody needs to know JavaScript or be a programmer to be a product manager), you basically drop in a chunk of code like this, and now you can send any event you want to BigQuery live, which is cool. So the point here is about extending our framework, extending that footprint.
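The JavaScript snippet shown on screen isn't captured in the transcript, so what follows is only a hedged sketch of the same idea in Python: each gameplay event becomes one row in a shared table that anyone can query right away. The schema mirrors the fields visible in the Data Studio report that follows (score, lives, player, event text, bricks remaining); the helper log_event and the event names are hypothetical, and an in-memory SQLite table stands in for BigQuery. With the real google-cloud-bigquery client library, streaming rows is typically done with Client.insert_rows_json, but check the official documentation before relying on that.

```python
import sqlite3
from datetime import datetime, timezone

# In-memory SQLite table standing in for the shared warehouse (assumption).
# Columns mirror the event fields shown in the talk's report.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE game_events (
        ts TEXT, player TEXT, event TEXT,
        score INTEGER, lives INTEGER, bricks_remaining INTEGER
    )
""")

def log_event(conn, player, event, score, lives, bricks_remaining, ts=None):
    """Append one gameplay event as a row (hypothetical helper, not a real API)."""
    if ts is None:
        # Record when the event happened, in UTC, as an ISO 8601 string.
        ts = datetime.now(timezone.utc).isoformat()
    conn.execute(
        "INSERT INTO game_events VALUES (?, ?, ?, ?, ?, ?)",
        (ts, player, event, score, lives, bricks_remaining),
    )

# A short made-up session: the game client calls log_event as things happen.
log_event(conn, "player1", "game_start", score=0, lives=1, bricks_remaining=40)
log_event(conn, "player1", "brick_hit", score=10, lives=1, bricks_remaining=39)
log_event(conn, "player1", "life_lost", score=10, lives=0, bricks_remaining=39)
log_event(conn, "player1", "game_over", score=10, lives=0, bricks_remaining=39)

# Because events land in a shared table, anyone can query them immediately.
event_counts = dict(
    conn.execute("SELECT event, COUNT(*) FROM game_events GROUP BY event").fetchall()
)
print(event_counts)
```

Logging a snapshot of game state (score, lives, bricks) with every event, rather than only the event name, is what later makes questions like "do players run out of lives too early?" answerable without new instrumentation.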
With that data in place, we have new things to talk about. For example, while you were all playing Breakout, I've been logging data. Some of this is older data, but there's data from you as well. Let's forget the SQL part for a second and go back to that Data Studio report, since maybe we're not comfortable with SQL, and add another data source so we can figure out what people are doing. As you'll see, I've got the score at the time of the event, the number of lives when it happened, who the player was, the event text, and the number of bricks remaining at the time. This should give us some insight into when things are happening. Let's minimize that. Now, this line is going to show you pretty much what you'd expect, but it covers everything from when I built the first version of this through today, when everyone's playing. So let's add a filter for just today, the 25th, or the 24th; somebody should maybe tell me. Now we've got some people with bricks remaining and scores that went up, and a little bit of player information. Maybe I care about different stuff, though, so maybe we want a bar chart in aggregate. Yeah, here we go: players and scores. Wow, somebody did really well. I don't know who Nav is, but Nav did really well. So now we know a little bit about how people are doing. Let's see what happens if we add lives as another metric. Not super interesting. By the way, I hadn't actually analyzed this data before, so we're figuring some stuff out live. So could we come up with some facts from this if we had to have a debate? Do we not give people enough lives? Can they not achieve a really high score?
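The "players and scores" bar chart is essentially a date filter followed by a per-player maximum. A minimal plain-Python sketch of that aggregation follows; the event tuples and day labels are made up, and the under-100 cutoff is an arbitrary illustrative threshold for asking whether most players struggle to score.

```python
from collections import defaultdict

# Made-up events: (day label, player, score at the time of the event).
events = [
    ("earlier", "tester", 120),
    ("today", "p1", 40),
    ("today", "p1", 90),
    ("today", "p2", 600),
    ("today", "p3", 30),
]

def best_scores(events, day):
    """Highest score each player reached on `day`, best first (like the bar chart)."""
    best = defaultdict(int)  # scores are non-negative, so 0 is a safe starting value
    for event_day, player, score in events:
        if event_day == day:  # the date filter
            best[player] = max(best[player], score)
    return sorted(best.items(), key=lambda item: item[1], reverse=True)

today = best_scores(events, "today")
# Share of today's players whose best score stayed under an arbitrary cutoff.
share_under_100 = sum(1 for _, score in today if score < 100) / len(today)
print(today, share_under_100)
```

A single standout player (like the one who "did really well") can dominate an average, which is why looking at the per-player distribution, or the share under some cutoff, is often more informative than the mean score.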
Well, some people achieved a much higher score than I've managed to, possibly. The other thing we could do, as I said, is survey people. You all played Breakout, which I really appreciate. Now I'd like you to also go to a form and fill out a survey about Breakout. So phones out one more time. You can play Breakout again beforehand if you want, just to get a reminder of how awesome it was. Why am I making you do this? Because if I'm always analyzing data, I'm always trying to get more data, so even in the course of this talk I'm trying to get data from you. I will not capture your email addresses, I promise. There's no PII shared here; I'm not tracking your email or any PII about you, and you'll see when you take the form that I'm asking some very simple questions. As you take this form, I want to briefly step back and summarize the kinds of data we've looked at in the course of the talk. Market data helps us inform broad decisions about where we should go, and it's easy to pull together into a common repository. Then I added instrumented data about what people are doing. We've got that, and we can look at it relative to the market, which helps us make better decisions about what's going on in the product relative to the space. What we're talking about now, though, is user experience, and user experience to some extent requires users to give us additional information. So if you're filling out the form, and I hope you are, and it actually all works... You need permission? Oh my gosh. Yeah, TinyURL and Google Docs don't always work super well together.
We can give it a shot real quick. There we go; if you refresh it, it should work now. And yes, I cheated: I used Google Forms. There's the URL again for those of you who hadn't gotten there. I really want a solution to the fact that Google short links aren't as good as TinyURLs; it really bothers me, and I haven't been able to convince them to do it differently. As a sidebar while people are taking the survey: one thing I've found with user experience research is that the longer the survey, the fewer users you'll get to complete it. So when you get really serious about it, figuring out how to statistically power a survey is important, and you should assume that for every minute a user spends taking it, you lose some percentage of users. There are situations where you can say, "I really need this qualitative data." There are market research firms you can work with and say, "I need 300 responses from this class of people," and they'll field your survey and guarantee you that number of respondents, but you pay per respondent. So generally, when we run these sorts of surveys with our users, we try to go really broad and hopefully get enough responses for something statistically significant. However, if I need something that's truly powered, I will go out and ask a research company for it. What I'll ask that company, interestingly, is: don't give me a slideshow about it, give me the data set. You can negotiate with these companies to get that data set, and then you can put it in a place like this. All right. Did some people fill out the form?
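To make "statistically powering a survey" a little more concrete, here is a sketch of one common sizing rule: the normal-approximation margin-of-error formula for estimating a proportion, n = z^2 * p * (1 - p) / e^2. This is a simplification (a true power analysis for comparing groups needs more inputs, such as the effect size you want to detect), and the completion rate used to turn completed responses into invitations is an assumed parameter standing in for the drop-off on longer surveys described above.

```python
import math

def required_completes(margin_of_error, z=1.96, p=0.5):
    """Completed responses needed to estimate a proportion within +/- margin_of_error.

    Normal-approximation formula n = z^2 * p * (1 - p) / e^2.
    p = 0.5 is the conservative (largest-n) choice when nothing is known;
    z = 1.96 corresponds to roughly 95% confidence.
    """
    return math.ceil(z * z * p * (1 - p) / margin_of_error ** 2)

def required_invites(margin_of_error, completion_rate, **kwargs):
    """Invitations to send if only `completion_rate` of invitees finish (assumed)."""
    return math.ceil(required_completes(margin_of_error, **kwargs) / completion_rate)

print(required_completes(0.05))        # about 95% confidence, +/- 5 points
print(required_invites(0.05, 0.25))    # if only a quarter of invitees finish
```

For example, a margin of plus or minus 5 percentage points at about 95% confidence needs 385 completed responses, and if only a quarter of invitees finish, that means inviting about 1,540 people. Every minute added to the survey lowers the completion rate and raises that invitation count.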
Cool. Oh, yeah, a bunch of people filled out the form. Cool. One thing you get with Google Forms is that your responses can all go into a sheet, so here are our responses. Cool, they're in a spreadsheet, but all the rest of my stuff is in BigQuery. That's not helpful. One thing I like about BigQuery, and that we do a lot inside Google, is being able to analyze disparate things without them necessarily having to be stored in one system. BigQuery will in fact let you define a table over a spreadsheet. So you could keep submitting this form, or other people who found my incredible Breakout game could, and they could give us information in that spreadsheet; or you could edit the spreadsheet directly because you wanted to change a particular column value. So let's query that table. Let's say I want the average number of stars. Obviously this is not super sophisticated; you could have done it in the spreadsheet. Give me the average number of stars... is it not called rating? I should know my data better. Score. Okay. So my average rating isn't quite where the rest of the industry is, but that means there's room for improvement. So what did I do? I analyzed data from a survey in a spreadsheet alongside market data I found externally, I brought them together, and if I wanted to share them with you, I could really easily do that.
I could in fact say "Share dataset." I could share this dataset with you, or with anybody I wanted. Since I don't know all of you (I need to zoom out), I could even make it available to everybody; I could make the data public. So now I've got a way to take user experience data, which is sometimes qualitative, or qualitative for the individual but able to be aggregated quantitatively, and bring it into my discussion about what we should do. I hope all of this is somewhat resonant: once you've started to build a foundation, even if it's just market data or logs data, there's a whole bunch of other stuff you can do. This slide, by the way, just recaps what I talked through. There's a lot of other stuff I might care about. For example, if we looked at that and said, "Man, nobody is scoring high enough, maybe we need to double the point values," we could store our build logs, recording when we made changes to the game itself, in that dataset. In fact they're all right here, because I'm running it on App Engine: I've got every web request that hit it and every error that got thrown. Let's see how many shameful errors... oh, no errors. Well, no errors today. If you go back to when I was building it, oh, there are a lot of errors. I'm no longer a professional developer. I've also got how many web requests I received. So if we were now talking about whether we should double the point values, well, maybe we should, and we could compare what happened before and after the change. If we wanted to do an A/B test, we could roll out two different versions of the game and compare their activity logs across players. We're bringing all of that together in a shared foundation. Do we want to make a change to it? Anybody have an opinion based on what we've looked at?
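One way to make the "roll out two versions and compare" idea concrete is to compare average scores between randomly assigned groups. The sketch below computes Welch's t statistic for the difference in means using only the standard library; the score lists are made up, and version_a/version_b are hypothetical labels for the current scoring and a candidate change such as doubled point values. Turning the statistic into a p-value needs the t distribution; SciPy's scipy.stats.ttest_ind(a, b, equal_var=False) runs the same Welch test end to end.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic for mean(a) - mean(b), not assuming equal variances."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

# Made-up final scores for players randomly assigned to each version.
version_a = [120, 150, 90, 200, 130]   # hypothetical: current scoring
version_b = [210, 260, 180, 300, 240]  # hypothetical: candidate change

t = welch_t(version_b, version_a)
print(mean(version_b) - mean(version_a), round(t, 2))
```

A t statistic far from zero suggests the difference is unlikely to be noise alone. Note, though, that doubling point values raises scores by construction, so for a change like that the more telling comparison is usually a metric the change does not mechanically inflate, such as how long or how often people keep playing.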
Yeah, let's make a change to it. All right. Wow, this is super tiny now. "Ball hits brick: score plus 10." Let's make it score plus 20. Nobody cares how good my JavaScript is. So now I've put on my developer hat, made the change the PM asked for, and I'm going to deploy the new build. What will happen is that we'll get application logs from the new version, and we could compare version to version. We could come back and figure out, oh, that's when the new version went up; now let's look at average scores and see what happened to them. Yes, let's continue. It's not my first deployment, but it may still take a while, so I'm not going to make you watch it. If you come back later, you should find your scores go up a lot faster, and I'll have to look after the fact to figure out whether that was meaningful. So, application logs and service logs: these are good things to have, and your engineering team will probably want them anyway. If you can give them a shared place to put them, where you can also analyze data relevant to your purposes, you'll get a lot more traction. There's a bunch of other stuff you might care about. If you're marketing through an external system or using Salesforce, maybe you want to pull that data in. Billing data, cost data: not all product managers own P&Ls, but if you own any kind of cost, having it next to what's going on with your users and your product makes it a lot easier to explain why we spent this money, or how much money we have to give people back. The worst thing, as a product manager, is having to say:
"I've got to give you your money back because I screwed up badly." It's been known to happen, to me and to others.

So in general, I hope this is becoming preaching to the choir: yes, of course I should have all my data in one place, and it shouldn't really matter what tool I use to analyze it, as long as people can find it and analyze it.

That brings me to wrapping up. I have no idea where we are with respect to time, so hopefully it just means more Q&A. But the point is, and I've learned this the hard way, your background, your charisma, and your gut product feel are never going to be enough to win all the arguments you need to win to make a successful product. You need facts, and there is a recipe for facts, as weird as it might sound: data, analysis, and showing your work. By the way, as far as showing my work goes, I'm happy to share not only all the queries we've run today but also the data sets, if people are interested. Again, it's not enough to say "I analyzed it"; I'm offering to share my work, and you should offer to share your work with your collaborators as well. And if I were to put a bow on it: as a product manager, you should think of the data you've collected, in a way that is joinable and analyzable, as if it were capital. Because if you do, it will pay dividends for you. You can leverage it in ways that you cannot in isolation.

All right, questions? I've got a question: did anybody actually enjoy Breakout? All right, cool, because I enjoyed making it quite a lot.

Yeah, absolutely. I would argue that working with data is the sort of thing where, at some point, you probably want to go learn some stuff and read a book. But tools like Data Studio are a really good starting point, because you probably already inherently understand a bar graph and a pie chart, right?
I mean, we've all seen these things before, and we all have a relatively good idea how to understand them. I think what happens is that questions engender questions: answers engender new questions, right? So if we did the bar chart and saw, wow, this person scored 6,000 and everybody else scored a really, really small amount, I might want to ask: how was their behavior different? If I wanted to answer that, I might find that I didn't know enough, or that the dashboarding tool I was using wasn't sufficient to answer the question, at which point I'm going to have to go learn something new. That's part of the reason we use systems like BigQuery to sort of auto-assemble queries for people. But you could also put the data in a spreadsheet, figure it out there, and then take it from the spreadsheet to a SQL query.

The other thing that really helps a lot with this, particularly with newer PMs, or PMs who aren't experienced in SQL when they come to Google: at some point they run into the Dremel team, which means at some point they run into me. They'll have a question like "How do I access this?" or "I need to count up all of these." One of the things that works really, really well is having a knowledge base you can share with people. So here's another fun BigQuery thing. Suppose you didn't know how to join two pieces of data together, and I had saved this query, which shows you how to join them. I can save it and share it with people, even as a public thing. I can grab this link and send it to you. And the interesting thing about a language like SQL, in terms of learning it, is that unlike Python or C or a number of other languages, you don't have to deeply understand the language and how it works. You simply have to understand that it is declarative.
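To illustrate what "declarative" means for a shared join query, here is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for BigQuery. The tables `sessions` and `survey`, their columns, and the rows are hypothetical, invented for illustration; the query states what to return (average score per survey answer) without saying how to match or aggregate the rows.

```python
import sqlite3

# An in-memory SQLite database stands in for a shared warehouse.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sessions (player TEXT, score INTEGER);  -- hypothetical game log
    CREATE TABLE survey (player TEXT, enjoyed TEXT);     -- hypothetical form answers
    """
)
conn.executemany(
    "INSERT INTO sessions VALUES (?, ?)",
    [("ann", 120), ("ann", 80), ("bob", 40), ("cy", 600)],
)
conn.executemany(
    "INSERT INTO survey VALUES (?, ?)",
    [("ann", "yes"), ("bob", "no"), ("cy", "yes")],
)

# Declarative: the query says *what* we want (average score per survey
# answer) and leaves *how* to match and aggregate rows to the database.
JOIN_QUERY = """
    SELECT v.enjoyed, AVG(s.score) AS avg_score, COUNT(*) AS games
    FROM sessions AS s
    JOIN survey AS v ON v.player = s.player
    GROUP BY v.enjoyed
    ORDER BY avg_score DESC
"""

rows = conn.execute(JOIN_QUERY).fetchall()
for row in rows:
    print(row)
```

Swapping `AVG` for `SUM`, or changing the `ORDER BY` clause, is the kind of small edit that lets someone adapt a shared query without knowing how the engine executes it. BigQuery's SQL dialect differs from SQLite's in places, but a basic join and aggregation like this one is standard SQL.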
You tell it what you want. So people end up tweaking it: "Oh, Dan sent me this query, and it says average score, sum of global sales. Well, I want average global sales. Does that work? Try it. Oh, okay, it does work. I want to order it differently; I'll figure out how to order it differently." People basically build up libraries of SQL scripts that somebody gave them and that they've tweaked to do their job. So you can learn by doing in this case. Now, if you want to get very serious and bring in a data scientist or talk to a statistician, by all means do, but try to get something reusable out of them, something you can tweak yourself.

So, yeah, specifically: the sales data comes from Kaggle.com. There's a data set on Kaggle called Video Game Sales. The IGN data is a web scrape that's out there on some public site, and I was like, "I'll have that, thank you." The rest of the data we generated: it came from the form, from you playing Breakout, and from my App Engine requests. The SQL I wrote, but the data sets I downloaded. And public data sets are actually a really useful thing for product people. Google offers a lot of public data sets, and Amazon offers a bunch as well. So I downloaded these CSVs and just shoved them up. Yeah, I could go and make it a public data set; I just have to say "share it publicly," and voilà.

In some areas it may not, yes. Price? Product? Yeah, I mean, it's interesting, because with enterprise stuff you can get data sets from analyst firms and things like that. So sometimes what you end up with is valuable data that somebody paid for. I work on an enterprise product, so Gartner matters a lot to me, and Gartner makes certain data sets available to me for pay.
I want to keep track of them, so I put them in BigQuery for myself and my team to analyze.

It depends on the product to some extent. For something like an enterprise product, it may be slowly moving; it may be that we have to make a decision at the beginning of every quarter. I've worked in jobs with a hardware life cycle, where you had to make a decision once a year. In something that's a service, you may have to make a decision right now. If you're running software as a service and it turns out you have a huge outage, you probably need to understand what happened to your users right now. So it's going to vary. Don't think of the frequency with which you analyze data as the gating thing. The reason I analyze data every day is that there's always a question. Every day brings a question with it: This broke; who did it affect? This worked really well; why? Why are sales numbers down? Who's using this new feature? All of these things come up on any given day. So when I say I analyze data every day, I don't know what I'm going to analyze; I just know I'm going to.

Wait, I've got to... so what do you... and then you... Mm-hmm.

Videos are really weird. So the question was: if I had billions of video files, how would I do it? Video presents its own kind of challenge.
Some things, like metadata about video, can be stored in a tabular format like this, and you can analyze it. People use tools like Dremel at Google to do that, and YouTube has its own particular thing that they built for it. But for the video content itself, there's a whole other area around video forensics and picking out objects, and we've got machine learning folks who work on it. It's effectively a separate problem. You still need to analyze that data, but technologically it requires a different solution. Yeah, to some extent. There's some stuff people can do with images in databases that might be vaguely interesting, but the really good image analysis is almost always a separate machine learning piece.

"Do you have any best practices around synchronizing more real-time data, since you analyze every day?" Mm-hmm. It's a really good question: how do you deal with the fact that you may have data coming in all the time? I cheat. I cheat and use BigQuery, because BigQuery will let you insert 100,000 rows a second per table, and it's live, which means I don't really worry about it. In previous positions, what I've generally done is have something that catches a bunch of logs and does a periodic dump, and you understand that your analysis will be stale up to that point. It depends on the kind of engineering robustness you want to throw at the problem. And one of the questions you have to ask yourself is: even if you could know right now, how long would it take you to act?

Mm-hmm. Right. No, no, I would just want to append if at all possible. Part of this whole notion, to me, of a data foundation is: append or mutate; don't truncate and reload.

I couldn't tell you the question. Mm-hmm. Yeah, this is a really good question.
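The "catch a bunch of logs and do a periodic dump" approach from the real-time question can be sketched in a few lines of Python. This is a hypothetical illustration, not the speaker's actual setup: `LogBatcher`, its thresholds, and the plain list standing in for a warehouse table are all assumptions. It flushes when a batch is full or its oldest event is too old, only ever appends (never truncates and reloads), and records `loaded_through`, the point up to which analysis is current.

```python
import time
from typing import Callable, List, Optional


class LogBatcher:
    """Buffer log events and append them to a table in periodic batches.

    Hypothetical sketch: `sink` is any append-only list standing in for a
    warehouse table; a real system would write a file or call a loader here.
    """

    def __init__(self, sink: List[dict], max_batch: int = 500,
                 max_age_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.sink = sink
        self.max_batch = max_batch    # flush once this many events are buffered...
        self.max_age_s = max_age_s    # ...or once the oldest buffered event is this old
        self.clock = clock
        self._buffer: List[dict] = []
        self._oldest: Optional[float] = None
        self.loaded_through: Optional[float] = None  # analysis is current up to here

    def add(self, event: dict) -> None:
        now = self.clock()
        if not self._buffer:
            self._oldest = now
        self._buffer.append(event)
        if len(self._buffer) >= self.max_batch or now - self._oldest >= self.max_age_s:
            self.flush()

    def flush(self) -> None:
        if not self._buffer:
            return
        # Append only: rows already in the sink are never truncated or reloaded.
        self.sink.extend(self._buffer)
        self.loaded_through = self.clock()
        self._buffer.clear()
```

This sketch only checks a batch's age when a new event arrives, so a real dumper would also call `flush()` on a timer. Passing in a `clock` function makes the batching behavior easy to test without waiting in real time.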
So, which data? "Great, Dan has convinced me I need data" (well, not really convinced me, because I already knew that) "but how do I know what data is important?" In my experience, there are two good ways to figure out whether data is important. One is to ask people. If you gave me the data, I might ask you: what is this data? Tell me its story, because maybe then I'll figure out whether it's important. I also run across a lot of situations in which I come across data and don't know if it's useful. What I tend to do in that case, and this is personal practice, is try to answer the question with it. If the answer seems reasonable (and by reasonable I mean it meets my general assumptions about the way the universe works, and it doesn't tell me that gravity doesn't exist or that I have a billion users, because I know I don't have a billion users), I'll think, "Oh, maybe this is a useful data set," and I'll take the extra effort to figure out where it came from. At some point, though, I may not be able to figure out where it came from, in which case part of showing your work is showing your uncertainty. There are situations in which I've presented analyses to my teams and said, "This is based on a data set whose origin I do not understand." If nothing else, you're at least saying: this is the data we have, and we don't know if it's the best data.

Okay, I don't know who's first; we'll go left to right.

No, so, BigQuery: basically, we will give you 10 gigs of storage for free and your first terabyte of analysis every month for free. After that, we charge you two cents a gigabyte for storage for the first 90 days, then it's a penny a gigabyte, and we charge you five dollars a terabyte for analysis. So everything we did today falls well under a terabyte.
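The free tier and rates just quoted turn into simple arithmetic. Below is a minimal Python sketch that hard-codes those figures as the speaker stated them (treat them as a snapshot, since cloud pricing changes). It also makes two assumptions that do not come from the talk: storage rates are per gigabyte per month, and the free 10 GB is applied to the most recent data first. The workloads in the example calls are hypothetical.

```python
# Figures as quoted in the talk; actual BigQuery pricing may differ today.
FREE_STORAGE_GB = 10.0
FREE_QUERY_TB = 1.0            # first terabyte of analysis each month
RECENT_STORAGE_PER_GB = 0.02   # data stored for the first 90 days (assumed per month)
OLDER_STORAGE_PER_GB = 0.01    # data older than 90 days (assumed per month)
QUERY_PER_TB = 5.00


def monthly_cost(recent_gb: float, older_gb: float, queried_tb: float) -> float:
    """Rough monthly cost in dollars under the quoted rates (a sketch, not a quote)."""
    # Assumption: the free storage allowance is used up by recent data first.
    free_left = FREE_STORAGE_GB
    billable_recent = max(0.0, recent_gb - free_left)
    free_left = max(0.0, free_left - recent_gb)
    billable_older = max(0.0, older_gb - free_left)
    storage = (billable_recent * RECENT_STORAGE_PER_GB
               + billable_older * OLDER_STORAGE_PER_GB)
    analysis = max(0.0, queried_tb - FREE_QUERY_TB) * QUERY_PER_TB
    return round(storage + analysis, 2)


# Hypothetical workloads, not measurements from the talk.
print(monthly_cost(recent_gb=2, older_gb=0, queried_tb=0.05))     # prints 0.0
print(monthly_cost(recent_gb=110, older_gb=200, queried_tb=3.0))  # prints 14.0
```

Under these assumptions, a demo-sized workload stays entirely inside the free tier; in the larger example, 100 billable recent GB and 200 older GB cost $2.00 each, and the 2 TB of analysis beyond the free terabyte costs $10.00.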
So I won't get billed for that.

So that's a good question. The question, for those who didn't hear it, was: what percentage of the data I analyze is Google-owned proprietary data versus public data? The answer I'll give is maybe not a tremendously useful one, because for me the vast majority of the analysis I perform centers on the same sorts of questions. What do users do? What do users want? How much money? How little money? What happened? Most of the time those analyses correspond to a couple of key things: the logs of what happened, the things users said to me, and maybe what the billing system said. Most of my day revolves around those three or four artifacts. What gets sprinkled in is maybe: how does that relate to a public thing, or to the news? I have said, "Here's what goes on in my logs; I'll go join in the Dow Jones news table to see if anything happened that day." For the most part, the data you're going to analyze is not going to be public unless you're trying to solve a public problem. But public data can be super useful if you're in e-commerce or retail, or if your business touches the public world in a big way. If what you do is operate a giant service for Google, most of the stuff you care about is the stuff that service generates.

There was another question. Linus? Yep, another question there, okay.

"What would you recommend as the front end to import this, or to tie this in? Something like Dean, or...?" It's a good question. The canonical answer, since I work at Google and should recommend it, is a thing called Firebase. That's only for mobile.
Generally speaking, it depends on the language the thing is developed in. If you're writing your own code, whatever your language is, we have a client that lets you drop in a couple of lines of code and write to BigQuery. If... yeah, absolutely. There's also the notion that for Google Analytics you ought to be able to say "export my Google Analytics data," and the same for YouTube, the Play Store, and AdWords: for a bunch of the Google-managed sources, we'll just put the data in there for you. If you want real-time event streaming data, there are other systems, like Alooma, that will do that for you, and a few others. Most things that do streaming ingestion know how to talk to BigQuery at this point. But if you're writing your own thing, just drop a line of code in there. It's actually pretty easy.

"So where do you see product analytics, as an industry or as an activity, in five or ten years? Just for some context: the spreadsheet itself was developed 30 years ago, Google Analytics was developed 15 years ago or so, and I guess BigQuery was developed in recent years." (Well, Dremel has existed at Google for ten years.) "That said, SQL dates to 1978. And recently there are companies like Heap Analytics that can help people do that and reduce how much engineers need to put code in, right? So what new things do you see? For example, where do you see BigQuery in five years?"

That's an awesome question; I really like this question. So where do I see product analytics, and how people are going to analyze this data, going? When I look at how people analyze data, you make a really good point, right?
SQL has existed for a really long time, and spreadsheets have existed for a really long time. They're great because they're malleable and programmable, but they don't necessarily deliver the analysis you want right now in a very simple fashion. So, the things I get really excited about: natural language processing is really interesting in this space. It's been a long-held dream in the database world to take a question you wrote in human language and produce an answer for you from a database. That's a really interesting thing. I think something like Smart Reply is really interesting as well: "Dan ran this analysis for you a while back," or "You've asked a question that looks similar to the thing Dan usually runs; here's a slightly modified version of it," or "Would you like to open the thing you ran previously and tweak it?" I think there's a lot that can be done around coaching and disambiguation. "Tell me what the average sales for Under Armour were, year over year." Is Under Armour a category? Is Under Armour a brand? Ask that follow-up question and help refine the analysis. I think product analytics is going to find greater purchase, within both small teams and disparate organizations, through smart suggestions of where to look or how to refine. I may not know what your churn model is, but I might be able to say, "Did you know that if you added this, maybe this changes?"
So can you... you would have to dump from Exchange. BigQuery accepts imports in various file formats. It can also federate queries out to other Google databases, like Cloud Bigtable and Cloud Spanner and things like that, and we're working on some stuff to make it easy to say, "I have some data being made over there; please just mirror it in for me." People have written connectors. Google itself doesn't maintain one, but there are people out in the open-source world who've built connectors for that sort of stuff, or there's a plug-in. And there are partners: Segment.io, for example, will deliver stuff to BigQuery if you want, or we have a transfer service where you can say, "I'd like to bring in my Google Analytics data." Those are the typical ways people wire up really simple stuff. There are groups like Fivetran where you basically tell them, "I want all of these things from these sources," and they'll go wire it up for you and say, "There's a BigQuery database; go have fun with it." There are different ways to do it.

Oh, pushing stuff out of BigQuery. We're not a triggering system, and we're not a serving system in that sense; we're very focused on being an analytical solution. What we do see people doing is taking event data, because we generate an event stream for everything we do. People take those event streams and trigger workflows with them. They'll use something like Google Cloud Functions, which is equivalent to AWS Lambda, and say, "When you see this table updated, go do this thing." So you do have people who build actions based on what we do, but it's basically by taking our own event stream. You can even take that event stream and put it back into BigQuery, if you want the snake to sort of eat its own tail.

Okay, so many BigQuery questions.
This wasn't supposed to be about that, but it's... "Can you schedule them?" Not currently; we expect to offer that pretty soon. People do write their own schedulers or cron jobs. I've got a scheduler that I just run a bunch of periodic scripts in.

You know, that is kind of the question I hoped somebody would ask me today: is this how you think about product management because it's the kind of thing you work on, or because you think it's how it happens? I can't give you a good answer to that, because so much of my background, academically and professionally, was in dealing with large amounts of data (I was a scientist before I was an engineer) that I've always gravitated to these kinds of products. Which may mean I am very strongly influenced by the things I know at my core. I would argue they've been successful for me. I've seen people come to product management from, say, a purely sales background, and they can do phenomenally well. But I will say, from experience, they do ultimately end up needing data analyzed at some point. And the question I would always ask a prospective product manager is: would you rather be the guy or gal who can come to the rescue and analyze that data, or the person who flaps their arms and says, "If only someone could write a SQL query for me"? Don't be that person. So yeah, I'm hugely biased, but I think I'm biased in the right direction.

Yeah. Gosh. You know, it changes over time. The question, if I understand it correctly, is how I see myself as a product manager. It has changed over time. Early in my career as a product manager, I was very focused on just analysis and the very technical. Over the years I've had to deal with lots of other things. There are UI things
I have to deal with, there are sales things I have to deal with, and there are overall budgetary things I have to deal with. What I find is that I inherently have less time to analyze data than I used to, which is part of why it's really important to me to teach other people how to do it, and to share really liberally. There's this notion of "Hey, I made a query that analyzes a thing; share it," because inevitably I should be able to ask for that favor back at some point. I know I have SQL queries at Google that other people, two jobs on, have taken on and now own, and if I get stuck at some point, I hope I can ask them. Or at the very least, I've taught enough other people to fish that it will reduce the burden on me.

Yeah. And a lot of that is building a nice foundation where you can say, "Here are the things you need. You can all fish now. Come to me for coaching, not necessarily..." I think data science is a weird area, too; people have feelings about it as a sector. But I tend to think that a lot of people who become successful data scientists evolve into coaches: they're coaching other data scientists or analysts to do things, and using their expertise where it matters. So if I'm analyzing data every day, it may not be that I'm writing SQL queries every day, although I read a lot of SQL queries. It may come down to: you wrote a SQL query, and I look at the analysis and say, "I'm not sure; can we try it this way?"
So, figuring out how to elevate yourself: welcome to the rest of your career, man.

Well, we're losing people.

Oh, yeah, SQL is a complicated language; people mess it up all the time. But I think the biggest danger I've seen is this: be careful when you use statistics. Averages, standard deviations, variances, counts: everybody can kind of agree on what those are. If you delve into "I made a linear model," "I did a logistic regression," "I built a fancy machine learning model," be very careful, for two reasons. One, you need to make sure you understand how the analysis you did functions and what the outputs are. Two, part of showing your work is also saying, "I did a Student's t-test, I analyzed these populations, and the means are different." If you don't know what a Student's t-test is, I need to make sure you understand what that implies. I've seen a lot of situations, both as a researcher and as a product person, in which people have gone to the statistics machine and come back with a black-box answer: "It's significant; this is what it is." I remember once in grad school a collaborator came back and said, "Look, I found this great model." And I was like, "That can't be, because that would take longer than the heat death of the universe. That can't be possible." Sometimes you have to ground it: do we fundamentally understand what this thing does? The more black-box your analysis gets, the better you need to be at making that black box clear.
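As an example of what "showing your work" on a Student's t-test can look like, here is a minimal sketch of the classic two-sample test with pooled variance, using only Python's standard library. The function name `student_t` and the sample numbers (bricks broken per game under two hypothetical builds) are invented for illustration. Writing down the test's assumptions, independent samples from populations with roughly equal variances, is part of making the black box clear.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def student_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Two-sample Student's t statistic and degrees of freedom (pooled variance).

    Assumes independent samples drawn from populations with equal variances.
    """
    n1, n2 = len(a), len(b)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")
    df = n1 + n2 - 2
    # Pooled variance: each group's sample variance weighted by its own df.
    pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    std_err = math.sqrt(pooled * (1 / n1 + 1 / n2))
    return (mean(a) - mean(b)) / std_err, df


# Hypothetical bricks broken per game under two builds (illustrative only).
build_a = [12, 15, 9, 14, 11, 13]
build_b = [14, 17, 12, 16, 15, 18]
t, df = student_t(build_b, build_a)
print(f"t = {t:.2f}, df = {df}")
```

Turning t into a p-value needs the t distribution's CDF, which the standard library doesn't provide; SciPy's `scipy.stats.ttest_ind(a, b, equal_var=True)` computes the same pooled statistic along with its p-value. If the equal-variance assumption is doubtful, Welch's variant (`equal_var=False`) is the usual alternative.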