Hi, I'm here from Product School to talk to you about the five basic statistical tests you really need to understand as a product manager. And the reason you have to know these tests is that these are the five ways people most often use data incorrectly. So as a PM, you want to be aware of them so you can make the right decisions. I'm Christopher Crosby. I'm a product manager on the core infrastructure team at Google, focused on data analytics platforms. Before coming to Google, I had a few different types of roles working in both data and technology. My educational background is an undergraduate degree from Ohio University and a master's from the University of Pittsburgh, where I studied information science and telecommunications, which ended up being a pretty good foundation for product management. Both degrees focused on applying technology to real-world problems, but they didn't shy away from the programming and math that also became really invaluable. After school, I went to work coding biostatistics applications for the NSABP, a cooperative group under the National Cancer Institute. I was part of a team that built software for large-scale phase three clinical trials related to breast and bowel cancer. I was working as a full stack developer in that role, but this is where I really got interested in what the biostatisticians were actually doing: how were we doing this kind of data analysis? So I went back to school in New York City, at Hunter, for a second master's degree, a Master of Public Health in biostatistics. I then took that combination of statistics and computer science to Memorial Sloan Kettering Cancer Center, where I managed a data science team focused on building new data tools and applications and making Sloan Kettering's cancer research data more accessible to both clinicians and researchers. I then moved to Amazon, where I had a couple of roles, one of which was on Amazon's Grand Challenges team, basically an R&D organization tasked with coming up with new ventures for Amazon to get into. After that, I made my way over to Google, where I'm currently a product manager in the data analytics space. So, data science in product management: it's been important in my roles, and it's probably going to be important in yours too. As a product manager, first, you're going to have things like business analytics and metrics. How many customers do you have? What's the revenue? What do your margins look like? Second, you're also going to have your product analytics. Often this comes from things like log analysis or the telemetry data you build into your product to understand how users are actually using it. How often do they take advantage of a new feature? When does something fail? All of that. Third, you also have a lot of feature testing. You might run experiments to see how you want to approach a problem, a new feature, or a change to your recommendations. A lot of digital-native companies have gotten really good at this with things like A/B testing, and Product School actually has a few other great videos that cover A/B testing specifically.
And then finally, another thing you're going to use data science for in product management is knowing how to build data and things like ML into your applications. You've seen a million implementations of ML features in different products, whether it's spam filters, smart replies, or, if you're a banking application, fraud detection. You need to know enough basic statistics to know how to apply these more advanced techniques to your product as well. But all of that is not what I'm going to get into today. Today, I'm really going to focus on just five basic statistics that you probably learned about several times starting in elementary school. So while some of it might seem redundant at first, I'm going to do my best to explain how these five basic statistics actually apply to your role as a product manager. These statistics are pretty much what I use for all of my data-driven decisions; for all of those data science use cases I just described, these are the fundamental things that underlie them. So let's get started. Number five: hypothesis testing. This is a procedure in statistics whereby an analyst tests an assumption using a very specific methodology. The methodology depends on the nature of the data and the reason for the analysis, but the point of hypothesis testing is to assess how plausible an assumption is given that you only have a sample of data; you don't have the entire world, you have samples. Product managers, this is what we do. We need to become experts at coming up with hypotheses and then rigorously testing them. As a PM, you don't want to be just another person with an opinion in the room; you want to be the person with the validated hypothesis. You want to test your product assumptions about everything, and this is not just A/B testing. So one way PMs do this hypothesis testing is we come up with really well-scoped MVPs that define a very specific hypothesis we want to go validate in the product. Now, the approach to the MVP has to change based on what you want to test. For example, there's an MVP known as the RAT, or riskiest assumption test. This is a great way to say: the thing about the product I'm not sure is going to work, I go and test just that. There are also ways to test product-market fit overall without having to build the full product first. A popular approach is the concierge MVP: maybe ideally you would have the whole system automated, but first you want to test whether people would even use it. A famous example is when Zappos got started. Zappos originally was not automated at all; it was people taking pictures of shoes, posting them, and then going out and buying those shoes. But it proved the market fit, so then they could invest in the technology. Another similar approach is taking other people's products that already exist, wrapping them up, and seeing if that works to test your market fit; that's known as an OPP, or other people's product, MVP. But with all of this, what you really want to do when you test hypotheses is make sure you're using your customer feedback to validate them.
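Since R comes up later in this talk as the tool for time series work, here's a minimal sketch in R, with entirely made-up numbers, of what rigorously testing an MVP hypothesis can look like. Say your concierge MVP converted 18 of 100 prospects while the plain landing page converted 11 of 100; a two-sample proportion test tells you whether that gap is signal or noise:

    # Hypothetical results: concierge MVP vs. baseline landing page
    conversions <- c(18, 11)    # conversions in each group
    visitors    <- c(100, 100)  # sample size in each group

    # Two-sample proportion test: is the MVP's conversion rate really higher?
    prop.test(conversions, visitors)
    # Check the p-value in the output: below 0.05 is the usual bar
    # for calling the difference statistically significant.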
Now, I often hear that the job of the PM is to talk to customers, or to be the voice of the customer. But the problem with that is that in most customer-focused companies today, everyone talks to the customer. So when it comes to putting in feature requests, everyone has opinions. Customer service has a list of friction points customers have hit. Engineering has talked to customers about scaling issues and wants to implement fixes. Marketing has talked to customers about future needs or business fit. What separates the PM from all this noise is that the PM can bring validated hypotheses that have actually been tested with a methodology. So you want to be very thoughtful about how you talk to customers and what you want to validate. You can do that by formulating a real study plan: know what hypotheses you're actually trying to test ahead of time, and know what you're trying to get out of the conversation. And then you want to make sure you're asking a lot of open-ended questions, giving your customer chances to talk about their pain points, so you understand their business better. That way you're not biasing them with the few things you already think they want. Cindy Alvarez wrote a book on lean customer development where she gives some great open-ended questions that have worked for me almost universally in everything I've researched. Things like: "Tell me about how you do X today," where X is whatever you're trying to study. "Do you use any tools to help you get this process done?" "If you could wave a magic wand, and forget about what's possible, what would you want to be able to do?" You can come up with solutions based on that. "The last time you did X, what were you doing right before? And what were you doing right after?" And always end with: "Is there anything about this subject that I should have asked?" Now, this doesn't mean that every conversation you have with a customer is going to be this open-ended exploration of their workflow. You still have sales pitches and technical troubleshooting, but these are the types of open-ended questions you could and should be incorporating into those conversations. Every customer interaction can be treated as an opportunity to either confirm or deny a hypothesis you have about your product or service. I definitely recommend Cindy Alvarez's book if you want to go deeper on this subject. It's a short but great read on how to do data-driven hypothesis testing with your customers and get the level of detail you need out of those conversations. So, the next test I'm going to talk about, which is a more traditional statistical test, is the t-test. This is probably the statistical test I most wish were presented next to every claim a salesperson, PM, or engineer makes. Rather than explain exactly what it is, I'm just going to walk through an example. I can't tell you how often it happens that an engineer comes to you and says, hey, my new feature increases customer revenue by $105 per customer. And they have something like the spreadsheet to the right, with users with and without the feature. Now, I know a lot of digital-native companies do this the right way, with proper control groups and testing, but not every decision you're going to make can have that rigor.
For example, one thing that comes up for me pretty often is pricing a new feature; I have to rely on the data I can pull together, because I can't A/B test customers on pricing. One time I was pricing a new tiering feature, and I had a hypothesis about whether it would increase consumption or not, so we had to look at the data. You really want to make sure you have a valid analysis before you make these kinds of decisions. So, back to my example: the engineer brings me this spreadsheet, and whenever I get something like the one on the right, the first thing I always do is run a t-test. It usually takes just a couple of minutes; in fact, TTEST is an actual function you can call in a Google Sheet. What it does is take not only the mean but also the standard deviation of those numbers. That tells you how much fluctuation there is in the data, and how reliable it is that the two populations are actually the same or different. So in that spreadsheet example, I put the t-test in and it spits out a p-value, which is basically the probability that a difference this big occurred by chance. In this case, there's almost a 70% chance that the $105-per-customer result occurred by chance. Typically you want less than 0.05, a 5% chance of the result occurring by chance, for it to be statistically significant. So in this case, I can just dismiss that $105; it's not statistically significant. One thing to keep in mind as you do these t-tests is that they're for continuous data, things like money, where they work well. But if you have counts of things in buckets, or anything categorical, you definitely want to run a chi-squared test and not a t-test. That's a mistake I've also often seen made. Okay, number three. By far the most popular technique in statistics is regression. It's the basis for a ton of machine learning, so it's certainly worth understanding in a lot more detail than I'm going to provide in this webinar today. When it comes to machine learning in your product, and knowing what it can and can't do, an understanding of basic regression and its various limitations is probably the biggest return on investment you'll get from studying statistics. There are countless new ML algorithms coming out every day, but a lot of them come down to basic regression techniques. So if you have a deep understanding of regression, that's going to help you come up with interesting product features, and you won't have to speculate about what machine learning can and can't do based on some science fiction novel you've read. So, just to level set if you haven't come across it before: what you're looking at is regression in its most basic form, a standard linear regression. In this case, we're comparing cricket chirps against temperature, and you can see they're strongly correlated. That line is your regression model; the red points are just the data. A very popular trick is that if you have a binary outcome, where your customer either clicked or did not click, you can take that same regression, constrain it between zero and one, and what it outputs is a probability, a prediction of what that click rate is going to be. That's known as logistic regression.
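To make both tests concrete, here's a minimal sketch in R (the language this talk recommends later for time series work), with made-up revenue numbers standing in for that spreadsheet:

    # Hypothetical per-customer revenue, with and without the new feature
    with_feature    <- c(420, 380, 510, 300, 640, 290, 450)
    without_feature <- c(400, 350, 500, 310, 600, 280, 430)

    # Two-sample t-test: the p-value is the chance a difference this
    # large would show up if the two groups were really the same
    t.test(with_feature, without_feature)

    # If you're counting outcomes in categories instead (clicked vs. not
    # clicked, per group), run a chi-squared test on the counts:
    chisq.test(matrix(c(120, 880, 90, 910), nrow = 2))

And here's the same kind of sketch for regression, using R's built-in cars dataset for the linear case and hypothetical click data for the binary-outcome trick:

    # Linear regression: fit a line relating two continuous variables
    linear_model <- lm(dist ~ speed, data = cars)
    summary(linear_model)

    # Logistic regression: the same machinery, squeezed between 0 and 1,
    # so the output reads as a click probability
    clicks <- data.frame(
      ad_position = c(1, 1, 2, 3, 3, 4, 5, 5, 6, 7),
      clicked     = c(1, 1, 1, 1, 0, 1, 0, 0, 0, 0)  # 1 = clicked
    )
    click_model <- glm(clicked ~ ad_position, data = clicks, family = binomial)
    predict(click_model, newdata = data.frame(ad_position = 2), type = "response")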
Again, this is just barely scratching the surface of what you'd want to know about regression, but if you want to go deeper, I recommend the Machine Learning Crash Course that Google puts out. It's totally free and a great way to get deeper into regression, and into general machine learning techniques that will really help you better understand how to apply machine learning to your product. Number two on my list is ARIMA, a statistical technique you typically want when working with time series data: things like predicting customer churn, setting monthly sales quotas, or capacity planning for resources over the years. These are all time series analyses, they often get put on the product manager, and the way they're typically done is painfully wrong. I know it might feel like I'm skipping ahead about twenty chapters, going from t-tests and regression to an autoregressive integrated moving average, but I do feel it's super important for PMs to at least be aware of this type of statistical modeling, especially since estimates about the near future often have big repercussions if you get them wrong. My goal here isn't to have you come away with the ability to run a rock-solid ARIMA model, but rather to have you remember the concept, so that when you're doing your monthly revenue projections or some other very important estimate for the business, you at least know to go do some research or ask for help. The mistakes I see people make when modeling predictions over time start with very simple things. Maybe you're selling pumpkins and your sales have been going up all through October and November, and your VP says, hey, can you predict the annual recurring revenue, the ARR? If you just take that November revenue and multiply it by twelve, you're ignoring all the seasonality, and you're going to completely miss your sales targets because your model didn't capture the seasonality when you ran that prediction. Now, even when I do see seasonality taken into account, seasonality often doesn't tell the complete story. This is where the benefit of an ARIMA model and decomposition modeling comes in. The example I'm showing here is not an ARIMA model per se, but it demonstrates how a time series analysis can be done by breaking out the various components of what goes into the data, like seasonality. Here I took a beer production dataset: on the left you see it plotted over time, and on the right you can see it broken into its various elements, like the seasonality versus the actual trend. ARIMA models take this a step further and try to describe the autocorrelation in the data as well. Now, if I've piqued your interest in learning to correctly build time series models, there are a couple of good places to start. One is R, the statistical programming language, and the other is Google BigQuery. If you're willing to write a small amount of R code, it's quite easy to get started with an ARIMA model. If you have data at the quarterly or monthly level, a SEATS decomposition is an easy way to get more insight into your time series data; you can find it, along with a lot of other simple time series functions, in the seasonal R package. Now, ARIMA itself can get quite complex.
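That beer production chart came from exactly this kind of decomposition. As a stand-in you can run yourself, here's a sketch using base R's stl() on the built-in AirPassengers monthly series (the SEATS decomposition in the seasonal package mentioned above is the same idea), plus the auto.arima function I'll get to next:

    # Decompose a monthly series into seasonal, trend, and remainder
    # components (log transform to stabilize the seasonal swings)
    parts <- stl(log(AirPassengers), s.window = "periodic")
    plot(parts)  # panels: data, seasonal, trend, remainder

    # Fitting a full ARIMA model automatically with the forecast package:
    # install.packages("forecast")
    library(forecast)
    fit <- auto.arima(log(AirPassengers))
    plot(forecast(fit, h = 24))  # project the next 24 months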
ARIMA has a lot of options, but there's also the forecast library in R, which contains the auto.arima function. It follows best statistical practices to find the right way to model your data. Now, like all things automated, you should definitely be cautious of the output and validate it against what you know about your data and your business, but a lot of statisticians do rely on this function. Another option is actually one of the products I work on, Google BigQuery, Google's cloud-based data analysis platform. It has ARIMA modeling built in, and you actually get a free terabyte of processing each month to use. There are even public datasets available, so you can practice your modeling on data Google has made available without having to load or buy anything. If you want to go deeper into this subject, a book I recommend is Forecasting: Principles and Practice. It's a little heavier of a read; I think it's designed as a kind of MBA-level book. So it's approachable in its math, but if you're going into it, you should expect to open up R a little bit and get your hands dirty. All right, this brings us to number one. Number one is the average, because this is the one used in the most misleading ways. Always be on the lookout whenever someone talks about "the average," and know whether you're talking about the median or the mean, and what the standard deviation around it means. The mean is the most common approach: all the numbers added up and divided by the count, the same way you got grades in school. But the mean can hide a lot of information. Let's say you're trying to understand the average wealth of an Amazon employee, so you go to the cafeteria and ask everyone their net worth as they walk in; there are about a hundred people in the room. But then Jeff Bezos walks into the cafeteria, and his answer is something like $180 billion. Does that mean everyone at Amazon is a billionaire? No. This average can be really misleading. That sounds intuitive, but PMs make this mistake constantly. They have one or two whales, big customers who drive up the overall averages in their product, and then they make claims about how the average person uses their product. I've seen this hidden in a lot of metrics too. PMs might talk about customer acquisition cost or LTV based on some average revenue, but one or two whales can completely skew all those metrics and make it sound like there's fantastic product-market fit, with the average customer spending all this money. In reality, it might just be one company running some tests, and when that company walks away from the product, all those awesome metrics just vanish. The median, on the other hand, tells you the actual middle. This is more often what you want; it's less commonly used, but typically a lot more useful. Now, don't get me wrong, there are plenty of examples where the median doesn't work either. For example, the median number of outages for my cloud product is about zero. My recommendation to customers is still: set up disaster recovery and failover, because there's an off chance it can happen. So that's a place where the median isn't that useful. And if you're in a consumer business, especially one with a subscription fee, the median doesn't tell you much either, because everyone is pretty much paying the same.
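You can see the whale effect in a couple of lines of R, with made-up numbers: ninety-nine customers paying $100 a month plus one whale paying $1 million.

    # Hypothetical monthly revenue: 99 typical customers and one whale
    revenue <- c(rep(100, 99), 1e6)

    mean(revenue)    # 10099 -- the whale drags the average way up
    median(revenue)  # 100   -- what the typical customer actually pays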
The general rule here is that you want a fair reflection of the data, and that's really hard to come up with hard-and-fast rules for. But hopefully you'll remember this presentation, and the next time you hear the word "average," that will trigger you to start thinking about the right follow-up questions to ask. The last book I'm going to recommend today is a great follow-up if you want to learn more about basic statistical principles. It's called What is a P-value Anyway?, and it was written by a top statistician I used to work with at Memorial Sloan Kettering Cancer Center, Dr. Andrew Vickers. He uses the book to train medical students on what they need to know about statistics to do their research, and he gives some great, really funny examples of basic statistical techniques that make it easy to understand and remember how they all work. So if you want to know more about how to use data in your job, this is definitely the book to check out. With that, I appreciate your time and hope some of this was useful. I'm on LinkedIn; feel free to reach out directly, I'd love to chat more. Thanks.