I'm gonna talk about statistics, and I understand that's probably not the most exciting thing. Real quick: who has a master's degree or higher in statistics in this room? I'm thanking God I don't see any hands out there. Good, so I'm really excited about that. If you've been in the Ruby community for a while, you might know this guy, Zed Shaw, who wrote Mongrel. For you young folks, Mongrel was what you used before there were unicorns and rainbows to run your server. He wrote this blog post that really affected me quite a while ago, and it was literally called "Programmers Need to Learn Statistics or I Will Kill Them All." It's worth reading, because he's right about a lot of stuff: developers throw statistics around, all sorts of people throw statistics around, like they know what they're talking about when they don't, and a lot of people use them wrong and then use that to justify something. So my goal is to make sure that you don't die because you don't know statistics. I wanna make sure you survive, and whether you wanna think about it like a zombie apocalypse you need to survive, or like fighting in World War II, that's the guy from Unbroken in real life, I wanna make sure you survive by knowing a little bit of statistics. That's the goal of the next little bit. The things I'm hoping to teach you: I wanna teach you all the things about basic descriptive statistics. I also wanna try to use every single meme that's on the top couple of pages of Meme Generator, and I came pretty close to that. I think it's statistically significant, but I'm not gonna give you any data to calculate it. So, basic descriptive statistics. The first thing we wanna talk about: how to compare things, faster is faster. I wanna talk about how you compare performance tests, for instance. I spent a good part of my last job in the wilderness of being the developer assigned to a marketing team.
And so I wanna be clear that one does not simply run an A/B test. It's a lot more complex than that. We're gonna use that as a starting point. And if you guys and gals are thinking about applying for jobs, Mr. Wonka at Wonka Corp has another question for you, and they're ridiculous questions, and there are ways to answer those ridiculous questions using statistics and Fermi estimation. So I kind of threw Fermi estimation in, because most of the time people aren't sure if things are statistically valid or just lucky guesses, because they don't know enough about statistics. And if there's numbers, you just take the numbers and say, oh, there's numbers, they must know what they're talking about. Clearly the thing with numbers is better than my lucky guess. I'm gonna show you that there are ways to actually understand those numbers better. That's me; I live in Virginia. A long time ago I was a chemist, where I studied science and statistical mechanics, which I really would not recommend unless you really love FORTRAN, or possibly Mathematica and MATLAB. And I got to do things on computers like that one, which is the best VAX terminal I could find, and that one, which was actually the Cray that used to be at the National Center for Supercomputing Applications, which the iPhone in your pocket is probably faster than now, but we begged for time on it. And then I went to an education school, where I learned that educators don't know very much about statistics, but they're really good at pretending. And we used tools like tape recorders and Scantron sheets, and tools like SAS and IBM's SPSS. And boy, that was fun, and that was a learning experience. And now I do Ruby at Big Cartel. There's a long path that got me here. Anyhow, last year I was at Rocky Mountain Ruby and I was teaching people about using data. There's a bunch of gnomes sitting on a big pile of data, filling in the missing piece to figure out how to make a profit.
And I used this slide at one point. I said that aggregates were really not a compelling story, because I was talking about more advanced data science techniques that I really like and love. And I gotta admit I was wrong, because I found out that people need to do a better job with basic statistics, and that's what I'm here for. "There are three kinds of lies: lies, damned lies, and statistics," said Mark Twain. Except Mark Twain didn't really say it. Well, he sort of said it, but it really goes all the way back to 1895, so even this quote is a lie. It's just not a statistical lie. It was made in a speech way back in London in 1895, and it's basically talking about how frustrating statistics are, because statistics can tell you all sorts of ridiculous things. Like this XKCD. Can you read, I guess you can read that. If you look at those, that's kind of interesting, because statistically there's an overlap there that might lead your business down a really interesting path. So don't make these mistakes, all right? XKCD really hits a lot of these things in its statistics strips, and that's not the last XKCD in this talk, either. Anyhow, descriptive statistics are where we're starting, because you have a sea of users. I'm gonna tell you a story. I used to work at an education company that rhymes with Seamouse, and they were, and still are, an independent educational company that I'm no longer associated with, and I was given a population of users to describe. There were 169 users that earned 3,320 badges, which means there's an average of about 20 badges earned per person. So we just describe the users, because when we describe users we always expect, oh, whatever data I pick, of course it's gonna follow the normal curve, right? A bell curve, a Gaussian curve, whatever word you use. This is what it looked like when I graphed it. What the hell is up with that data?
I mean, 132 of the people got one badge, and so I sat there in puzzlement, because I was trying to put a project together with this data and I was wondering what went wrong, right? Because I had a census, not a sample, of data. This was a census: an entire population of people that met a certain set of criteria. But one important part of statistics is that along with a central value, which is what we were trying to get at with the average there, there's a dispersion around it, because any statistical measurement is a distribution. It's not a single thing. So reporting a single thing is problematic. We're used to central values. Mean, or average, is the thing people work with all the time. There's also the median, which is the exact middle of a set of data, and there's the mode, which is the most common value in the set. So if I had just taken the basic central value statistics and put the same data in, this is what I got. The average was around 20 badges. The median was two, and the mode, the most common value, was one badge. A piece of background knowledge you should know: at Seamouse, you get one badge just for signing up. It's the newbie badge. So looking at that with a little prior information, the mean, the average up there, is really not a good representation of what that data is, right? What I need to do is talk about the dispersion of it. For the dispersion of the data, we can look at a range between smallest and largest, we can look at the variance, and the standard deviation. We've got math ahead, and I wanna say really quick: everybody leaves off standard deviations, or variances, or even ranges. I mean, if you listened to NPR this morning, they gave you a statistic about something. I don't know what it was, but they do it every day. Fox News does it, CNN does it. It's not about good, bad, truthy, non-truthy.
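Those three central values are easy to compute by hand in Ruby. This is just a sketch with made-up badge counts, not the real Seamouse data:

```ruby
def mean(xs)
  xs.sum(0.0) / xs.size
end

def median(xs)
  sorted = xs.sort
  mid = sorted.size / 2
  # Odd-sized sets have an exact middle; even-sized sets average the two middles.
  sorted.size.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0
end

def mode(xs)
  # tally counts occurrences; take the value with the highest count.
  xs.tally.max_by { |_value, count| count }.first
end

badges = [1, 1, 1, 2, 2, 5, 227]  # made-up numbers for illustration
mean(badges)    # => about 34
median(badges)  # => 2
mode(badges)    # => 1
```

The same badly skewed shape shows up: a big outlier drags the mean way above the median and mode.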
People just like to give averages and stop, and averages without any more information than that is what Zed Shaw wanted to kill you for, because it doesn't describe what's going on very well. That average of 19 did not give me a very good story. So anyhow, let's do some math real quick. Standard deviation. They teach AP Statistics now; I've got kids in high school, and AP Statistics just blows me away, because I did a lot of what they do in grad school. So that's the math. You all like Ruby better, and it's not very hard to do in Ruby. This is the sum; that's not hard to do. Then you average it. Then I find the variance, which is where I sum all the squared differences, and then for the standard deviation I take the square root of the variance. It's not really all that complicated. And in fact, I could use the descriptive_statistics gem and jump right through it. So let's see what happens when I take that data. The range of badges earned was one to 227. The variance was 1,330 badges squared. Variance is really weird, which is why people don't report it: the units are never what you want. You take the square root for a lot of reasons, but mainly to make the units match up, so you get back to badges. We're down to 36 badges. That's the standard deviation. So that was 19 badges. Let's call it 20 badges, plus or minus 36. That's a much better story than "the average is 20 badges," and we'd make very different decisions. Now, it turns out the real story was: I just did a quick SQL query for this. I mean, this took all of like one hour to figure out back when it happened. I was like, oh, we're doing something for people internally, I'm just gonna get users that have an internal email address. And what that pulled in, as all of us as developers know, was all the junk in the database from all the tests, where you have a test account at yourcompany.com, and they all had the newbie badge. And that's why this data was so crappy.
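Those steps, sum, average, sum of squared differences, square root, translate directly into Ruby. A minimal sketch, using the population variance since the badge data was a census rather than a sample:

```ruby
def mean(xs)
  xs.sum(0.0) / xs.size
end

# Population variance: average of the squared differences from the mean.
def variance(xs)
  m = mean(xs)
  xs.sum(0.0) { |x| (x - m)**2 } / xs.size
end

# Standard deviation: square root of the variance, so the units match the data.
def standard_deviation(xs)
  Math.sqrt(variance(xs))
end
```

The descriptive_statistics gem wraps the same operations up as methods on Enumerable if you'd rather not roll your own.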
And it really was just for an internal competition to see who earned the most badges. It really wasn't important, but even for trivial data, just a tiny bit of extra time makes it better. So do not put your faith in statistics until you've carefully considered what they do and do not say. Next we're gonna talk about inferring things from statistics. This is where statistics get a lot of fun. Benchmarks. Everybody's done benchmarks with the benchmark or the benchmark-ips gem; it makes this really easy. I'm benchmarking symbol versus string lookups into hashes, mainly because I just wanted to pick something simple. And if I run this, clearly one is faster and one is slower, and I can report those. Then I decided, hey, that's not a very good example, because they're so drastically different from each other. What would be simpler and closer? Well, then I did the same thing with single quotes and double quotes, to see if single versus double quotes matters. Not because I care; just because it's an easy example. And now I've got an example where I have to go out one more decimal place to see if it's any different. One's the baseline, set to one, and the other is like a fraction of a percent better. And the question is: are those really different, statistically? I wanna be careful. I can make what statisticians call Type I errors, which are false positives, crying wolf. It's when I have things that are the same and I say they're different. This is where you talk about the alpha level, the p-value, the 5% significance that the entire world has decided is a good enough value. Then there's the other option, Type II errors, a false negative. Most people are willing to accept a 20% chance of a false negative; you wanna be 80% certain there. So we've got these two scenarios: we can make Type I errors, where we cry wolf, and we can make Type II errors, where we have false negatives.
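As a sketch of the kind of benchmark being described, here's the symbol-versus-string hash lookup using the stdlib benchmark library. (The benchmark-ips gem adds iterations-per-second numbers and a compare! report, but the shape of the code is the same.)

```ruby
require "benchmark"

# A hash with both a symbol key and a string key, so we can compare lookups.
h = { foo: 1, "foo" => 1 }
n = 1_000_000

Benchmark.bm(12) do |x|
  x.report("symbol key") { n.times { h[:foo] } }
  x.report("string key") { n.times { h["foo"] } }
end
```

Each run prints wall-clock and CPU time per report; as the talk goes on to show, a single run's winner is not the whole story.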
And so to solve that problem, we naturally go to beer. Because around the turn of the century, a guy at the Guinness company, William Sealy Gosset, who published as "Student," wanted to figure out the difference between samples of beer quickly, easily, and using small samples. He did an enormous amount of statistical work, and the derivation is something I'm sure we're all really sorry I'm not going to show you. But basically he came out with a way of doing things called a t test, which is a way to compare samples, the means of samples basically, to see if they're truly different: whether they come from different populations or from the same population. So going back to the string versus symbol lookup: if we throw this into a quick graphing thing... I had to punt, and I'm so sorry, Numbers is so crappy. I didn't have Excel on this Mac, and I just assumed I could do stuff in Numbers that I couldn't, so I had to run to the web. So those are the two means from that first example, when we were doing symbol versus string. And they are hugely different: one was 28% faster than the other one. And of course those were very, very different. The p value up there, the chance that this happened by luck, is way less than 0.001. So we're thinking, oh, this is pretty good. But then there was the other test that I did, the single quote versus double quote one. That was really close. One was seven tenths of a percent faster than the other. Were they really different? Well, if I throw those same numbers in, believe it or not, statistically those are different. It's saying there's something different between those two. Do you care? I don't know. Single quotes versus double quotes. Maybe I was streaming more Pandora when I was running this test locally. There are all sorts of sources of bias. I'll be honest, I ran it three times and I got different answers every time. Not just the actual numbers, but which one was faster or slower.
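You don't need a stats package to get the t statistic itself. This is a hand-rolled Welch's t test for two samples, a sketch rather than a vetted library; turning the t value into a p value still needs the t distribution, which is where a real tool or web calculator comes in:

```ruby
def mean(xs)
  xs.sum(0.0) / xs.size
end

# Sample variance (n - 1 denominator), since these are samples, not a census.
def sample_variance(xs)
  m = mean(xs)
  xs.sum(0.0) { |x| (x - m)**2 } / (xs.size - 1)
end

# Welch's t statistic: difference of the means over the combined standard error.
def welch_t(a, b)
  (mean(a) - mean(b)) /
    Math.sqrt(sample_variance(a) / a.size + sample_variance(b) / b.size)
end
```

A big |t| means the two sets of timings are unlikely to come from the same population; identical samples give a t of zero.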
So I wasn't feeling really good about that. And there are some other statistics you can use to look at multiple population samples. But anyhow, just as an example: at least I'm going to the trouble of seeing if there's a statistically significant difference here before I'm making a claim. In the first situation, between string and symbols, I calculated a t value of 557. To get really certain, it needs to be more than 2.576. You take absolute values of t values, because you hand-wave and say the sign doesn't matter. But it's clearly way, way higher than what it would have to be. For the second one... oh my gosh, I did that wrong. This one is single quote versus double quote; copy and paste error, my bad. We get a t value of 3.868, which is a bit better than 2.576. So it's still saying it's significant, but it's nowhere near as overwhelmingly significant. So that's quick and dirty inferential statistics. The thing to remember is there's a way to do these things, to compare two sets of means: a t test or a z test is the path you take. And there are lots of easy online tools for doing this, and any statistically minded person can probably help you with it. All right, next up. Testing leads to failure, and failure leads to understanding. So this is clearly about A/B tests. How many people have done A/B tests, or have companies that do A/B tests? All right, and everybody is thrilled about A/B tests, right? With A/B tests we usually talk about increasing conversions. We take traffic to the homepage, say it's a 3% conversion rate on the Buy Now button, and we say: let's make it Free Trial and see if that increases conversion. And then you pick a sensitivity. You're like, hey, I really wanna move the dial. This is a step that most people don't start with. They just wanna say, hey, I wanna change the text, let's see what happens. Okay, already you're one step ahead of anybody doing that, because you know, since I told you, that you need to check the power.
So, sorry, I keep doing that. I wanna move the conversion from 3% to 4%. I want it to matter. That's a 33% relative change; that's the sensitivity. Now, here's the thing that sucks about setting a sensitivity and wanting to make sure your statistical test has enough power to prove what you want: there's math that tells you what you have to do, and you are not going to like what the math says. There's an approximate way to calculate this, which is okay because you can do it in your head; it's over there on the left. There's also a web calculator at Evan Miller's website. So if I have a baseline conversion of 3%, and I wanna see if it goes up or down by one percentage point, because I wanna see if it's better or worse, then each of those options, the Buy Now and the Free Trial, needs 4,782 people to see it. That's a lot of people, right? Let's call it 10,000 total, order of magnitude. I mean, think in your head how many days of people hitting your page, where you ask them to convert, that would take. For some sites it takes three hours. For some sites it takes a few days. For some sites it takes infinity, right? If you don't have a lot of traffic. But you've done something really important here. If you don't have a lot of traffic on that page, you've just said there's not enough statistical power to choose. Pick whatever color you want for the button. Pick whatever text you want, because statistics cannot help you here; we don't have enough traffic for it to matter. We can calculate this with the real formula if we move over to R. R is a very, very fun language for dealing with statistics that's almost, but not quite, as hostile as Octave, which is sort of like MATLAB. Clearly someone's used Octave. We can do a for-real power test with the same numbers. I wanna see if I can go one percentage point, from 3% conversion to 4% conversion. I'm putting in a significance level of 0.05, because everybody does it.
I'm putting in a power of 0.8, because everybody does it. I want two-sided, which means I wanna know if it's better or worse. I just put those into a built-in R function, and it tells me I really should use 5,300 people per variant. So the rough approximation was 4,782, and this one says 5,300 using the real numbers. Either way, a lot of people have to see the site before I can tell. You cannot believe how many times I got to shut down creating an A/B test because there was not enough traffic to make it worthwhile. And it's great, because the designers are trying really hard to use science, but sometimes it just doesn't matter. We can't tell. There are 80 people a day that look at this button, and we can't tell. We just can't tell. Well, we could tell if there were like a million percent difference; I'm exaggerating and not using real numbers. Anyhow, that's a lot of visitors. Here's the worst part. You gotta run the test 50-50 split, and you can't peek. Almost all of the online A/B test tools let you keep testing until you hit significance, and oh my gosh, that is unadulterated, I'm sorry, bullshit. The statistics just do not work that way, and they know it if they have statisticians. But there's no incentive to tell you, oh, you gotta wait until 10,000 people look at this, so don't even bother for five days to see if it's significant. That's horrible, and it's painful, and it makes running these tests really insane, because you wake up every day and you do the same thing. And at the end of the test, you've decided whether things are right, and then someone shoots you in the face to start the test all over again, because you ended up needing to do it again, and you really don't wanna wait, because you're moving at web speed and you want it now. So what if I told you there's a statistical set of tools where your prior beliefs about the test, and the data you've already seen, actually affect the results? Morpheus is just perfect for that. That just sounds like something that would come out of his mouth.
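If you don't have R handy, the normal-approximation sample size formula for two proportions fits in a few lines of Ruby. This is a sketch of the standard textbook formula, with the z values hardcoded (1.96 for a two-sided 0.05 significance level, 0.8416 for 80% power), and it lands right at the roughly 5,300 per variant that R's power.prop.test reports:

```ruby
# Sample size per variant for detecting a change from p1 to p2, using the
# normal approximation for two proportions.
# Defaults: z_alpha for two-sided alpha = 0.05, z_beta for power = 0.80.
def sample_size_per_variant(p1, p2, z_alpha: 1.95996, z_beta: 0.84162)
  delta = (p2 - p1).abs
  n = (z_alpha + z_beta)**2 * (p1 * (1 - p1) + p2 * (1 - p2)) / delta**2
  n.ceil
end

sample_size_per_variant(0.03, 0.04)  # => 5298
```

Halve the sensitivity (say, 3% to 3.5%) and the required sample roughly quadruples, which is exactly why low-traffic pages can't be A/B tested this way.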
So there's this guy, Bayes. And this is, again, old statistics; this is the only Ruby talk you're gonna see anytime soon where most of the source material is from the 1800s or earlier. Basically what Bayes said is: it is stupid not to let the data, and the way the data is starting to move, change your belief in your hypothesis, especially if it makes you more certain your hypothesis is wrong. He's saying that if you're wrong, the sooner you admit it, the better and faster you can get to the right answer. So I think this video is gonna work. We're gonna do a click-through rate test here, and we're hypothesizing two different things, and this is where things start statistically. We've got these two bell curves; we're looking for two slightly different answers. We could just run, let's say, 10,000 people through this and look and see if it's different, and that will take, let's say, three days at the rate we convert on the website I'm working on. Or we can put this in motion and let Bayes do his magic. Let's see if I can actually make this run. Oh my gosh, sorry. So what this is doing: you can see them separating out and getting narrower, and that's starting to say there's a real difference here. Who's convinced there's a real difference? You're not convinced there's a real difference as they separate out and get narrower? Right, so let's do it again. Oops, sorry. When would you say stop? Right. But I can't stop if I'm doing traditional A/B testing; I gotta go all the way to the end to get the power I said I needed. What Bayes said is: you are stupid not to use the data you have. That's what scientists do. Because with traditional A/B testing you end up with answers like this. How many of you wanna tell your designer that? You know, we tested the Buy Now button and we failed to reject the null hypothesis; the p-value just was not good. Or, with a lot less data, you could say: hey, there's an 85% chance that this has a 5% lift.
That's not a very good answer either, but it's a better answer, because it doesn't hurt anything. The beauty of these Bayesian statistics, and this is just to whet your appetite for Bayes, is that you get to peek anytime you want. It doesn't mess up the statistics to peek. In fact, you're encouraged to peek. There's also a parameter that goes into some of these called regret, and it lets you set a regret level, so you can bound how badly things go wrong. And you know, I'm being really flip about this, but there's a really real reason. Bayesian stats, particularly with regret factors, are used a lot in medical studies. Because what kind of medical researcher, if you find something cures, oh, let's just say Ebola because it's on people's minds, says: you know what, we gotta get through 1,000 people before the power of this is statistically significant. We clearly see it's better, but it's not statistically valid till we're done. That's horrible, and no one wants to do that, but that's what frequentist statistics technically says you have to do. There are so many ways to adjust it, and all sorts of hedges, but if you're going through all those hedges around your stats, you might as well be using Bayesian statistics anyhow, because it actually lets you put a number on regret and it lets you use your actual data. And one really cool thing for A/B testing is it lets you switch horses in midstream. There is no way to find a good picture of switching horses in midstream; that's a jockey standing on two different horses, anyhow. But if you're agile, you don't want to run a three-day test of red versus blue, and then a three-day test of blue variation one versus blue variation two, and on and on and on, because that would be really sad. You wanna be able to see, oh, this is doing better, grab it, put it into practice, and move on to the next one.
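To make the peeking concrete: with a uniform Beta prior, the posterior for a conversion rate is just Beta(conversions + 1, misses + 1), and the probability that variant B beats variant A has a closed form, Evan Miller's formula for integer parameters. A sketch, with function names of my own choosing:

```ruby
# ln B(a, b) via the log-gamma function (Math.lgamma returns [value, sign]).
def log_beta(a, b)
  Math.lgamma(a)[0] + Math.lgamma(b)[0] - Math.lgamma(a + b)[0]
end

# Pr(p_B > p_A) where p_A ~ Beta(alpha_a, beta_a), p_B ~ Beta(alpha_b, beta_b).
# Exact for integer alpha_b (Evan Miller's closed-form sum).
def prob_b_beats_a(alpha_a, beta_a, alpha_b, beta_b)
  total = 0.0
  alpha_b.to_i.times do |i|
    total += Math.exp(
      log_beta(alpha_a + i, beta_a + beta_b) -
      Math.log(beta_b + i) -
      log_beta(1 + i, beta_b) -
      log_beta(alpha_a, beta_a)
    )
  end
  total
end
```

With identical data on both sides, say one conversion and one miss each, prob_b_beats_a(2, 2, 2, 2) comes out to exactly 0.5, as it should; and you can recompute it after every single visitor without invalidating anything.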
So Bayesian statistics gives you a lot of power. If you're asking why everyone doesn't use Bayesian statistics, the reason is that it isn't taught much. I mean, how many people have ever had any Bayesian statistics anywhere? Good. So statistically speaking, that's 0% of this audience. But it's still great that some people have seen it. And Bayes is full of crazy. I mean, when you get into it, it is full of crazy. But so is normal statistics. All of statistics is full of crazy. So, gosh, I'm sorry. Again, because I love these guys: let's say you've got a neutrino detector, and you're trying to figure out if the sun's exploded or not. The box can lie; there's a one in 36 chance the box is lying, but that's less than 5% significance. So the statistician in the comic, a frequentist, says: if the box says the sun exploded, the sun exploded. He's ignoring his knowledge that the sun probably did not explode. Because historically, for every day that person has been alive, and every day the earth has been in existence, it has not exploded. And whether the earth has been in existence for a little over 4,000 years or billions of years, either way, the chance of the sun having exploded is really low, and your confidence in that is really high. So why do a test like this? A Bayesian would probably say something like: you know that it hasn't. Because he's much more likely to be right. Anyhow, I'm gonna really quickly go through Fermi estimation and continue in the same vein here. You may have run into Fermi estimation before. Fermi estimation is about orders of magnitude. Traditionally, Enrico Fermi, when he was teaching physics, would give students impossible problems, much like Google and Microsoft give you impossible problems when you're interviewing.
And he very famously used this when he was estimating the yield of one of the first nuclear weapons. He dropped a piece of paper, he knew how far it fell and how far away it blew, and he used that to figure out the yield in rough numbers, and came within, sorry, not two or three orders of magnitude, a factor of two or three of the real answer. He was in the right order of magnitude, and it took days with the computers they had in those days, the ones you turned with a hand crank, to figure out the same thing. Has anybody done Fermi estimation? Have you ever been asked a ridiculous question like how many piano tuners there are in Chicago, which is one of the classic ones? I wanna show you something real quick about this, because I'm running out of time. I found two answers online. One was from NASA. Three million people in Chicago; assume a family of four. This is what Fermi estimation does: get the order of magnitude right, unless you know the actual number better. So, three quarters of a million families, 20% of them have a piano, 150,000 pianos. A piano tuner can do four pianos a day, five days a week, 50 weeks a year: 1,000 pianos a year. So there must be about 150 piano tuners in Chicago. Okay, so now you can get into Google or Microsoft, because you know how to do Fermi estimation. I looked this up on Wikipedia, and they picked Chicago's population as nine million. They used two people per household, and one household in 20 has a piano. They made exactly the same assumptions about the piano tuner. They got 225. So one had 150, one had 225. Same number of zeros, right? It's in the hundreds. And then if you ask the Oracle of Wolfram... well, first of all, Wolfram will tell you, where on earth did they get nine million as the population of Chicago? But it doesn't matter. They still came close to the right answer, because the right answer, according to the Bureau of Labor Statistics, is probably somewhere in the two-nineties.
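The NASA-style chain of assumptions is just arithmetic, which is the whole point of Fermi estimation. Every number below is one of the talk's round guesses:

```ruby
population       = 3_000_000        # people in Chicago, order of magnitude
households       = population / 4   # assume a family of four
pianos           = households / 5   # 20% of households own a piano
tunings_per_year = 4 * 5 * 50       # 4 pianos a day, 5 days a week, 50 weeks
tuners           = pianos / tunings_per_year
tuners  # => 150
```

Swap in the Wikipedia assumptions (nine million people, two per household, one piano per 20 households) and the same chain yields 225; same number of zeros either way.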
So I used absolutely nothing to come up with an answer that's actually pretty accurate. And that's what's called cosmologically equivalent, in the astronomy circles that use Fermi estimation all the time. The reason this is important is you can build a SaaS, everybody's working on software as a service, and you can do these kinds of calculations to find out how insane it is. And that really can close the door on a lot of projects. I want one million dollars in yearly revenue from a $50-a-month subscription. That means I need around 1,700 customers. Those are real numbers, because I can do math. Well, one million is just me pulling a random number, because I wanna be a millionaire, right? A conversion rate of 1% is a pretty good rule of thumb, and you can measure it. And a 1% click-through rate is a great click-through rate for a lot of ads. So to get that many customers, you need 16 million and change impressions. Let's call it 17 million, okay? So if you're doing piano tuners for me, we can estimate whether we have a viable business model here. There are 320 million people in the US. We'll use the same approach: break it into households by dividing by four, then divide by 20 to figure out how many pianos. Four million pianos, and we use our 1,000 pianos tuned per year per tuner. We get 4,000 piano tuners in the US. All right, so we need 16 million impressions, and there are 4,000 piano tuners. Now, the good news is, you've already figured out that you need a sales staff, because you wanna reach a third of the extant piano tuners, instead of randomly spewing out ads hoping you hit them, right? So you've just produced a valuable insight for your company without even having to do anything interesting. Plus, you found a pivot: 17 million impressions, and we just figured out there are 80 million households in America.
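Here's that whole SaaS back-of-the-envelope as Ruby, with every assumption labeled; change any line and the conclusion updates:

```ruby
revenue_goal = 1_000_000             # dollars a year, a made-up wish
price        = 50 * 12               # a $50-a-month subscription
customers    = revenue_goal / price  # roughly 1,700 customers needed

ctr          = 0.01                  # 1% click-through rule of thumb
conversion   = 0.01                  # 1% conversion rule of thumb
impressions  = customers / (ctr * conversion)  # roughly 17 million impressions

us_population = 320_000_000
us_households = us_population / 4    # 80 million households
us_pianos     = us_households / 20   # 1 household in 20 has a piano
us_tuners     = us_pianos / 1000     # 1,000 tunings a year per tuner
us_tuners  # => 4000
```

Needing 17 million impressions against a total market of 4,000 tuners is the mismatch that tells you to hire a sales staff, or pivot to the 80 million households.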
Those numbers are cosmologically equivalent, so it means you could have something, if you really wanna be a millionaire with the 1% click-through rate and the 1% conversion rate: you need to market something that affects every household in America. So now you know what your other option is for the business, something that everybody needs, and those are two very different startups. So, why Fermi? Horseshoes and hand grenades, right? Close counts in horseshoes and hand grenades, and a lot of times you need a hand grenade to blow things up in a startup. Changing the colors on your button to nudge a 3% conversion rate to 3.2%, without giant numbers behind it, Fermi-scale numbers, orders of magnitude, really doesn't move the dial much. It's great for iteratively improving, no question, but if you really need to move the dial, Fermi can help you figure out whether you can get there. So, real quick. You came to Mountain West Ruby Conference and you learned some statistics, which makes me really happy. We talked about basic descriptive stats: always do a standard deviation. You know that you can take two different samples and compare them using Student's t; that's the t test, that's the Guinness thing. So remember: standard deviation, and Guinness. Run an A/B test; learning about A/B tests basically tells you why you should really look at what Bayes can teach you. And then you can answer ridiculous interview questions. So: I don't tell lies, but when I do, I use statistics, because that makes you look like a better liar. I want to thank very quickly my patrons here in town. I work at Big Cartel, which is just down the street here. They're totally repping here. We believe in the artist. We are hiring. And my name is, like I said, John, John Paul. That's my easy-to-get-to address, and that's my Twitter, which will not, statistically, be of any value to you at all, because I don't do a lot of tweeting.
And you can find me for questions, and if I can take a minute for questions, if you've got them, I'm happy to do it. People always ask about books. Those are some really good books; they cover stats and machine learning and the languages you need to use for this. Almost all of the examples you run into are in Python because, I mean, frankly, Ruby is a crappy language for mathematical stuff. It just is. It's not what it's for. Python's good, R is fantastic. Octave is painful but capable, and Mathematica is really expensive. Julia's fascinating and open source and free. So all of those things are things you'll run into. So, thanks again.