I'm excited today for a sort of doubleheader. Not only do we have a wonderful presenter who will be introduced shortly, but we have a wonderful introducer for the presenter, and actually the relationship with MoMath starts with our introducer. Summer before last, you may remember we had a solar eclipse, and I happened to be in Oregon where Gerard van Belle was giving a talk at the local high school. He works at Lowell Observatory, and he gave a great talk about what we should be looking for in the sky and what we shouldn't be looking for, and he gave such a wonderful talk that I invited him to MoMath. He came and gave a talk here last May when Jupiter was in opposition. That means it was as close as it comes to the Earth, so many of you may have been here for that. And then through talking to him, I discovered that his father is a mathematician, and so I think it's only appropriate that I let Gerard introduce his father. Welcome, Gerard. Thank you, Cindy. So thank you all for coming out, and thank you, Cindy, for inviting me out here not just once but twice. I'm glad I haven't overstayed my welcome so far, and I appreciate that. It's a real pleasure to introduce my father for a talk. It's not something I've had the opportunity to do before, so if you'll indulge me, I'll go into a bit more detail than I normally would with a colloquium speaker at the Observatory in Flagstaff, Arizona. I'm the comic relief. I get to be the warm-up act, so enjoy that. My father was born in Holland in 1936, so I'm going all the way back to the beginning here. Family folklore tells us that the family had been in the country there for many centuries, though apparently the 23andMe report says we're from Japan. Anyway, the family was mildly dissatisfied with the incursions by the country to the east that had happened over the previous three decades, and so they emigrated to Canada in 1949. Manitoulin Island, in fact.
I only mention that in passing because it allows me to drop a little Wikipedia tidbit: Manitoulin Island is the largest freshwater island in the world. That's an interesting thing to go look up. They later moved down to the Toronto area. There's a long line of farmers and florists in this particular branch of the family, with many brothers and nephews and nieces in that line of business. And my father's father, upon seeing his horticultural flair, said, maybe you should think about math. And so this is how a career in math was born. Shortly after my father, Opa, had come over, my mother, Oma, immigrated to Canada around 1960, and they were married in 1962. I've heard that there are rumors of their scandalous courtship, but you know what they say: what happens at the Dutch Reformed Church stays at the Dutch Reformed Church. Now, my mother is in fact a medical doctor, or as my own daughter likes to say, Oma is a real doctor, and Opa and you, Dad, you're fake doctors. Her inference from that is that only women are real doctors and men are fake doctors. My father will say something about sampling bias during his talk. It reminds me of a story: after I got my PhD, I was visiting my parents, and we got a phone call asking for Dr. van Belle. I was very pleased to say, well, do you want Gerald, Johanna, or Gerard? They were trying to get somebody to sign up for a credit card. So I get 5.9% APR now. Anyway, Dad got his master's and then his PhD from the University of Toronto in 1967. They had two kids and then emigrated south, foreseeing that Canada was going to become a liberal socialist hell, and so they went to Tallahassee, Florida. He had a postdoc at Florida State University. Sorry, I don't mean to confuse that with the University of Florida.
And they promptly did an important thing, which is they had an anchor baby, that's me, and they've stayed in the States ever since. After a second anchor baby, my younger sister, they were off to the University of Washington, where they had a final child, my younger brother. My parents apparently have an approach to children much like going to Sam's Club: you go in for a couple of things and you come out with a warehouse-sized stock, and that's kind of how it worked out. My father's been at the University of Washington ever since. He started off in the Department of Biostatistics, where he received the Distinguished Teaching Award in 1985. He was appointed chairman of the related Department of Environmental Health and Occupational Health in 1991, and he's now chairman emeritus. My understanding, now as an academic myself, is that emeritus means you don't have to go home, but you do have to give up the corner office. He's been a frequent consultant to the FDA on research topics such as sudden infant death syndrome, Alzheimer's disease, and airborne toxins. He is senior enough that at this point in his career he's retired many times, and he was recently drafted for a number of years to be the principal investigator of the Resuscitation Outcomes Consortium. So he's been a leading authority on many sorts of studies regarding people's health and the different ways health is affected by procedures and so forth. He's written a number of books on the subject. In fact, he's literally written The Book on biostatistics; it's the textbook that everybody uses. You might have missed it on the New York Times bestseller list; it's on the not-quite-fiction list. Statistical Rules of Thumb is another volume of his, as is Design and Analysis of Experiments in the Health Sciences. He is a man of deep reflection and careful consideration, with the highest ethical and moral standards. Those qualities, I'd say, are well marinated with a sense of humor and a quick wit.
It's very clear that I get my genetics from my mom's side of the family. And with that, I'll have Dad come on up. Thank you very much. Well, thank you, Gerard. My wife said, don't tell the audience you really don't deserve these remarks, because they'll find out soon enough. Would you get rid of your stuff? Here we go. So in a previous incarnation, my wife and I were Gandalf and Frodo, and that was kind of dull, so I decided to become a statistician. And there is a lot of excitement in the field. My talk is on uncertain variation. The title on the poster is "Certain Variation," and that's okay, but I decided to switch it slightly to "Uncertain Variation," and: you never step in the same river twice. I like to think of uncertainty as having two aspects: there is uncertainty as a foe, and then there is actually a side to uncertainty as a friend. I would like to illustrate that at the end of the talk, but I will focus primarily on uncertainty as a foe. Now, before we start talking about specific issues, I think it's useful to make a distinction between variation and uncertainty. If you set up a little two-by-two grid, you could have no variation and no uncertainty; no variation with uncertainty; the other way around; and then both. I will actually focus on the case with both, but let's look at some examples. If one talks about the location of Calcutta, clearly there's no uncertainty as to where it is and there's no variation: Calcutta is where it is. So this is a situation of no uncertainty and no variation. There's a lot of uncertainty about life on Mars. It's either there or not, so there's no variation, but it's clear that at this point we don't know whether there's life on Mars or not. NASA would like to know, of course, and is spending lots of our money to find out. Finally, there's variation without uncertainty.
If you think about the mileposts along the highway, they clearly vary, but they're very predictable, so there's no uncertainty about them. Then there is uncertainty and variation together. For example, the question: is radiation therapy better than surgery or chemotherapy for prostate cancer? There's clearly variability in the survival times of patients, and there's clearly variability in response to treatment, and the question of how radiation and chemotherapy differ involves both variation and uncertainty. Now, I'm going to illustrate all of this with a balloon activity. All of you should have a balloon on your chair, and there was one person here with whom I already did what I'm going to ask you to do. Would you blow up the balloon? Everybody blow up the balloon, please. Blow it up pretty big. Keep blowing. Make it about 13 inches. This is also an exercise in medicine, where we look at forced expiratory volume in one second to measure your lung function. I see a lot of good lung function out here, but also some relatively dubious lung function. Now, hold the balloons up in the air. Everybody hold the balloons up in the air. On three, I say go. One, two, three, go. Now, where did your balloon wind up? It's uncertain. How far did your balloon go? It varies. So this is clearly an example of variation and uncertainty, and there's actually more that can be said about this, so let's think about it just a little bit more. If your birthday is in February, pick up the balloon nearest you. Anybody whose birthday is in February, pick up the balloon. Now, all the people with birthdays in February, stand up. Wow. Any more? There are only five people with birthdays in February. Now, the question is, of those five people, how many of you have a red balloon? One? Just one red balloon. So is there another red balloon out there? I can't quite see it. All right.
Anyway, you may sit down. Thank you very much. What's the point of all this? Well, when we blew up the balloons, all the balloons in the room, we could call that the population. That's the population we're interested in. Secondly, we had a randomizing mechanism: we blew up the balloons and we distributed them, and where they went, I didn't know, you didn't know, and we still don't know where these balloons went. Now, the February birthday people represent a sample. There were only five people in the room who had birthdays in February. And then finally, the color of the balloon is the observation. This is what we observed, and out of the five people, there was one person with a red balloon. Now, this is the kind of model that I want to talk about throughout this talk, illustrating how little perturbations really create problems when you try to make inferences about how the balloons got distributed in the room. So I want to talk about uncertainty as a foe. Well, clearly, we just went through a midterm election. The polls on Monday, we hope, said something about the results on Tuesday, but we are uncertain. Polling is really an attempt to find out what's going to happen on election day before the election actually happens. We have a population of interest, namely, in this case, the people who actually go out and vote, and the pollsters try to find out before the election how these people are going to vote, and whether they're going to vote, and there are all kinds of problems, of course, associated with polling. So polling is one of the problems where there's uncertainty and variation. Another one is, say, time to go to work. Many of you go to work in the morning, and the time that it takes you varies in a variety of ways, depending on lots of things: road conditions, whether you take the Metro, or whatever it is.
Homework for class, for those of you still in school: the question is, how long is the homework going to take me? Well, it depends on the subject. The math may be easy; the English is going to be harder because I've got to write an essay. So the length of time it takes depends very much on the assignment. And then, not just for the sake of argument, success of cancer treatment. We have a son, Gerard's brother, who currently is undergoing therapy for colon cancer, and our question is, how well is the treatment going to work? From a statistical point of view, what is his survival time like? Can we estimate it? Well, it depends on many things: where the cancer is, the stage of the cancer, and so on and so forth. There's a lot of uncertainty in his case, and there's a lot of variability from person to person with colon cancer; some live long, others don't. So that clearly is a very important issue for us, and for you, probably, there will be situations like that as well. Now, Alfred North Whitehead, of Whitehead and Russell, wrote a book called Science and the Modern World, from 1925. That really was a great book. I just love the book; I read it over and over again, and my copy is kind of falling apart. He said, first of all, that observation is selection. That's clear from the balloon experiment: we had five people with birthdays in February, so that's already a selection. He also notes that the things directly observed are almost always a sample. So what we have in this particular case is a sample of five people, and only one has a red balloon. If I'd asked for people with birthdays in June, another group would have stood up, maybe bigger, maybe smaller, we don't know, and they would have had some set of balloons, and they might have had three or four red balloons. Who knows?
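The balloon exercise can be sketched as a little simulation. This is a hedged sketch with made-up numbers (say 100 balloons in the room, 20 of them red; the actual mix of colors is unknown): each birthday group is a random sample, and the number of red balloons it sees varies from sample to sample, just as a June group might have seen three or four reds.

```python
import random

random.seed(2024)

# Hypothetical population: 100 balloons, 20 of them red.
population = ["red"] * 20 + ["blue"] * 80

# Each "birthday group" is a random sample of 5 balloons;
# the observation is how many red balloons that group holds.
for month in ["February", "June", "October"]:
    sample = random.sample(population, 5)
    print(month, "group sees", sample.count("red"), "red balloon(s)")
```

Run it a few times with different seeds and the counts jump around even though the population never changes; that gap between the fixed population and the varying observation is the whole game.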
So the whole point is that we have a sample, and what we're really interested in is what we can say about the population. That's the key question of statistics, and that's the key question for many situations in life. So observation is selection, and selection creates uncertainty. The key question is, where does the observation come from? We talk about that as the population, or the sample space; in more philosophical circles, they talk about a universe of discourse. That basically names the area we want to be discussing. And the question always is, how was it selected? If you take away one thing from this talk, I would say it's this: anytime you have an observation, you should instinctively and immediately ask, where did the observation come from? What is the origin of the observation? Because I'm probably not too interested in the observation itself; I'm more interested in where the observation came from. And that also raises the question of selection. What I'm going to do is illustrate this by means of six stories. The first story is a war story. I lived in the Netherlands during the Second World War, and I remember the planes coming over from Britain, making bombing runs into Germany. I also remember that they didn't restrict themselves to bombing runs in Germany, because the city where I lived, Enschede, which was just inside the Netherlands adjacent to the German border, was what was called a secondary target. That meant that if they missed the main target in Germany, they would unload on secondary targets. And the reason my city was a secondary target was that there was a large airfield right next to the city. So that's the place where they would also drop the bombs, and the problem was that accuracy was not exactly great, because I suspect these guys flying back said, drop this stuff and let's get the hell out of here.
So anyway, one personal story on that. My mother took my brother Peter to visit her father in the hospital, and they got about a block from the house when all of a sudden there was a plane, or planes, dropping bombs. Usually there's enough warning; this time there was no warning. So my mother threw my brother Peter against the wall of a house and lay on top of him. When the air raid was over, she stood up, but he couldn't stand up: his calf had been cut in half, and to this day he limps because of it. So he had a very distinct experience of the war, which he carries to this day. Well, anyway, sorry for getting a little emotional there. The question for the Allies was, how can we reinforce the planes so that they have a greater chance of coming back? So how can we reinforce these planes? They assigned the problem to a statistician named Abraham Wald, a Hungarian mathematician who had fled to the United States in 1938 and was with what was called the Statistical Research Group during World War II. He was a very, very well known statistician and mathematician, actually a graduate in mathematics, and he was a founder of sequential analysis, for example. Now, these are pictures of planes that returned to their airfields after runs over Germany. Here's one, and here's one; I find it quite amazing that this plane was actually repaired and flew again. So the idea was that you would take the profile of a plane and find out where the plane had been hit, and Wald was assigned the problem of finding out where these planes were vulnerable so that they could be reinforced. Now, he didn't say so in so many words, but he basically said: reinforce the planes where they have not been hit. And you sort of say, hey, how can that be? So I want you, where you sit in your row, to form groups of three or four and think for a few minutes about why Wald gave that advice.
So just form little groups, a little group here, little groups there, and discuss for a few minutes why you think Wald gave that advice. All right, would anybody like to offer a solution? There's one person there; just wait a minute and give that person the mic. "Yeah, I sort of have an advantage because I came from the military, but the thought that jumped into my mind was that these are the planes that made it back. So they made it back having been hit where they got hit, and on your premise that they could get hit anywhere, it's more likely that the ones that didn't make it back got hit somewhere else." Anybody else like to offer a comment? Right over here. "You didn't tell us what the sample was, or how many planes had the spots in the same place. And since the topic of the talk here is you never step in the same river twice, I have a feeling that the probability of being hit in the same spot is fairly low." Okay, any other comments? Okay, well, I'll deal with the second comment first. Wald looked at the records of about 500 planes. I don't know how many planes went out on a particular sortie, but that's the number he looked at. There were clearly waves of planes that went back and forth; whether the 400 or 500 were one sortie, which actually happened, or a combination of sorties, I really don't know. But your answer is of course right on. The question is, where are the planes that are downed? And those are the planes that you don't see, because they don't come back.
So what Wald did was a very mathematical study of where the planes were hit, and assuming an equal distribution of hits over the surface of the planes, he showed very clearly that the engine compartments were the most vulnerable, which, when you think about it, makes sense: if the engine gets hit, the plane just doesn't come back. The whole point is that here we have a situation where you have a sample, but the sample is not representative of the population; in fact, it's not even close to being representative, because what we're really interested in are the planes that don't come back. So this is a clear example of a biased sample, and the usual jargon is survivor bias, because the planes that come back are clearly the ones that survived. Let me show you: Wald wrote an 80-page monograph, and it's all mathematics, pretty dense mathematics; there's no picture in it. Wald never actually said "reinforce the planes where they've not been hit," but that's basically been the story. What he showed, in his 80-page mathematical analysis, was that the engine compartments were the ones quite vulnerable to bringing the plane down, and he also worked out the probability of a plane going down if it was hit more than once, so that gets back to your point as well. Here, for example, is chapter 5, which deals with the mathematics of the data he got, and it's clear that it's pretty dense mathematics, but he had a very practical aim in mind, of course, and he came up with the right answer.
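Wald's argument lends itself to a small simulation. To be clear, this is a sketch, not Wald's actual method: his monograph is an analytic treatment, and the per-region survival probabilities below are invented for illustration. What the sketch demonstrates is the survivor bias itself: engine hits are as common as any other, yet they look rare among the planes that return.

```python
import random

random.seed(42)

REGIONS = ["engine", "fuselage", "wings", "tail"]
# Assumed (made-up) chance a plane survives a hit in each region:
# an engine hit usually downs the plane; fuselage hits mostly don't.
SURVIVE_IF_HIT = {"engine": 0.3, "fuselage": 0.95, "wings": 0.9, "tail": 0.85}

def fly_sortie():
    """One plane takes one hit, uniformly over the four regions."""
    region = random.choice(REGIONS)
    survived = random.random() < SURVIVE_IF_HIT[region]
    return region, survived

hits_all = {r: 0 for r in REGIONS}
hits_returners = {r: 0 for r in REGIONS}
for _ in range(100_000):
    region, survived = fly_sortie()
    hits_all[region] += 1
    if survived:
        hits_returners[region] += 1

# Engine hits are 25% of all hits, but a much smaller share of the
# hits we can actually inspect -- the hits on planes that came back.
frac_engine_all = hits_all["engine"] / sum(hits_all.values())
frac_engine_ret = hits_returners["engine"] / sum(hits_returners.values())
print(f"engine hits among all planes:       {frac_engine_all:.1%}")
print(f"engine hits among returning planes: {frac_engine_ret:.1%}")
```

If you only tallied the returning planes, you would conclude the engine is the safest place to be hit, which is exactly backwards; that is why the sparsely marked regions are the ones to reinforce.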
So now let's go to a polling story. We've heard about elections; this is the election of 1936, Landon versus Roosevelt. The Literary Digest was a magazine that had been polling since 1920. By polling I mean that they would send out sample ballots to a population, and as the ballots came back they would tally them. They would start sending them out in August, and every week they would tally how many ballots had been returned and who people voted for. Just to give you some idea: in 1928 they sent out 19 million ballots. All of these took a stamp; this is not email. Just imagine the amount of money they spent doing that. In 1932 they sent out 20 million ballots, and in 1936, 10 million ballots, and in each year they received more than 2 million responses. This is big data, even for 1936. Well, what happened? In 1920 they predicted correctly. In 1924 they predicted correctly. In 1928 they predicted correctly. In 1932 they predicted correctly who would win. In 1936 they hit a bomb, not a real bomb: they predicted that Landon would get 55% of the vote and Roosevelt 41, and in point of fact Landon got 37 and Roosevelt got 61, and the magazine really had to eat crow. Two weeks after the election they published an article, "What Went Wrong With the Polls?", along with a magazine cover they couldn't publish before the election, because they had already made the picture: "Is our face red!" They clearly were embarrassed by this complete fiasco in predicting the results of the election. But what went wrong? Getting back to our example: suppose we had only red balloons and blue balloons in this room; that would correspond to the Democratic and the Republican parties, and what we really want to find out is how the people are going to vote on the day of the election. Well, who did they send the ballots to? They sent the ballots to mailing lists of phone numbers,
mailing lists of houses, mailing lists of voter registration. That's how they came up with their 20 million and 10 million ballots. Well, one problem was that in 1936 not that many people had phones. Who had the phones? The upper crust clearly had the phones, and they tended to vote Republican, so this was one problem. Another problem was low response: even though they got 2 million ballots back, that's 2 million out of 10 million, one in five. Why one in five? What happened to the other four? Why were they not sent in? So there was a low response rate, and then there was a problem of response bias. Again, who were the people who had the money to put a stamp on the ballot and mail it back? You can imagine it must have been the people with more resources, and the people with fewer resources were more likely to vote Democratic, so they missed that particular part of the population. And then finally, one of the problems was that they started mailing the ballots in August. Just imagine: ballots get mailed back, they have to be sorted, probably by hand; I'm not sure what they did in those days. It takes a lot of time to assemble this stuff, but there's deadline pressure: we've got to get this magazine out every week. The third week in October was the final issue, so the data probably came from the beginning of October. So they had all kinds of problems. This was a perfect storm: selection bias, non-response, and a change over time, since people tended to become more Democratic over time. By the time they got the data and published it, they were just way off the mark. And the interesting thing is that the Literary Digest folded two years later, just gone, because they could never live this fiasco down. So again, remember: even though you have two million ballots in your hand, where did these ballots come from? That's really the key question, and that's the question of the
balloons as well. Well, things don't always go wrong. There was one person in 1936 who predicted correctly. Guess who this was? George Gallup, because he took a random sample of the population. He didn't go by voter registration lists; he had people out on the streets picking people at random, with a very careful randomization scheme, and he predicted the 1936 election right on. Come 1948, Gallup misses the boat. Everybody misses the boat. Why is that? Well, because in 1948 the two key people are Dewey and Truman. Dewey has such a huge lead that in August, because polling is expensive, they decide to stop polling. Gallup and everybody else: hey, why spend money? It takes a lot of money to go out and poll. So they stopped polling. And what happens, of course: Truman went on his whistle-stop campaign, went from the east to the west in two weeks in October, and basically convinced enough people to vote Democratic. The Chicago Tribune had a problem: their pressmen were on strike, so the management was running the presses. There were more problems, but that was the big one. So these guys on Tuesday, by six o'clock, had to get the paper ready because it had to be shipped out in the evening. At six o'clock they said Dewey is going to win, so: big headline, Dewey Defeats Truman. By six o'clock in the morning they realized, hey, we've done the wrong thing, so they sent their carriers out to recover all the papers. Of course, a lot of people laughed their heads off and said, we're not going to give this paper back. They were offered free copies, and they said, aha, this is too good to be true. So that's the reason the Chicago Daily Tribune has the headline Dewey Defeats Truman, and there's the picture of Truman holding up the paper. That's another example of selection bias. Well, let me tell you another war story. Donald Rumsfeld. Some of you remember Donald Rumsfeld. He was the Secretary of Defense during the Bush administration, during the
war in Iraq, and Donald Rumsfeld was, I think, quicker on his feet than anybody else I can remember; he could just spout off stuff. He was interviewed at a regular press conference on February 12, 2002, and the question was that there are reports of no evidence of a direct link between Baghdad and terrorist organizations. Here is his response: "There are reports that there is no evidence of a direct link between Baghdad and some of these terrorist organizations. There are known knowns; there are things we know we know. We also know there are known unknowns; that is to say, we know there are some things we do not know. But there are also unknown unknowns, the ones we don't know we don't know." A reporter followed up, "Excuse me, but is this an unknown unknown?" and the Secretary wouldn't say which it was. So let me just make sure that you heard what he said. He said that there are known knowns, things we know we know; there are also known unknowns, that is to say, we know there are some things we don't know; but there are also unknown unknowns. This was so funny that a couple of people, Bryant Kong and Elender Wall, put it to music; let me play this for you. It turns out that Rumsfeld makes a very good point, namely, you should always ask two questions when data are missing. One question is, are the data missing? But the second question is, why are the data missing? That's really a very important aspect, and I can show you some examples. If you're late at work because of an accident, a statistician would say that your absence is missing at random. It's got nothing to do with your work; maybe you should have left earlier if you'd heard about the accident, but there's nothing you can do about that. This is called missing at random. But there's also informative missingness. For example, we know that young
voters tend to vote less often than older people; there's a pretty clear correlation between the proportion voting and age. We also know that young people tend to vote for one party more than for another; I won't mention which party. A missing vote from a young person represents informative missingness, because if that person had voted, it might have changed the election. So that would be called informative missingness. Now, I have this little thing about big data, which I think is overblown. Big data presents the challenge that, first of all, you often don't even know where the data are missing, and you almost certainly never know why the data are missing, and that really creates a problem of inference: how valid are the data for the kinds of things you want to deal with? Think of the Literary Digest example: that clearly was a case of informative missingness, because the people who were going to vote Democratic weren't being polled. So here I have a genuine fake news story; let me read it to you. This is from Washington, D.C., not the other Washington. A month ago, President Trump and his son Donald Jr. joined a golf outing at the president's home base, Mar-a-Lago. Donald Jr. is a fairly good golfer with an average score in the high 90s. On this particular Sunday he scored a disheartening 115. His father, ever supportive, said, Donny, why don't you have a couple of visits with my pro, my treat, and he can teach you some of the finer points of putting. One month later, another golf outing. Donald Jr.
scores a 99. The president says, I told you, my pro really helped you improve your score. Do you agree? Why don't you have a little discussion with your group and tell me whether you agree with the president's conclusion. Any thoughts? Anybody have an idea? "Well, he normally golfs in the 90s anyway, so maybe he went back to the norm, and secondly, you don't really know whether he actually went to the golf pro." I'll pick up the first part of that answer. Any other comments or thoughts? "I think that if he was in the 90s, maybe he just accidentally scored 115 that one time, so it's the same thing." So it turns out there is in statistics something called regression to the mean. The 115, as you pointed out, was in fact an outlier; his average score is around 100, and he just had a bad day. As we also learn from this genuinely fake news story, Donald Jr. says under his breath, I never went to see the pro. Regression to the mean was discovered by the statistician Francis Galton, a polymath. What he did was look at the heights of parents versus the heights of children, and he discovered the following: tall parents tend to have tall children, but not quite as tall. If Gerard were to stand up next to me, he's not quite as tall as I am. Similarly, short parents tended to have short children, but not quite as short as the parents. He called this phenomenon regression to the mean. So I would say, in the case of Donald Jr.,
this was simply a case of regression to the mean. You have to be careful when you interpret data that you don't attribute causes the data really don't support, in spite of the fact that the president says that was the reason. Then there's something called the Sports Illustrated jinx. Has anybody heard of the Sports Illustrated jinx? You've heard about it. The Sports Illustrated jinx is that if you're an athlete and you're on the front page of Sports Illustrated, it's probably because you had the best major-league batting average, or whatever it is, and the next year you're going to do worse. That's simply a case of regression to the mean, so while athletes call it the Sports Illustrated jinx, in fact there's not much of a jinx; it's a matter of regression to the mean. All right, another story. I received the following email, which gave me a tip to buy a certain stock. I had some money, so I invested in that stock. A week later they sent another tip, and I invested in the stock they recommended. This happened six weeks in a row; every time they sent me a tip, they were right on. It was really amazing. So in week seven, two things happened. First of all, I received an inheritance of $50,000. Secondly, the company that sent these emails said, we've been right six times in a row; if you really want to get our advice from now on, please send us a check for $1,500 for an annual fee. So what is your advice? Should I join this company? What do you think?
Discuss briefly. Okay, let's hear it from the audience. Thank you. "One possibility could be that the people who received this newsletter bought that stock and themselves drove the price higher every single time, so it was a self-fulfilling prophecy; it wasn't that they made the right call, they just convinced people to buy." Any other thoughts? Over here. "Or it could be that they just started with lots of people and only continued to send messages to the people for whom they were right each time." Well, let me show you a scheme. I'm not saying this is what happened, but again, the key question: where do these data, my six observations, come from? Consider the following. The company selected 9,600 affluent people. They sent half of them one piece of advice and the other half the opposite piece of advice. The ones that got the wrong advice they dropped. Now they have 4,800 people; they sent half of those one piece of advice and the other half the opposite, and again dropped the ones that got the wrong advice. Keep doing this, and I just happened to be in the pot that got correct advice six times in a row; but it was just random. This is called the Rainmaker Scheme, and it's illegal, so don't go start doing it. Anyway, if you do this six times, it leaves 150 people, including me, and they would want $1,500 from each of those 150 people. But there is no reason for smugness. Assume that there are 50,000 financial advisors in the US (I think there are more), and each advisor gives you advice each week to do this or to do that. How does that differ from the Rainmaker Scheme?
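The halving scheme just described can be checked with a few lines of simulation. This sketch uses the talk's 9,600 recipients; the coin-flip market direction is my own modeling shortcut:

```python
import random

random.seed(0)
recipients = list(range(9600))  # the 9,600 affluent targets

for week in range(6):
    half = len(recipients) // 2
    told_up, told_down = recipients[:half], recipients[half:]
    market_went_up = random.random() < 0.5  # the market does whatever it wants
    # Keep only the half whose "prediction" happened to match the market,
    # and quietly drop everyone who got the wrong advice.
    recipients = told_up if market_went_up else told_down

print(len(recipients))  # 9600 / 2**6 = 150 impressed prospects
```

No matter which way the market moves each week, exactly 150 people are left who have seen six correct calls in a row.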
I think it's identical. So be careful that you don't put too much trust in your financial advisor; that's one of the bottom lines. All right, we're moving on. "Does your dog bite?" This is from one of the Peter Sellers Pink Panther movies of the 1970s. Let's see what we can do here. Here's Peter Sellers looking for a room in Switzerland. "Does your dog bite?" "No." And there is a doggie, which bites him. "I thought you said your dog did not bite." "That is not my dog." So the question is very important: the question determines the sample space, and you have to be careful that the question addresses the point you're interested in. That brings us back to the sampling story: "Have you stopped using drugs?", for example. If you ask that at an AA meeting, it's probably appropriate, but it wouldn't be appropriate for this audience. The question really determines the audience that you're dealing with. And then my final story is a rainy-day story. It's raining cats and dogs, you have to go from one building to the next, and you have this existential question: should I walk or should I run? Which gives me the least rain? Now, who remembers this person? Ann Landers. You could ask Ann Landers any question, and this question was asked of her. She said she never got more responses from people than answers to that question. Some answers were "well, it depends on the rain," as she mentioned; some answers were so heavy with mathematics that she couldn't understand them; she had a hard time balancing her checkbook. But one college did the following, which gave a rather nice way of actually answering the question. They took 20 students and randomized them, 10 to running and 10 to walking, and before they started they pasted a piece of tissue paper, blotting paper, on each student's chest. So one group walked and the other group ran, and, what turned out to be very clever,
at the end they simply weighed the paper. Clearly, the more rain, the heavier the paper would be, and the less rain, the lighter. The bottom line turned out to be that you're better off running. There are all kinds of permutations of that: if you go to the internet and look up running in the rain, you'll get pages of mathematics, I'm not kidding, because it depends on whether the wind is coming from behind, it depends on whether the rain is falling at a certain angle, and it depends on whether you lean forward or not. But all of that is really commentary; the bottom line is that you had better run rather than walk. Well, this is an illustration of design. Let me just go back for a minute: what have we concluded so far? First of all, if we take observations, they must represent the sample space in some sense, and a lack of representativeness can result in bias. A representative sample turns out to be a random sample, which simply means that every observation in the population has an equal chance of being selected. So when we go back to the bomber story: every plane should have had an equal chance of being represented in the data that Wald got, and it was clear that there was no equal chance for the planes that didn't return. If he had had a truly random sample of the planes, he would have been able to make a much better inference. Well, let me briefly conclude with uncertainty as a friend. It turns out that uncertainty as a friend occurs in many cases; in fact, when you think about it, uncertainty as a friend is more common than uncertainty as a foe. For example, board games: in Chutes and Ladders you roll a die, in Monopoly you roll a die, in card games you shuffle the cards. Computer games, all computer games, have what is called a random number generator; a random number is generated which gives you a certain play or moves things in a certain way. So the random number generator is in fact a generator of
uncertainty. So there's uncertainty in all kinds of situations that we're involved with. In sports, when you think about it, everything is permeated with uncertainty, from the trivial aspect that they toss a coin before a football game to decide which side goes which way, to sports divisions. You have major league ball and you have minor league ball, because if a major league team were to play a minor league team it would be kind of boring; they would always win. The whole idea of professional sports is to make things as uncertain as possible. For example, drafts: the worst team in the league, the one that gets trounced by everybody, gets the first pick in the draft. Why is that? To make them more competitive, to equalize the probability of winning; because only if you're sitting at the edge of your seat watching Green Bay play whoever it is, with the score tied at the end, are you going to sit through all the ads and watch the end of the game. So the whole idea in sports is to increase uncertainty rather than reduce it. The coaches, of course, would like to reduce uncertainty, but the whole setup is designed to increase it. Those are some of the things that go on in sports. Government: government has antitrust laws, so you can't have one company holding a monopoly. Why is that? Because it creates what is called a level playing field. That's another area where you have increased uncertainty, because it makes things more competitive. And then in medicine, which is the area that I've been involved with, it happens all the time. One of my last areas of work was Alzheimer's disease, where a drug is intended, presumably, to prevent memory loss among other things. How are you going to judge whether a new drug works?
Well, you give one group the new drug and the other group a standard drug, and then you compare the memory scores at the start of the study and at the end of the study. You want to make the two groups as comparable as possible, so you randomly allocate people to one treatment or the other, and that's what makes the groups comparable. There's something called evidence-based medicine, a term that came up, I think, in 1978, the idea being that evidence has various levels: level A evidence, level B evidence, level C, and level D. Level A evidence, which the FDA primarily accepts for studies in clinical trials, is evidence based on randomized clinical trials. If a drug firm submits an application based on observational data only, the FDA is going to be very skeptical. So level A evidence is based on randomized controlled clinical trials, which is a way of making uncertainty equal across both groups; that's how you do it. Level B is very carefully done observational studies. Level D is expert opinion: if you claim "I know this treatment works because I'm a doctor, a real doctor, and therefore it works," that's level D evidence, which has very little merit in clinical trials. So we have uncertainty as friend and we have uncertainty as foe, and I think the two actually come together and can be meshed simply by saying that you need equal opportunity. When we talked about the Wald study, what we would want is equal opportunity for every plane to be represented in the sample; that didn't happen. If we can arrange that, then we have equal opportunity. For uncertainty as friend we likewise have equal opportunity: you have an equal chance to be in group one or group two. A single note, and then I will quit. Our son was randomized in a clinical trial; we said, yes, of course, join a clinical trial for colon cancer. The idea was that there were two treatments. One
was a new chemotherapy, which was supposed to shrink the tumor without radiation, so we said, gee, let's hope you get randomized to that arm; the other arm was radiation and chemotherapy at the same time. Well, he went through the chemotherapy treatment for three months, but the tingling in his feet was such that he could hardly stand, and the tumor had grown, so he would have been better off in the other arm, because now he had to go through radiation anyway. But at the time of the study he was randomized, presumably by the toss of a coin, to either the new treatment or the old treatment; he had an equal chance of being in one or the other. So equal opportunity is the answer to both uncertainty as foe and uncertainty as friend. And William Cowper sort of said it: uncertainty is the very spice of life that gives it all its flavor. I'd like to acknowledge again Cindy Lawrence and the staff of MoMath for having me here; Gerard for his introduction and his guidance with the audiovisual slides; my wife Johanna of 55 years; and you, the audience. Thank you very much, and if you have comments, send them to me. Now, does anyone have any questions? "How are inferences from big data, currently a popular buzzword, similar to the prediction for the 1936 US presidential election?" Well, I think big data has the same problem as the 1936 election: why were the data collected? A lot of big data is collected for administrative purposes, and what does that mean? It means it's not collected for science, and so the scientific value of a lot of big data is dubious; that's the way it is. So I'm somewhat skeptical of big data, though that's probably somewhat of a bias that I should not exercise too strongly. Yes? "I'm coming from an engineering background, and there we were taught about something called the Hawthorne effect, where when you make a measurement, and I'm coming back to your premise or comment early on that every observation is
a sample; but with the Hawthorne effect, the measurement itself has an effect on the result, right?" So the Hawthorne effect came out of a study at a factory; I'm not sure whether Hawthorne was the name of the factory or of the person doing the study. They had two groups: one group was going to be given one sort of encouragement, and the other group another sort of encouragement, to increase productivity. Well, they found that both groups improved, and the reason was simply that the employees got some attention and became more interested in their work; that's why they improved. And this is really a big problem in Alzheimer's disease, getting back to that again. There are all kinds of studies that deal with improving memory. I've been involved with reviewing grants for computer games designed to improve memory. Well, that's the treatment group; what's the appropriate control group? You can't just leave those people by themselves, because it might just be that giving these patients attention makes them more active and engaged. So the control group in these computer-game studies is very tricky. One thing they'll do is give the control group reading material; well, the last thing I'd want is to read yet another thing about Alzheimer's disease if I could play a computer game instead. Another problem is that if people know that one treatment is computer games, they may say, "Hey, let's try some computer games ourselves," so the treatment is no longer pure, if you wish; it's contaminated. So it's very tricky to give treatments that actually differ, without both of them having some effect simply from being a treatment at all. So thank you, and let's give a hand to our speaker.
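The randomized allocation the speaker keeps returning to, giving every subject an equal chance of landing in either arm, is mechanically very simple. A minimal sketch (the subject labels and the 10/10 split are my own illustration, echoing the 20-student rain experiment):

```python
import random

random.seed(42)
subjects = [f"subject_{i:02d}" for i in range(20)]

# A random shuffle makes every possible 10/10 split equally likely,
# so the two arms differ only by chance, not by selection.
random.shuffle(subjects)
treatment_arm, control_arm = subjects[:10], subjects[10:]

print(len(treatment_arm), len(control_arm))  # 10 10
```

This equal-chance assignment is exactly the "uncertainty as friend" the talk describes: the randomness is introduced on purpose so that the groups are comparable.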