Hello, I'm Oliver Pedra, and I'm going to give a series of presentations to introduce Bayesian regression. This is the first of three presentations, and together with these presentations there is other material that you can access, including exercises. In this first presentation I will introduce the principles of Bayesian regression, first by comparing the Bayesian approach with the more commonly used frequentist approach; in the second part of this presentation I will provide more details about the principles of Bayesian analysis.

So the Bayesian approach provides an approach to hypothesis testing that is different from the traditional approach, which is often called frequentist, and it has some particular advantages. Some of the reasons why I have been particularly interested in using a Bayesian approach in my analyses are that it allows us to include previous knowledge about the phenomena we are interested in, so it allows us to capitalize on previous research, previous meta-analyses and so on, and it also allows us to incorporate every piece of information, however small, and use it to update our knowledge in light of the new evidence we have collected. At the end of this short presentation you will have a clearer idea of what I mean by this.

To appreciate what a Bayesian approach consists of, it is useful to compare it to the conventional methods that are usually taught in undergraduate courses, the so-called frequentist approach. In both approaches we are dealing with parameters. Parameters are measurable characteristics of a population, for example the mean IQ of adults in a country. The conventional approach to hypothesis testing assumes that parameters are unknown but fixed, and it is for this reason that conventional methods take a counterfactual approach. Say, for example, that we experimentally tested a training program that is supposed to make our participants smarter. We can then test the difference between the IQ of the trained participants and the average IQ of participants who have not been trained. We start from a counterfactual scenario where we assume that the difference between these two average IQs is exactly zero, or in other words that the two groups have exactly the same average IQ.
So this is the null hypothesis scenario. Based on this counterfactual scenario, the conventionally used statistical methods estimate the probability that our observed difference in IQ between the two groups could have occurred under the null hypothesis, and this probability is estimated assuming that we could replicate the experiment several times. So if the difference in IQ that we observe between the two groups is unlikely to have been generated under the null hypothesis scenario, we can reject the null hypothesis that there is no difference, and we can say that there is a difference in the average IQ of the two groups.

One problem with this approach is that the p-value is defined in a scenario where we assume we can repeat the study several times, or at least that the process that generates the data is repeatable. But this is not always the case. In a meta-analysis, for example, the collection of studies should be considered a one-off.

There is also a problem in interpreting the p-value and the results. Assume that our control group had an average IQ of 100. If the p-value of our test of mean differences was .01, people tend to assume that there is only a one percent probability that the mean of our trained participants is also 100. This is incorrect, though, because the p-value represents the probability of observing a difference as large as ours, or larger, if in reality there was no difference between the average IQs of the two groups. So the results really do not inform us about the probability of the parameter of interest. For example: what is the probability that the trained participants have an average IQ equal to 100? This is the type of answer that some people may want from the analysis, and the conventional approach doesn't easily provide this type of answer.

Another problem is that with larger samples and increased estimation precision it becomes more likely that we can reject the null hypothesis. So it is possible in large studies to have significant effects that are relatively unimportant or inconsequential. Again, this problem arises because in the conventional approach we cannot easily provide the probability of a parameter assuming certain values.

A further problem is that we can only assume that the data from the hypothetical replications of the study follow known distributions if our samples are relatively large. With small samples the underlying sampling distributions may be unknown, and we may not be able to use the information we have collected unless we make some great leaps of faith.
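To illustrate the earlier point about sample size, here is a minimal sketch, not part of the module's own materials, with made-up numbers: a practically trivial true difference of half an IQ point comes out statistically significant once the groups are large enough.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Made-up example: the true difference is a practically trivial 0.5 IQ points
control = rng.normal(loc=100.0, scale=15.0, size=20_000)
trained = rng.normal(loc=100.5, scale=15.0, size=20_000)

# Conventional two-sample t-test of the null hypothesis of no difference
result = stats.ttest_ind(trained, control)
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.4f}")
# With samples this large, p typically falls well below .05 even though
# the effect is inconsequential in practice
```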
So the Bayesian approach provides a different way of conceptualizing parameters, and I will show it in the next slide. In the Bayesian approach, parameters are not considered to be fixed; instead they are variables, measurable population characteristics that are uncertain. And if they are in this sense uncertain, parameters can be described by a probability distribution. In other words, we can estimate the probability that a parameter like a population mean assumes a certain value, or lies within a specific range of values, and so on. So we can answer questions about the probability of parameters, characteristics of the population, assuming different values, and judge how likely or unlikely the different scenarios we might be interested in are, for example IQ scores lying in a certain range rather than another.

How can we do this? The analysis basically starts from a priori assumptions about the distributions and values of the parameters, some prior assumptions about what the data could look like. These assumptions may be based on previous knowledge, previous research, meta-analyses and so on, and they may be broad. For example, if I ran a study where I trained some people expecting them to become more intelligent, a broad assumption could be that I expect the trained people to have an average IQ between 70 and 230, say, and also that any value between those two extremes is equally probable.

The a priori distribution is then updated considering the new data I have collected. With the new information collected by a study, I can update the probability distribution of the parameter, giving more credibility to some values than to others, conditionally on the evidence I have collected. For example, the information I have collected may show that it is very unlikely that my participants' average IQ is 70, so I can update the probability distribution of the parameter values accordingly. This updated distribution is called the posterior, because it is estimated after seeing the data: it is a probability distribution of the parameters that is updated considering, and conditionally on, the data I have collected.

Once I have an estimated posterior distribution of the plausible values of my participants' average IQ, I can sample from this probability distribution of the parameter to describe, for example, the mean and median of the distribution (what is the most plausible value the parameter may take?) or the range of parameter values that are more plausible, more likely, based on the estimated posterior distribution.

So Bayesian analysis is basically a formal way to update models of the world by using evidence. Kruschke, in one of the papers I have referenced in the material for this module, gives an example: if there are two candidates in an election, we can ask our friends who they intend to vote for, and based on these answers we can form some expectations. But then we may read some polls, and based on those results we update these expectations. In particular, if the polls we are reading have a large random sample, the data may help us shift our prior beliefs towards more defined, more precise beliefs about who is going to win the election. Bayesian analysis provides a formal way to define how we should use data to update our beliefs, and I will now give a more formal example of this process with the next slides.

So the first element in a Bayesian analysis is a prior. A prior is an a priori assumption about what the data will look like, and I will work with this example. I assume that there is going to be an election with two candidates, a blue candidate and a red candidate. I am assuming that null votes or abstentions are not possible, and that no interviewee can say that they don't know who they are going to vote for; they are forced to give a preference for one candidate or the other.
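In symbols, and as a standard schematic rather than anything taken from the slides, the updating step just described reallocates prior credibility across candidate parameter values theta in proportion to how well each value accounts for the data y:

```latex
\underbrace{\Pr(\theta \mid y)}_{\text{posterior}}
\;\propto\;
\underbrace{\Pr(y \mid \theta)}_{\text{likelihood}}
\times
\underbrace{\Pr(\theta)}_{\text{prior}}
```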
That is the example I am working with. In this scenario I have three variables. The first one is the proportion of people who intend to vote red, which I call p. So p is the parameter, and I want a plausible estimate of the proportion of people who intend to vote for the red candidate, but this parameter is uncertain. However, I can infer it by collecting other information, other variables, and the other variables here are the number of people I observe who intend to vote for the red candidate and the total number of people I have interviewed.

If I don't know anything about the constituency and the candidates, I may start by assuming that any outcome is equally plausible. This is my simple prior, which you can see represented here. The horizontal axis represents the possible values of p, from zero, which means that everyone intends to vote for the blue candidate, to p equal to one, which means that everyone in the population intends to vote for the red candidate. The flat line between those two extreme values means that I am giving equal credibility to every possible outcome. So this really represents my naive opinion, not knowing the constituency and not knowing anything about the candidates, and assuming that every possible outcome is equally possible: the probability that the parameter takes any given value is completely flat.

I can then collect some data and use them to update my beliefs, and estimate how plausible different combinations of preferences for red and blue are. In this case I have a random sample of five people that I have interviewed, and three out of five say that they intend to vote for red.

Now, the function used to reassign the plausibility of the parameter p is a likelihood function. This is the second element in a Bayesian approach, the likelihood. A likelihood function is basically a function that represents the most likely values of the parameter given the data. In other words, it is a mathematical function that tells me how the observed data may have come about. In this example the likelihood takes the form of a binomial distribution: the mathematical formula basically says that the number of preferences for the red candidate will be distributed according to a binomial distribution, with n representing the number of respondents and p representing the uncertain parameter value that I want to estimate. So this formula reports the likelihood that respondents will express a preference for red given an observation and the parameter, and it represents how the variables relate to the model parameters to create the data.

Now that I have a prior assumption, the data, and the likelihood function, I want to use the information in the data to update my beliefs, or rather the estimated plausibility of different values of the parameter. How can I do that? For this purpose I use Bayes' theorem, which you can see formally described here. It states that the probability of any particular value of the parameter p, considering the data, is equal to the product of the relative plausibility of the data conditional on the parameter p and the prior probability of the parameter p, divided by the average probability of the data.
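The slide's formulas are described but not shown in this transcript; in standard notation, with y the number of red preferences observed out of n interviews, the binomial likelihood and the update rule the speaker describes would be written as:

```latex
y \sim \mathrm{Binomial}(n, p), \qquad
\Pr(y \mid n, p) = \binom{n}{y}\, p^{y} (1 - p)^{n - y}

\Pr(p \mid y) = \frac{\Pr(y \mid p)\,\Pr(p)}{\Pr(y)}, \qquad
\Pr(y) = \int_{0}^{1} \Pr(y \mid p)\,\Pr(p)\, dp
```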
What this basically means is that the posterior probability distribution is proportional to the product of the prior assumptions and the likelihood of the data. In the material attached to this module I have also provided a script that shows these examples and how I created these prior and posterior probabilities. You can see how, effectively, the posterior is created by first calculating the likelihood of my observed data over the range of values of the parameter p from zero to one; this is multiplied by the prior probability of p, the probability that I had assumed the different values of p would have; and this is divided by the sum of all the products of the likelihood by the prior. This creates the updated probability distribution that you can see here on the right, where the plausibility of the different values of the parameter p has been reallocated.

Here you can see that the mean of this posterior distribution of the parameter p shows that the most plausible parameter value is very close to 0.60, which is the proportion observed in my small sample. But you can also see that other values are quite likely: a value of p of 0.50 is also quite plausible according to this posterior distribution. This is because, with only five participants, I obviously shouldn't reallocate the credibility of the parameter values too narrowly: an observed result of three intentions to vote for red out of five could easily have been generated under a parameter value of 0.50, or 0.70, and so on.

But when I collect more interviews, from a new random sample of fifty participants, I can work from my previous model, where the plausibility of the parameter values had already been reallocated. This means my prior is now more precise, based on my previous updating. And indeed, why shouldn't that be the case? Why should I work from the assumption that zero intentions to vote for red is as likely as any other outcome, when my previous data showed that this was already a very unlikely event, since some of the people in the smaller sample I had earlier had already expressed an intention to vote for red? So, using the same method, I can now update the assumptions I had previously updated, reallocating the plausibility of the values in light of my new data. You can see how I did this in the script attached to this module, and you can see that considering new evidence allows me to narrow the posterior distribution, as shown here.

So the analysis allowed me to update the relative credibility of the parameter values in ways that are consistent with the new data. This updated attribution of credibility to a range of parameter values is useful if my data are not biased, for example if my sample is representative of the population of interest. But that type of bias is not a statistical problem; it is a problem of research design. So, as long as I have good data, I can update my beliefs, my expected plausibilities of the parameters, in ways that provide informative results.
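The script itself is distributed with the module materials and is not reproduced in this transcript. The following is a minimal Python sketch of the same logic: grid approximation, sequential updating, and then sampling from the posterior, which the presentation turns to next. Note that the transcript gives only the size of the second sample, not its red count, so the 29 used below is a made-up placeholder that keeps the example runnable.

```python
import numpy as np
from scipy.stats import binom

# Grid of candidate values for p, the proportion intending to vote red
p_grid = np.linspace(0.0, 1.0, 1001)

def update(prior, reds, n):
    # Likelihood of `reds` red preferences out of `n` interviews,
    # evaluated at every candidate value of p on the grid
    likelihood = binom.pmf(reds, n, p_grid)
    # Bayes' theorem: posterior proportional to likelihood times prior;
    # dividing by the sum normalizes the product into a distribution
    unnormalized = likelihood * prior
    return unnormalized / unnormalized.sum()

# Flat prior: every value of p starts out equally credible
prior = np.full(p_grid.size, 1.0 / p_grid.size)

# First sample: 3 red preferences out of 5 interviews
posterior = update(prior, reds=3, n=5)

# Second sample of 50: the previous posterior now serves as the prior.
# 29 reds is a placeholder; the transcript only gives the sample size.
posterior = update(posterior, reds=29, n=50)

# Draw 2,000 random samples from the posterior and summarize them
rng = np.random.default_rng(1)
samples = rng.choice(p_grid, size=2_000, replace=True, p=posterior)
print("median:", np.median(samples))
print("90% interval:", np.percentile(samples, [5, 95]))
```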
I want to emphasize again that the posterior probability distribution describes the credibility of different values of the parameter p conditionally on the data collected and conditionally on the previous assumptions, the previous model. The prior plays an important role in preventing our models from becoming too skewed or too narrow because of new data, and I will talk more about this in the second presentation after this one.

Once I have a posterior distribution, I can use it to create large random samples from that distribution, which allow me to describe more of its information. In other words, I draw random samples from the posterior distribution, and because those samples are random, they are unbiased and provide a reliable description of the underlying distribution of plausible values of the parameters. You can follow the script I used to draw these samples; the script, again, is provided with the material for this module, and the sketch above mirrors this step. In this script I drew 2,000 random samples from the posterior distribution, and in the left plot here you can see them represented: on the vertical axis you can see the different values of p, the preference for the red candidate, and you can see that most of the samples from the posterior distribution cluster around values from about 0.40 to 0.75. Indeed, looking at the posterior distribution, I can say that ninety percent of the samples lie between values that indicate a 49.40 percent preference for red and a 66.26 percent preference for red. So I can say that there is a ninety percent probability that the share of votes for the red candidate will be between forty-nine and sixty-six percent.

Using the samples from the posterior, you can also look at other ways to report the results. You can say, for example, that the median value is 57.83, so one of the most plausible values of the preference for red is around 58 percent, and so on. The key characteristic of the posterior distribution is that all possible values of the parameter are ranked by their logical plausibility, and in the second presentation after this one I will illustrate how these methods can be applied to regression analysis.

To conclude this presentation, I wanted to use this quote from McElreath's Statistical Rethinking. It is a very good quote because it describes the gist of Bayesian analysis: Bayesian analysis takes a question in the form of a model and uses logic to produce an answer in the form of probability distributions. I think this is really a good way to describe what Bayesian analysis can do. In the second presentation I will talk about the Bayesian approach to regression analysis. Thank you very much, and bye for now. Please remember to also check the web page of the National Centre for Research Methods for more presentations and more material. Thank you.