Okay, so today we're going to look at a case where we have a normal distribution for our observed data x, with a known mean mu, but an unknown variance sigma squared. And we want to show that a conjugate prior exists for sigma squared. So we start off by writing out our likelihood, which I have done already here, and note that the convention is that a bold-faced x n describes x1 to xn collectively: the likelihood is the product of the densities of the individual observations from this distribution, each in the standard form of a normal distribution. First of all we want to tidy this up so we can see exactly where sigma squared appears, because that gives us a hint about the functional form we might be able to use for a conjugate prior. So our likelihood, f of x n given mu and sigma squared, is 2 pi to the power of minus n over 2 — you've got a square root coming in from each density, and it's multiplied n times — times sigma squared, which we will leave in the form sigma squared to the power of minus n over 2. I'm splitting these factors up because it will make life easier later on. And then e to the power of minus 1 over sigma squared times the sum from i equals 1 to n of xi minus mu, all squared, over 2. When I put a product inside an exponential it becomes a sum, and minus 1 over sigma squared is a common factor, so I've taken it outside the sum. So that is an alternative way of writing my likelihood function, thinking of sigma squared as the parameter of interest.
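As a quick sanity check (not part of the lecture itself), here is a short Python sketch confirming that the rearranged likelihood equals the product of the individual normal densities; the data and parameter values are arbitrary choices for illustration.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
mu, sigma2 = 1.5, 2.0                      # known mean, a trial value of sigma squared
x = rng.normal(mu, np.sqrt(sigma2), size=20)
n = len(x)

# Likelihood as a product of individual normal densities.
product_form = np.prod(norm.pdf(x, loc=mu, scale=np.sqrt(sigma2)))

# Rearranged form: (2 pi)^(-n/2) * (sigma^2)^(-n/2) * exp(-(1/sigma^2) * sum((xi - mu)^2) / 2)
rearranged = (2 * np.pi) ** (-n / 2) * sigma2 ** (-n / 2) \
    * np.exp(-(1 / sigma2) * np.sum((x - mu) ** 2) / 2)

assert np.isclose(product_form, rearranged)
```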
And an inverse gamma distribution in terms of y could be written as f of y, with alpha and beta being your parameters: beta to the power of alpha over the gamma function of alpha, times y to the power of minus, in brackets, alpha plus 1 — that is, minus alpha minus 1; either way of writing it works — times e to the power of minus beta over y. But we are more interested in sigma squared, so when we're thinking about our prior distribution we write it in terms of sigma squared: instead of our y's we'll have sigma squareds, because that's our random variable. So it's beta to the alpha over gamma of alpha, times sigma squared to the power of minus, in brackets, alpha plus 1, times e to the power of minus beta over sigma squared. I immediately start to see the connections. I ignore my constants: for the prior that's beta to the power of alpha over the gamma function of alpha, and for the likelihood it's the part that is constant with respect to sigma squared — that's the important thing to remember — which is my 2 pi to the power of minus n over 2, the only factor that doesn't depend on sigma squared. In both I see sigma squared raised to some negative power, and then e to the power of minus 1 over sigma squared times something else. That is good news. So I'm going to let the inverse gamma be the prior. Then I recall that prior times likelihood is proportional to the posterior, so I multiply my prior and my likelihood together, ignoring those constants. I get: prior times likelihood is proportional to sigma squared to the power of minus, in brackets, alpha plus 1, times e to the power of minus beta over sigma squared, times sigma squared to the power of minus n over 2, times e to the power of minus 1 over sigma squared times the sum from i equals 1 to n of xi minus mu, all squared, over 2. Then I group like terms together.
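To make the inverse gamma form concrete, here is a minimal sketch (my own illustration, not from the lecture) writing the density exactly as above and checking it against SciPy's built-in parameterization, where `a` plays the role of alpha and `scale` the role of beta.

```python
import numpy as np
from scipy.stats import invgamma
from scipy.special import gamma as gamma_fn

def inv_gamma_pdf(y, alpha, beta):
    # beta^alpha / Gamma(alpha) * y^-(alpha + 1) * exp(-beta / y), as in the derivation.
    return beta ** alpha / gamma_fn(alpha) * y ** (-(alpha + 1)) * np.exp(-beta / y)

alpha, beta = 3.0, 2.5                      # arbitrary illustrative hyperparameters
y = np.linspace(0.1, 5, 50)
assert np.allclose(inv_gamma_pdf(y, alpha, beta),
                   invgamma.pdf(y, a=alpha, scale=beta))
```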
So this is equivalent to sigma squared to the power of minus, in brackets, alpha plus n over 2 plus 1 — because you've got the minus out front, written out it's minus alpha minus n over 2 minus 1; when there's a common minus sign, grouping them all together works like this — times e to the power of minus 1 over sigma squared times, in brackets, beta plus the sum from i equals 1 to n of xi minus mu, all squared, over 2. And this is proportional to the posterior. Then I would like to work out my constant of proportionality without having to do any integration. So I look at the form of this: I have sigma squared to the power of minus something, and then e to the power of minus 1 over sigma squared times something else. But look — that is exactly the same form as an inverse gamma, where that something was beta. So this is an inverse gamma with new parameters: instead of alpha we have alpha plus n over 2, and instead of beta we have beta plus the sum from i equals 1 to n of xi minus mu, all squared, over 2. So I can formally write down my constant of proportionality by replacing beta and alpha with these new parameters throughout. My posterior distribution for sigma squared, given my data and knowing my mean, is: beta plus the sum from i equals 1 to n of xi minus mu, all squared, over 2, all to the power of my first parameter, alpha plus n over 2, divided by the gamma function of alpha plus n over 2; then sigma squared to the power of minus, in brackets, alpha plus n over 2 plus 1; and then e to the power of minus 1 over sigma squared times, in brackets, beta plus the sum from i equals 1 to n of xi minus mu, all squared, over 2.
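The conjugate update can be checked numerically without any integration by hand: normalize the unnormalized kernel, prior times likelihood, on a grid, and compare it to the inverse gamma with the updated parameters. This is my own illustrative sketch, with arbitrary prior hyperparameters and simulated data.

```python
import numpy as np
from scipy.stats import invgamma
from scipy.integrate import trapezoid

rng = np.random.default_rng(1)
mu = 0.5                                   # known mean
x = rng.normal(mu, 1.3, size=30)
n = len(x)
alpha, beta = 2.0, 1.0                     # prior hyperparameters (arbitrary)

# Conjugate update read off from the derivation.
alpha_post = alpha + n / 2
beta_post = beta + np.sum((x - mu) ** 2) / 2

# Unnormalized posterior kernel: prior kernel times likelihood kernel in sigma^2.
s2 = np.linspace(0.2, 6, 400)
kernel = s2 ** (-(alpha + 1)) * np.exp(-beta / s2) \
    * s2 ** (-n / 2) * np.exp(-np.sum((x - mu) ** 2) / (2 * s2))
kernel /= trapezoid(kernel, s2)            # normalize numerically on the grid

# It should match the inverse gamma with the updated parameters.
posterior = invgamma.pdf(s2, a=alpha_post, scale=beta_post)
assert np.allclose(kernel, posterior, atol=1e-2)
```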
And that shows that the inverse gamma is a conjugate prior for a normal distribution with known mean mu but unknown variance sigma squared, because the posterior has the same form as the prior.