Ok, we can start. Michele is heroically going to teach during the Italy soccer match, for the first time in history, and this is the fifth lecture about gravitational waves. Thank you. So, if something happens, tell me, ok, if you see it on your phones. And you'll notice it's casual Friday, ok, so I expect you to be relaxed. So, we've been talking about data analysis and detection. I'm inspired, I must say, by Shirley's interactive lectures, so I wanted to do a little of that. We're not quite at that level, but I'm going to be asking you things, so please stay on the ball and respond. Ok, so I think you probably got an idea of how you look for gravitational waves in data, at least in interferometric data. But let's try to write down a few keywords, a few buzzwords, that represent things you need to do to accomplish that. So, what would you say? What are the important concepts in detecting gravitational waves and making sure you really see a weak signal in the midst of a lot of noise? Some words? Yes, just shout it out. Black holes? Ok, yeah. Sure, that's the source, that's what we've seen; I'm looking for how you see them, how you make sure that you've really seen a gravitational-wave event. Yes? Again, sorry? Ok, that's the physical quantity that we're looking for. Yes? Ok, so that's the feature of the signal that you see — in particular not just seeing something that is periodic, but in that case seeing a change in frequency. Ok, so we could say that we're looking for something like a shape, a specific shape. Why does it help you to look for a specific shape? You can reject, right? You can reject everything that's not that shape. Noise in particular is kind of shapeless: it's random, you don't control what it looks like. And we've seen this technique: matched filtering. By just a simple mathematical operation, correlation, which is basically multiplication, you can select out the part of the overall data that looks like a gravitational wave and not like noise. A more mathematical way to say this is that we're using orthogonality. And if we draw a little time-frequency diagram, there are different ways in which we can use orthogonality. For instance — we didn't speak about this much — one looks for gravitational-wave bursts. In what sense are bursts orthogonal in the time-frequency plane? They're localized in time, yes, and they may be somewhat broadband, so a burst would look something like this. You can write an algorithm that uses just this criterion: you compute your time-frequency diagram, your spectrum as you go, and you look for excess power, for bright spots; and if you get a cluster of bright spots that is localized in time but not in frequency, that could be a candidate for a burst.
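(A minimal sketch of this excess-power idea, assuming white noise; the toy burst, the threshold, and the clustering criterion are illustrative choices, not the real pipeline:)

```python
# Minimal excess-power burst search: compute a spectrogram and flag
# time-frequency pixels whose power is well above the noise level.
import numpy as np
from scipy.signal import spectrogram

fs = 4096                           # sampling rate (Hz)
t = np.arange(0, 8, 1 / fs)         # 8 s of data
noise = np.random.normal(size=t.size)

# Toy broadband burst: a short Gaussian-windowed wavepacket at t = 4 s
burst = 5 * np.exp(-((t - 4.0) / 0.01) ** 2) * np.sin(2 * np.pi * 300 * (t - 4.0))
data = noise + burst

f, tt, Sxx = spectrogram(data, fs=fs, nperseg=256, noverlap=128)

# "Excess power": pixels well above the median noise power in each band
threshold = 10 * np.median(Sxx, axis=1, keepdims=True)
hot = Sxx > threshold

# A burst candidate is a cluster of hot pixels localized in time but
# spread over frequency; here we just report the hottest time bin.
counts = hot.sum(axis=0)
print("candidate burst near t = %.2f s (%d hot pixels)" % (tt[counts.argmax()], counts.max()))
```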
At the opposite extreme, 90 degrees from that, one can look for signals from a tumbling neutron star in the galaxy, or from a neutron star that has some non-axisymmetric mountain on its crust. For that you'd get a gravitational wave at twice the rotation frequency of the neutron star. What would that look like in the time-frequency plane? Yes? It would be a periodic signal, okay, essentially monochromatic, essentially a pure quadrupole. So it would look like this. Now, a chirp is the one we actually saw; it's more interesting. What does a chirp look like? A chirp looks like this. In practice it's a little more complicated to get yourself orthogonal to a signal that's curved in time-frequency, and so on, but basically that's what matched filtering is. In all these cases, we select something and try to throw everything else away. That's why it's called filtering: it's matched because you matched the waveform, but it's still filtering, and by filtering we mean removing something and zooming in on something else. A very qualitative picture, but keep it in mind.
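(A minimal sketch of matched filtering as a sliding correlation, assuming white noise; the toy chirp and the normalization are illustrative, not the LIGO implementation:)

```python
# Matched filtering in white noise: correlate the data against a known
# template and look for a peak in the correlation.
import numpy as np

fs = 4096
t = np.arange(0, 4, 1 / fs)

def chirp(t, f0=50.0, k=40.0):
    """Toy chirp with linearly increasing frequency (not a real inspiral)."""
    return np.sin(2 * np.pi * (f0 * t + 0.5 * k * t ** 2))

template = chirp(np.arange(0, 0.5, 1 / fs))
data = np.random.normal(size=t.size)
data[2 * fs:2 * fs + template.size] += 0.3 * template   # weak signal at t = 2 s

# Slide the template along the data; for unit-variance white noise this
# normalization makes the output directly an SNR.
snr = np.correlate(data, template, mode="valid") / np.sqrt(np.dot(template, template))
print("loudest offset: t = %.3f s, SNR ~ %.1f" % (snr.argmax() / fs, snr.max()))
```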
Okay, let's look for some other keywords. Apart from looking for signals, what has been important in this detection, and actually necessary in claiming that the LIGO event was really seen and true? Yes? Sorry, noise? Noise removal, of course, yes. You want an experiment that's as isolated as possible, and you want to make sure the noise is well understood. Sure, so let's actually call it noise characterization, because some of it you can remove, and what you cannot remove you need to understand, to characterize, so that, for instance, when you have glitches you can just toss out the data. Something else? Yes? Two detectors, right. That was the requirement: anything that you see in a single one may be due to noise that you don't control; if you see it in two, you square that probability, right? So I'm going to call this coincidence. Okay, and there's a last one that kind of goes with coincidence. Say you see two things in the two interferometers, but the masses don't match. The search pipeline, the way it's implemented in LIGO, would actually reject that; it wouldn't even come up as a candidate. So let's call it consistency. I think that between the four of these, this pretty much describes, at least theoretically, the data-analysis approach to these signals. Now, two things that are a bit on the other side of this. One is this issue of knowing in advance what you're looking for. Some have called it the problem of the expected versus the unexpected. You have a continuum of possible searches. You can search for the things that you already know are out there, and you can search for those very well if you know exactly what they're going to look like. But if you do that, you may miss the things you cannot imagine, the things you don't know are in the data. Go all the way to the other side, and you don't know what you're looking for: you want to be surprised. In some sense you have to be less sensitive, and at some point the criterion for a signal is "something I can convince myself is not noise," which can then be a discovery. That's the part that's most exciting, and also the hardest to achieve; but let's hope to be surprised at some point. In astronomy there were so many surprises. But in a sense you know when you see an astronomical source, right? They're all points of light, they're all built out of photons, they're all localized and so on. So even something you don't know, in astronomy, is usually clearly a signal. Although sometimes it can be a microwave oven. If you follow the story of fast radio bursts: fast radio bursts are the current big puzzle in radio astronomy. They come very fast, they're very short, they're very strong. They could be of cosmological origin. But a subpopulation of them, seen in Australia, was traced down to the microwave radiation from a microwave oven whose door people were opening too soon. When you opened the door, you switched off the circuit, but there was some transient. So if, when your popcorn is done, you just press the button to open the door without first switching the oven off, you're doing something wrong with regards to astronomy. Okay. The other thing I wanted to mention, since we're just chatting, was false alarms. These two detections were seen with incredibly small false alarm probabilities: rates of less than one in 600,000 years. Way more than five sigma — well, actually, that's about five sigma, but even more than that. I think that's very appropriate for a field, and for detections, where there was this original sin in the 1960s of claiming something very big and finding out it wasn't true. But if in a year or two we're at the point where we're seeing a signal per week, or ten signals per week, then the value of the individual detections is that they build up the population, and you do inference from the entire population — say, about your population of black holes and so on. I would say that at that point you could probably drop this false-alarm requirement to something like one percent, and that would be consistent with how things are in other branches of astronomy, or of science generally. For instance, even in the Kepler catalog of planets there is some fraction — probably more than one percent — that are actually false detections, from some kind of foreground binary for instance. And that's not a problem: it's a source of error in your population estimate, but you can take it. This was prompted a bit by our discussion of this intermediate thing, which is not quite an event, right? LVT151012, which has a roughly 90 percent probability of being an event. Right now it's not worth claiming as a gold-plated event; but in two years, if we have a hundred events and ten are like that, you can probably throw them in and use them in the parameter estimation. Okay. Any comments on this? Yes. Okay. So, this slide: we talked a bit about the matched-filtering algorithm; these are the real-world aspects of it, how this thing is actually implemented in reality. Of course the detector doesn't just hand you h, the gravitational-wave strain, without a lot of work: you have to calibrate it, you have to condition it. Then you do the filtering step, where the filtering is done by the correlation, weighted by noise, over a large bank of templates. Then you request coincidence and consistency, as we were saying: you want to see things happening in the two detectors at the same time, with compatible amplitudes, sky positions, masses. "Apply data-quality cuts" is somewhat jargon, but it means that, since you know the noise in the instrument, you've been monitoring all those other channels: magnetometers, seismometers and so on.
If you see that they are excited, and you understand, based on your studies, that there's a coupling between the shaking of something and a possible signal in the strain, then you throw away that data. Then there's the step of estimating statistical significance, which is done with this empirical background estimate, the time shifts and so on. Then there's the follow-up. The follow-up is actually a long checklist, probably something like 30 items or more, somewhat formalized, where you have a candidate, you know when it happened, and you go through all the steps. Some are trivial — the interferometer was in a good operating mode — but there are lots of things to check: pieces of the experiment, inspections of, I don't know, the way the template bank was excited. At this time it's a very human and time-consuming process, which probably took two months after September, but it's what eventually lets you claim the detection. The other branch there, "get upper limit," is what we had been doing for 15 years before this: you have some triggers that are not strong enough or convincing enough to be detections, but you say, okay, what is the highest rate of these events that could be out there and still be consistent with these triggers? Not a detection, but an upper limit. Okay, so let's move on a bit to parameter estimation. What is parameter estimation? You know you have seen something; now you want to know what physics it has, what physical parameters it contains. Parameter estimation is basically the idea that you have a variety of different shapes, a parameterized family of waveforms. In this case it's a simple burst. The simple burst really has just three parameters, but I'm going to fix the amplitude, so it has a frequency and a characteristic damping time. Now we're given noisy data, and we're convinced that it contains a burst like that, because we saw it in a burst search, a time-frequency search; and now we want to know what the parameters of the signal are. Yes? The what, sorry? The previous slide? I just showed a few plots; those were waveforms in the same family, changing the values of those parameters along the two axes. So, yes, I didn't say it: this is a burst, a sine-Gaussian burst technically, of the form up there — a sine multiplied by a squared exponential, a Gaussian envelope. In the middle I just picked some central parameters. Now, if you increase the frequency, that's the way the waveform changes; if you go down, it changes in the other direction; and the same for the damping time along the other axis. You should imagine this as a continuous family, of course: you can change the parameters a little bit at a time. In general, the effects of the parameters on the waveform are going to be correlated: sometimes, when you look at the shape, you'd be able to absorb some of the change in one of the parameters with another one, and get roughly the same signal. Not here, though; these two are pretty orthogonal.
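(A minimal sketch of this two-parameter sine-Gaussian family; the function name and the parameter values are illustrative:)

```python
# The burst family from the example: a sinusoid at frequency f multiplied
# by a Gaussian envelope of width tau (amplitude fixed, as in the lecture).
import numpy as np

def sine_gaussian(t, f, tau, t0=0.0):
    return np.exp(-((t - t0) / tau) ** 2) * np.sin(2 * np.pi * f * (t - t0))

t = np.linspace(-0.1, 0.1, 2048)
central = sine_gaussian(t, f=100.0, tau=0.02)     # the "central" parameters
higher_f = sine_gaussian(t, f=150.0, tau=0.02)    # increase the frequency
longer = sine_gaussian(t, f=100.0, tau=0.04)      # increase the damping time
```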
Okay, so, back to the question. Given that noisy signal, which contains a burst: how do you know which of these shapes it contains? You can do subtraction, right? Because I only allow myself addition, subtraction and multiplication in my simple data analysis. So those are residuals: the data minus the putative true signal that might be in there. Which one is it? You can't really tell by eye. But what you do is simply see which one of those has the smallest squared integral, the smallest power; we're working with white noise again here, so all frequencies come in equally. It's actually the one in the middle: it's the least noisy. And actually, if you really play this game, you don't quite get the right one. I did it here for that simple signal: I injected this red sine-Gaussian into noise, and then I optimized the match — the matched filter, or the subtraction — over all possible sine-Gaussians around it in parameter space. The blue one is the one that minimizes the residual. But it's not quite the true one. Why is this? Paolo cannot answer, you're not in the school. It's very simple; maybe I'm using confusing language, but it's actually very simple. Why? Yes, the detector threshold? Sure, the signal is not hugely strong, but why am I getting the wrong parameters? Yes? What is the error? Sure, okay, that's a little more complicated to do, so let me do it next; but the point is that there is an error, an uncertainty. Yes, is it discretization? No, I actually did it continuously — you could have that effect if you used a discrete template bank, but think simpler. I must have confused you. There's noise, right? It's a noisy signal, so some of that noise will look like the difference between two of these signals. We're projecting the data — yes, let's show it like that. Let's be very abstract, and let's work on the manifold of this family of signals. There are two coordinates here, this α and this f, and every point on this manifold represents one of those signals. Is this clear enough? I can take a point like that, and it becomes one waveform. Now, this is a two-dimensional manifold, but the signal that I measured, the output of the detector, is a point in a much larger space: the space of all possible data stretches that are, say, eight seconds long. If the sampling rate is, let's say, 4 kHz, that's about 32,000 numbers to describe the data, so it's roughly R^32,000 — a much larger space. The important thing is that my manifold of signals, the pure ones, the perfect ones, is embedded in this larger space, because each of these shapes is a point there. And what is the measurement? The thing I measured is one of these, plus noise. So noise moves me away from this manifold of perfect signals. And when I minimize the residual, what is that in this geometric representation? It's a projection, right — literally a projection down onto the signal manifold.
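(A sketch of this injection-and-subtraction game; a parameter grid stands in for the continuous optimization described in the lecture, and all values are illustrative:)

```python
# Inject a sine-Gaussian into white noise, then find the (f, tau) that
# minimizes the power of the residual. The minimum lands near, but
# generally not exactly on, the true values.
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(-0.1, 0.1, 2048)

def sine_gaussian(t, f, tau):
    return np.exp(-(t / tau) ** 2) * np.sin(2 * np.pi * f * t)

true_f, true_tau = 100.0, 0.020
data = sine_gaussian(t, true_f, true_tau) + 0.5 * rng.normal(size=t.size)

fs_grid = np.linspace(80, 120, 81)
taus = np.linspace(0.010, 0.030, 81)
resid = [[np.sum((data - sine_gaussian(t, f, tau)) ** 2) for tau in taus]
         for f in fs_grid]
i, j = np.unravel_index(np.argmin(resid), (fs_grid.size, taus.size))
print("true: f=100.0, tau=0.020  recovered: f=%.1f, tau=%.3f" % (fs_grid[i], taus[j]))
```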
So I make an error. Why? Because the noise that was added has a component that's tangential to this manifold of known signal shapes. You will always get that. It would be smaller and smaller if the signal were very strong and the noise moved it only a little; but since noise comes in at all frequencies, it will always look a little like a true signal, and it will give me an error in the parameters. Another way to look at this is that, since there's noise, we're going to be overfitting. Good. So this is the intuitive picture of parameter estimation; there's really not much else to it. It gets a little more complicated in practice because, as I was showing you, the noise curve of the detectors is not flat: there's a band of good sensitivity. I told you that on the high-frequency side the noise usually comes from how well you can sense the position of your test masses, and on the low-frequency side it comes from the fundamental Brownian motion of, or perturbations on, the state of your reference masses: so, sensing and acceleration. Since the curve looks like that, things are a little more complicated than what I showed you. For instance, here I take the same realization of noise, for LISA — the time axis must be days, or maybe seconds, I don't remember — and I put three different signals into the same noise. Those signals all have the same amplitude as gravitational waves, but you can tell that one of them is going to be easier to see than the other two. In the second row down there I'm taking the Fourier transform of signal and noise, and you see that for two of them the signal is immersed in the noise, so the fluctuations of noise are going to be able to obscure the signal, or be confused with it. For the middle one — actually, I don't know what I did here; this noise doesn't quite follow a curve like that, I probably did something more square, because you see the Fourier transform goes to zero, stays at zero, and then goes up again — anyway, the signal in the middle is the one that's going to have better contrast and a higher SNR. I even show it, I guess, with those rounded dots up there, that it's a little bit above the noise. Okay? So there's a quantitative way in which you can plot the strength of a signal with respect to noise, which lets you read off the SNR from just that kind of sensitivity plot. We need to reflect this fact — that noise is frequency-dependent — in the kind of data analysis we do. We cannot just do a simple subtraction; we need to do some weighting. And to do the weighting, you need to understand your noise. The spectrum, at least in our business, is the main characterization of noise. The idea of noise that one needs to assume is pretty basic and idealized: noise is a Gaussian process — in fact, colored Gaussian noise. So let's try to write down the probability distribution for it. Where did I put my eraser? If I have a Gaussian random variable, let's call it x, what is the probability distribution for that variable? Yes? Shout it. It's a Gaussian, sure: p(x). You need to know what it is, though: some normalization times an exponential; we can write it p(x) = (1/√(2πσ²)) exp(−x²/2σ²).
Okay, now we're going to make this a little more complicated. If I sample one of these every second, all identical, and I do it for N seconds, what is the probability distribution of the entire time series — of all the samples at once? Yes: since we're smart, we just work in the exponent directly, right? Something like p(x₁, …, x_N) ∝ exp(−Σᵢ xᵢ²/2σ²), and the normalization becomes (2πσ²)^(N/2). You could even generalize it and let each sample have a different variance if you want, although then you'd have to fix the normalization and turn it into a product; but no worries. Okay, so let's get more sophisticated. Let's go to the frequency domain. We can write a frequency-domain version of this: we describe the sampled signal xᵢ, for i from 1 to N, by its frequency-domain components — let's call them x̃ₖ. So what does the probability distribution look like in the frequency domain? Gaussian again, okay: e to the minus something. And what do I put in the exponent? Think about the Fourier transform and Parseval's theorem: it roughly tells you that the sum of squares in the time domain equals the sum of squares in the frequency domain. So it's going to look like p(x̃) ∝ exp(−Σₖ |x̃ₖ|²/2σₖ²), again with some normalization factor. So the two are really similar. But here's the interesting thing: if I'm describing noise that is red — whose spectrum rises at low frequencies — what does that look like in terms of the samples themselves? Red noise looks like a slow random walk. A slow random walk like that is very correlated, one would say: if I tell you the value of my sample here, I can tell you almost exactly what it is at the next step. The time-domain description above cannot represent that, because there the samples are all uncorrelated: it's a product of probabilities, you can take them one by one, and none of them knows about its neighbor. Instead, this kind of behavior is represented very well in the frequency domain. You have to make a strong assumption, which is technically stationarity, so that the random process keeps looking like itself as time goes by. But then the way you describe something like this is simply by changing these per-frequency variances: making them large for the Fourier components with low frequencies, and smaller for high frequencies. A very blue spectrum is the other extreme: very fast oscillations, not very correlated on short time scales; you'd see that a lot of the power is concentrated in the fast fluctuations. Okay, so I'm getting there a little slowly, but we're almost there: this quantity σₖ² is the analog of the variance in the time domain, and it is actually just the power spectrum — the power spectral density.
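(A sketch of this frequency-domain description, assuming stationarity; the red 1/f² spectrum and all numbers are illustrative:)

```python
# Colored (here, red) Gaussian noise: draw independent Gaussian Fourier
# components, scale each by sqrt(PSD), and transform back. A red spectrum
# (power ~ 1/f^2) produces the slow, correlated wander described above.
import numpy as np

rng = np.random.default_rng(1)
n, fs = 4096, 256.0
freqs = np.fft.rfftfreq(n, d=1 / fs)

psd = np.zeros_like(freqs)
psd[1:] = 1.0 / freqs[1:] ** 2          # "red" spectrum; zero out DC

amp = np.sqrt(psd)
spec = amp * (rng.normal(size=freqs.size) + 1j * rng.normal(size=freqs.size))
red_noise = np.fft.irfft(spec, n=n)     # stationary, correlated time series

# White noise for comparison: flat spectrum, uncorrelated samples
white = rng.normal(size=n)
print("lag-1 correlation, red: %.2f  white: %.2f"
      % (np.corrcoef(red_noise[:-1], red_noise[1:])[0, 1],
         np.corrcoef(white[:-1], white[1:])[0, 1]))
```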
Yes? How correct is this in reality? That's a great question, and there are many answers to it. One is that you can go to the experimentalists and tell them that they need to do the best possible job of giving you noise that is Gaussian. And there are obvious things that, done wrong, will give you noise that isn't Gaussian: for instance, if you're saturating a measurement — a photodiode — or if you introduce some kind of transducer that's not quite in its linear range and becomes nonlinear, then at its output you're going to lose the Gaussianity of what you had. So that's one answer: throw the problem to the experimentalists. Another answer is: I don't care, because Gaussian is the only thing I can do computations with. That's a very theorist's answer — I'm joking, but it is a good answer in a sense, because it means you have to be careful: you proceed with what you need in order to do calculations, you remember that it's only true in some idealized sense, and you keep checking yourself. These things — coincidence, consistency, and what was the other one, okay, data-quality cuts — all those details exist because of this. If you really believed that your noise was Gaussian, you wouldn't need any of them: you could do a single experiment, and when you see something strong enough you compute the probability of false alarm, and if the SNR is 20, there's no question. You wouldn't need two LIGOs: the probability of getting such a trigger from Gaussian noise is infinitesimally small. But since the noise is not Gaussian, you have to be more careful. In practice, when one does that empirical estimation of the background, you see a distribution like that: the triggers due to noise alone go down exponentially, but there is some tail. And that means that the theoretical idealization you're using doesn't quite respect reality, and you need to do more. So: best question of the day, at least. Very well done. So, where were we? Here. Okay, so then we're down to that formula: the probability of noise is a product of those Gaussian exponentials for the noise components in the Fourier domain. You can turn the sum into an integral, and that S(f) there is just the power spectral density, which is what I can measure from the experiment. I can also try to predict it based on all the physics I have: shot noise, thermal noise, seismic noise at very low frequencies. In practice, that integral is not hugely sensitive to getting S(f) a little wrong; also because you're doing all these checks and so on, you end up more or less with the right thing. But that is perhaps the central formula: this p(n) is the central formula of all my business, all our business, in gravitational-wave parameter estimation. Because now that I know the probability of the noise, I can also derive a probability for the signal, because what I measure is equal to whatever gravitational-wave signal is in there, plus noise. That means that the probability of a certain h being in there — the probability of a signal h_gw given s, the data, let's say, in LIGO — equals what? The probability that the noise equals s − h. Again, the subtraction I was showing you. Actually, today I don't know whether I'm being so trivial that I bore you, or very profound; maybe, if you really think about it, it's profound. And therefore this is going to be proportional to exp(−⟨s − h, s − h⟩/2), with the inner product weighted by the power spectral density. I'd better not write this backwards, or the Bayesians are going to kill me. So this is the probability of observing the data given the presence of a certain signal: the likelihood of the data.
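(A sketch of this noise-weighted likelihood. Normalization conventions — one- versus two-sided PSD, FFT scaling — vary between references, so the constants here are one consistent choice, not the LIGO code; function names are mine:)

```python
# log p(s | h) = -<s - h, s - h>/2 + const, with the noise-weighted inner
# product <a, b> = 4 Re sum( a~(f) b~*(f) / S(f) ) df, one-sided PSD S(f).
import numpy as np

def inner(a, b, psd, fs):
    """Noise-weighted inner product of two real time series."""
    af, bf = np.fft.rfft(a), np.fft.rfft(b)
    return 4.0 * np.real(np.sum(af * np.conj(bf) / psd)) / (fs * a.size)

def log_likelihood(s, h, psd, fs):
    r = s - h
    return -0.5 * inner(r, r, psd, fs)

# Tiny demo in unit-variance white noise, whose one-sided PSD is 2/fs:
fs, n = 1024, 4096
t = np.arange(n) / fs
h = 0.1 * np.sin(2 * np.pi * 100 * t)        # a weak template
psd = np.full(n // 2 + 1, 2.0 / fs)          # flat one-sided PSD
print("optimal SNR ~ %.1f" % np.sqrt(inner(h, h, psd, fs)))
```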
But I can turn it around using Bayes' theorem, because what I really want is the probability of my vector of parameters, the ones that describe the signal, given the data that I have observed. We use the law of compound probabilities: p(θ | s) p(s) = p(s | θ) p(θ). So let me multiply by the probability of observing the data that I actually observed — this is a little confusing to think about, because after all I did see it, but there is an a priori probability of seeing it — and turn it around: p(θ | s) = p(s | θ) p(θ) / p(s). I think I have this on the next slide, actually. This piece, p(s | θ), is just what we derived: it's based on the probability of the noise being the difference of the data minus the model, effectively. And what is this other piece here? Who's familiar with Bayesian inference in this audience? Heard about it, at least? Just one? Come on. Three. Yes: that would be the prior. Why do you need it? Say you see a gamma-ray burst, and you also see a gravitational wave there, so it's going to be a binary of two neutron stars — let's say in two years this is our business. You start playing this game, and you estimate that the most likely values for the two masses are three and 1.4, or three and two. Should you believe that? Is that an acceptable answer? Well, it may be a new discovery — an overweight neutron star or so on. But if you really assume that these are neutron stars, probably the noise just projected onto the signal in such a way that it made one object look heavier. You should be able to use the information that neutron stars only come with smaller masses, in a probabilistic sense; and this prior is where it goes in. What's another prior? In gravitational-wave observations you may certainly have priors on the masses, although we don't really know very much about the binaries, so you don't want to put too much in there: that's exactly what we want to discover from the gravitational waves themselves. How about the prior on distance? What's a good prior for the distance of binaries — the simplest reasonable one? We're here at the Earth, in the solar system, and you look out... Yes? Say again, sorry? 100 megaparsecs? That's one value, a likely possible value; but what I want is this: I'm expecting to see a number of binaries, and — let's say we have godly powers and know exactly where they all are — they're going to have some distribution in distance. I want to make a histogram of it. What will it look like? Say again? Let's pass the microphone back. Yes: a Poisson distribution? A Poisson distribution is what you get in time, for something that happens at a constant rate; it would be the correct answer if we were writing the prior for t — actually not quite for t, for the number of events in a given interval. But I'm asking about distance.
Okay, so try again; you get a second try. Yes? An exponential? Why, may I ask? Oh, okay, sure: that's because the galaxy's profile falls off like exp(−ρ) sech²(z), something like that, right? Good answer, except that we see much farther than the galaxy. Yes, right — and that answer takes detection into account already, which you don't usually need to do unless you're working really at the edge; but it's just a little simpler than that. Okay, so let's say we look far enough out that the universe is homogeneous — after all, we're all cosmologists; I'm not, but you are, right, in this school of cosmology. And if we don't look too far, the universe is also Euclidean, so it's just r²: the binaries sit on spherical shells. If you want to include cosmology, you apply the cosmological volume element, your comoving distances and so on; but it does go up like that. And it's true that you can then apply a detection criterion — basically a selection effect — and this thing will come down eventually, because the sources get too far away to be seen. So that's correct. These are all pieces of information that we have before we even look at the details of the experiment. Some of it may be information that doesn't even come from the strain, from the gravitational-wave channel. For instance, if we see a counterpart, we see some lines — I don't know what a good line in a GRB would be — we may get the redshift. And the redshift is interesting, because it carries some of the same information as the luminosity distance, if you fix your cosmology. Or, if you have a great detection, it actually lets you throw in your cosmological parameters and solve for those as well, because you have a redshift and a distance: an application of this to cosmology. So all this information goes into the prior, and the prior goes into building this posterior probability for my parameters given the data.
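(A toy sketch of how such a prior acts: an invented Gaussian likelihood in distance, multiplied by the r² uniform-in-volume prior just described; all numbers are made up:)

```python
# A 1-D posterior: toy likelihood in distance times the prior p(D) ~ D^2
# (Euclidean, nearby universe). The prior drags the peak outward, because
# there is more volume — hence more binaries — on the farther shells.
import numpy as np

D = np.linspace(1.0, 1000.0, 5000)                       # distance grid (Mpc)
likelihood = np.exp(-0.5 * ((D - 400.0) / 80.0) ** 2)    # toy measurement
prior = D ** 2                                           # spherical shells
posterior = likelihood * prior
posterior /= np.trapz(posterior, D)                      # normalize (the "evidence")

print("likelihood peak: %.0f Mpc  posterior peak: %.0f Mpc"
      % (D[likelihood.argmax()], D[posterior.argmax()]))
```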
Okay, so what is this formula? Bayes' theorem. Bayes' theorem is one of these cases of somebody maybe taking a little too much credit, because Laplace had it already — probably even Bernoulli; the early people studying probability could do a compound probability like that. What Bayes brought was a little more philosophy, the interpretation: you know something already — that's the prior, some knowledge of the universe or of some physical system — then you observe it, and your observation lets you update your prior knowledge into posterior knowledge. That's why they're called prior and posterior: before and after. The ingredient in between is the likelihood, which is an expression of the relative probability of the data as a function of your source parameters, given your model. The real Bayesians always put a suffix here — a ", I" or ", M" — in all of these probabilities, to point out that everything is conditional on a model; and that model is an additional assumption that may even come with its own probability. You may be working with multiple models and give each of them a different probability. So: state your assumptions, right? But we almost never write it, because one has a model that one likes. Yes? Okay, that's a very good question: this p(s) is known as the evidence, or the marginal likelihood. It's marginal because I can actually write it as something else — it's there on the screen too: p(s) is the integral of the numerator over all possible values of the parameters. So what does it tell me, in terms of estimating the parameters, of finding the distribution of this posterior? Not really much, because it's just one number, a normalization: if I can compute the numerator, then I know where the peaks in parameter space are and how the parameters are correlated with each other, and I have no use for the evidence. Add to that the fact that it's usually very, very hard to compute this integral, because we usually have a large number of parameters, and it's hard to explore them in a way that gives an accurate evidence. However, the evidence is kind of interesting if you're considering multiple models. For instance, you may say: we analyze the posterior parameter distributions for an inspiral of two black holes in general relativity, and we do the same job for an inspiral of two black holes in a massive-graviton theory of gravity. Those two analyses you do separately, but for each of them you can compute the evidence, and then you can compare the two evidences: that's called model comparison. You're comparing two different explanations of the data that you observed, possibly with two different sets of priors, and those two evidences are directly comparable: you take a ratio, call it the Bayes factor, or an odds ratio, and it tells you whether one of the two models is preferred with respect to the other. The problem with that — we were talking about this at lunch with Shirley — is that nobody really knows what the number means. So you look at some of these books on probability and inference, and they tell you:
if the ratio is one, neither model is preferred by the data — okay, that's reasonable; if it's five, one of the models is moderately preferred; if it's 50, one of the models is strongly preferred. But those are just labels that the statistician who wrote the book chose to put there. They don't really mean much unless you know, for instance, how likely those models were in advance — and in physics that's really, really difficult to do. In a textbook you can set up little experiments, like this: I have ten coins, and one of them is a trick coin that comes up heads all the time. Now I pick a coin at random, flip it five times, and get all heads. Now I can do model comparison — for a real problem I'd do something more complicated, with parameters in the models, but here I can do model comparison and come up with a ratio that tells me how much more likely one explanation is than the other. If you control everything like this, and in particular if you know that one coin out of ten is the trick coin, then you can give a more interesting statistical meaning to this Bayes factor: it will tell you something about how many times you'd be right if you played this game a hundred times — not quite the ratio itself, but you can work it out from it.
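(The coin example worked through; exact fractions, so the numbers can be checked by hand:)

```python
# One coin in ten always lands heads; we pick a coin at random, flip it
# five times, and see five heads. Compare "trick coin" vs "fair coin".
from fractions import Fraction

p_trick = Fraction(1, 10)                  # prior: 1 coin in 10 is the trick coin
p_fair = 1 - p_trick

like_trick = Fraction(1, 1)                # P(5 heads | trick) = 1
like_fair = Fraction(1, 2) ** 5            # P(5 heads | fair) = 1/32

bayes_factor = like_trick / like_fair      # likelihood ratio: 32
posterior_odds = bayes_factor * (p_trick / p_fair)   # 32 * (1/9) = 32/9

p_trick_post = posterior_odds / (1 + posterior_odds)
print("Bayes factor:", bayes_factor)                                   # 32
print("P(trick | 5 heads) = %s ~ %.2f" % (p_trick_post, float(p_trick_post)))  # ~0.78
```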
In nature? I don't know; I think we have only one universe, although the landscape people tell me differently. So in nature, what odds are you going to give Einstein's general relativity compared to a massive-graviton theory? It's an impossible question, right? I would actually give very good odds on GR, just because it's so pretty, and it's been confirmed to a part in 10^13 in some aspects, and so on. For that reason, this is something that people compute, but it's hard to interpret unless it's huge or very small. Okay, 4:37 — that means I still have 25 minutes. So, I will skip this; I had a very nice, I think, historical presentation of Markov chain Monte Carlo, which is the way that, in practice, we do all this parameter estimation — the exploration of these posteriors and likelihoods over many parameters. I will tell you that the name Monte Carlo actually comes from a solitaire: Ulam, in 1946, was sick at home, playing solitaire, and since he couldn't win, he wanted to compute the probability of winning at this particular game. It was too complicated — it involved too much combinatorics and so on — so he came up with this idea for one of the new computers: how about we program it to just play solitaire many times, a hundred times, a thousand times, and then empirically figure out what the probability of winning is? That was the original idea. He told von Neumann about it, and von Neumann said: oh, we can do something with this, we can do nuclear physics with it — and there they were, at Los Alamos. Who's that guy, looking very grim, I must say? Anybody? Nicholas Metropolis. Metropolis was a computer builder at Los Alamos back then; he had built this machine called the MANIAC. And these people are the Rosenbluths and the Tellers, and they wrote this super-famous paper in 1953 — the birth of the Metropolis algorithm. The paper is nice because there are two Tellers and two Rosenbluths in it, two husband-and-wife pairs in the byline; and Metropolis, I think, mostly gave them a computer, although he's the first author. Anyway, the idea of this paper is Markov chain Monte Carlo, not just Monte Carlo: the problem of sampling a statistical distribution, something like exp(−E/kT). That's very hard to do if you're just drawing random states, because most of them will end up having very small probability, and therefore they're not very useful in representing the distribution. So the turn here was: how about we draw the states already with the right probability? Well, that's the hard part, and the genius was to come up with a stochastic, randomized way to generate states according to a given statistical distribution. That is the Metropolis algorithm, which involves proposing a state and accepting it based on a rule, the Metropolis rule. Let me skip why this works. But this is the way we explore parameter distributions in our business: there are so many parameters that we cannot afford to step along all the axes through multiple values, so we wander randomly in parameter space, but we try to do it as smartly as possible, with these Markov chain Monte Carlo methods, which are random walks guided in such a way that they reproduce the probability distribution you need. That, I think, is always awesome: it means that statistical physics solves the mathematical problem for you, and does so in a way that's independent of the number of dimensions. But in practice it ends up being slow all the time — done naively, it can never work well enough — so there are lots of complications and lots of sophisticated variants. For instance, Hamiltonian Markov chain Monte Carlo is a very useful and very smart one, where you're not just doing a random walk anymore: you effectively have Hamiltonian motion, you solve Hamiltonian equations of motion with some random kicks, and that makes your walk much more useful. But let me point out two algorithms and software packages that have been all the rage in the last few years: one is emcee, an affine-invariant sampler, and one is MultiNest; these are actually used in the LIGO business. Affine invariance is an idea to avoid problems, basically, with the units of things: often, in your parameterization, the distances in parameter space are not commensurate with the real distances in signal space, so you have a problem — you're moving too slowly along one dimension and too rapidly along another, and you end up being very inefficient. Affine-invariant samplers have a way of being invariant with respect to affine transformations — scalings and rotations in parameter space — and that makes them powerful. MultiNest is not quite Markov chain Monte Carlo; it's another technique, devised especially to compute the evidence, in fact, but it's still a randomized, stochastic technique that involves throwing numbers randomly around parameter space.
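(The Metropolis rule described above, in its minimal form, on a one-dimensional toy target; the proposal width, chain length, and burn-in are arbitrary choices:)

```python
# Minimal Metropolis random walk targeting p(x) ~ exp(-E(x)): propose a
# jump, accept it with probability min(1, p(new)/p(old)).
import numpy as np

rng = np.random.default_rng(2)

def log_target(x):
    return -0.5 * x ** 2          # standard Gaussian, for illustration

x = 0.0
samples = []
for _ in range(20000):
    prop = x + rng.normal(scale=1.0)                   # symmetric proposal
    if np.log(rng.uniform()) < log_target(prop) - log_target(x):
        x = prop                                       # the Metropolis rule
    samples.append(x)

samples = np.array(samples[2000:])                     # drop burn-in
print("mean %.2f, std %.2f (target: 0, 1)" % (samples.mean(), samples.std()))
```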
Okay, so: the last ten minutes, on testing GR with gravitational waves. I think this is a place where there's a gap between the field theory side — the theories of gravity — and what we can actually do in experiments. We're still working within a very classical framework for testing GR, which is the Dicke framework, discussed especially well by Cliff Will in his book (there's a green and a yellow version; I don't know which one is newer). The idea is that you start with Newton's equivalence principle: that inertial and gravitational mass are the same. That's tested incredibly well; on Earth we know it's true to one part in a lot. To make GR, Einstein extended it by adding invariance principles: local Lorentz invariance and local position invariance. If you have those, you pretty much have to have a metric theory. A metric theory is one in which small test particles move along geodesics; it's a theory in which you have the equivalence of free fall — of apparent accelerations and gravity — and also a theory in which, locally, if you're in free fall, the old special-relativistic physics works as is, with the same equations. You can take Maxwell's equations, write them in a local freely-falling frame without changing them at all; then, if you want to use different coordinates, you transform them — general covariance — but otherwise, locally, freely-falling physics doesn't look gravitational at all. You can go a little stronger, to the strong equivalence principle, which is the statement that not only do all masses fall in the same way, and not only do all forms of energy — such as the energy of an electromagnetic field — fall in the same way, but gravitational energy itself also falls freely. That's harder to test, and it's a very strong statement, realized, I think, only by GR among the classical theories of gravity. It's tested to a part in 10^4 with laser ranging of the Moon, using the reflectors that the astronauts left there. So Dicke's idea for testing GR was then: you test these basic invariance principles, and on top of them you build metric theories that all reproduce the same Newtonian limit — because we know Newtonian physics is correct in the limit of small velocities and weak gravitational fields — but that differ at the next order. You build what's known as the parametrized post-Newtonian (PPN) formalism, which is Newton plus some additional potentials with a small number of free constants; with the invariances you've assumed, and the assumption of a metric theory, this is the only form it can take. The PPN formalism has a number of parameters that you can measure, and you can devise experiments to measure them, because they give you predictions: for instance, for the bending of light around a mass, for the gravitational redshift, or — the other big one — for the precessional motion of orbits, around the Sun for instance. The parameters gamma and beta are usually the most interesting: they tell you how much curvature is produced by mass, and about the nonlinearity of the gravitational field. The others break some symmetry, in a sense — they give you preferred directions or preferred positions — and they're a little less interesting, because we like invariance. So general relativity is tested by verifying that these parameters have, to within a small error, just the values predicted by Einstein.
And then — what was it, 1978 or so? — they observed the change of the orbital period of the binary pulsar, and that killed essentially all the metric theories that differed from GR, because most of them either don't predict gravitational waves or predict them in the wrong form. Actually, I'm saying something a little different from the slide: I should say that gravitational radiation is predicted by virtually any surviving metric theory, given a quadrupole formula that embodies Lorentz invariance. But there are a few characteristics that are special to radiation: you don't see them in the post-Newtonian potentials, because they're not about the dynamics of bodies — they're about the propagation and the generation of waves. So, for instance, you may expect different polarizations, a different speed, a different loss of energy, a different shape. The problem is that nobody has been able, so far, to build the equivalent of the parametrized post-Newtonian formalism for gravitational waves — a principled framework like that. So the tests that we can do are either against one completely different theory, of which we don't have very many available, or tests of the consistency of GR with respect to itself: you can test how well GR predicts the effects that you see, but you don't know whether the difference, if there is any, is explainable by something else. So, yes, it's a step away from the ideal. These are all tests that you can do just on the shape of the signal you observe, without having to assume anything about an alternative theory of gravity. You can take different pieces of the waveform and compare them; you can check whether, after you've subtracted from the data the best fit that you have, there is something left; you can check whether the shape of the likelihood is what you expect. These are all things we can do, and we've done several of them; I'm going to show you the one that I like best. This was in the paper "Tests of general relativity with GW150914." What was done there is to take, again, the best-fit waveform — in the family of inspiraling binaries with spin that was used to estimate the parameters — and take it out of the data. What's left should be just noise; actually, noise minus a little bit, the overfitting bit, right, the part of the noise that was mapped into signal. Then we took that residual and ran another search on it: a search for a coherent signal, a coherent residual. This couldn't be an inspiral search — after all, you've taken out the inspiral, and inspiral minus inspiral is not an inspiral — so we ran a burst search, for something that had the same polarization in both detectors and some kind of coherence in frequency. That search actually came up with an SNR of seven: the residual could be mapped into a burst with an SNR of seven. That sounds like a lot; but the histogram in that plot shows you what SNR you get if you run this burst search over lots of noise-only realizations: you get seven just out of noise. Seven is pretty typical.
Then what you can do is say: suppose this burst that I've reconstructed — from noise, really — actually represents the error I made in my template, so that what's left is entirely due to a violation of GR. It looks coherent; maybe I could dream up a theory that changes the shape of my inspiral in exactly that direction. The nice thing we realized, and that I especially advocated for, is that you can then compute this thing called the fitting factor: a single number between zero and one that describes, effectively, how much two signals look like each other. By knowing that the actual detected signal had an SNR of 23, and that this coherent residual had an SNR of seven, you can say that the match — effectively the inner product of the two signals — was greater than 0.96. That means you've confirmed GR, using the full prediction of the waveform, to four percent. In a sense that's the best you can do with that signal, and it's a test that doesn't depend on any other model of gravity and that uses the full information.
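(Turning those two numbers into the bound, under the simple assumption that the putative extra signal is orthogonal to the best-fit template; the paper's exact construction differs in detail, and the round numbers are the ones quoted in the lecture:)

```python
# Fitting-factor bound from the residuals test: detection SNR ~ 23,
# coherent residual SNR ~ 7.
import math

snr_det = 23.0      # SNR of the best-fit inspiral template
snr_res = 7.0       # coherent SNR allowed in the residual

# If the true signal is template + orthogonal residual, the normalized
# overlap (fitting factor) between them is:
fitting_factor = snr_det / math.hypot(snr_det, snr_res)
print("fitting factor >= %.3f" % fitting_factor)                 # ~ 0.96
print("waveform agrees with GR to ~ %.0f%%" % (100 * (1 - fitting_factor)))
```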
So, since I like this one very much: does it make sense to you? Do you have questions about it? Yes — the case of a non-metric theory? A non-metric theory... I have no idea. If there were more violent violations of GR, it would still look like GR with some anomalies and so on, I suppose. A non-metric theory could be, for instance, MOND, or TeVeS, and so on; all of those tend to have the conservative sector of the dynamics somewhat mapped out, but they don't necessarily have, for instance, a wave equation. I don't think any of them has, at this point, a prediction that can be compared with this. Partly I know very little; partly I know that nobody is coming to us with waveforms from a non-metric theory that we could compare. So — are we still tied? Zero-zero? One-zero? No? For whom? For Italy? Oh, good, good, amazing; let's go on. Okay, so I have almost no time left, but go and see that paper: it's a good paper, the first one of its kind, so it has some failings, and some parts are less interesting than others, but I think it's very interesting. It attempts to do all the classical basic tests: polarization, speed, and radiation reaction. Polarization: I told you, I think on Tuesday, that if you don't impose Einstein's equations, in principle you can have six polarizations in the waves — the two transverse spin-two tensor ones, but also two scalar ones, one transverse and one longitudinal, and two vector ones. So we tried to see whether we could test the polarization content of the signal, and you cannot, because you have only two detectors, and they are nearly aligned, so in effect they see only one projection. For instance, you can do a test like this: if the wave had the same phasing but were entirely scalar — which is not possible in any reasonable theory, but say it were — then you could not distinguish that from the actual two tensor polarizations. Some things would be different — the position in the sky would be different, and so on — but we need a network even for this: we need Virgo at the beginning of next year, we need KAGRA eventually. The mass of the graviton is something you can do. Gravitons should not have mass; but if they do, they will change the velocity of gravitational waves, which will be less than the speed of light. Actually, I was a little mind-blown yesterday, talking to Stefano Liberati, because he was telling me that in most interesting theories the speed of gravity is greater than the speed of light, so you have two cones and so on; but that's not something we checked here. If the graviton has a mass, you modify the dispersion relation — the velocity as a function of energy — and that gives you a dephasing: this gravitational wave, this chirp in frequency, you can see it in a frequency-domain expansion, as a sum of components at different frequencies. Each one of those is going to have a slightly different velocity, which means that, with respect to the GR solution, those components are going to be shifted a little, and that changes the phasing of the signal. So you can do this by redoing the big exercise and adding one more parameter, for the graviton mass or for its Compton wavelength, and you end up bounding it — that's the posterior probability distribution for it — to very small values: less than about 10^-22 electron volts.
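(A sketch of that dispersion effect; the source distance is illustrative, the mass is set near the quoted bound, and the frequency-dependent delays — not the full phasing model used in the paper — are what is computed:)

```python
# Massive-graviton dispersion: E^2 = p^2 c^2 + m^2 c^4 gives a group
# velocity slightly below c that depends on frequency, so the Fourier
# components of a chirp are delayed by different amounts (a dephasing).
import numpy as np

h_planck = 6.626e-34      # Planck constant (J s)
c = 2.998e8               # speed of light (m/s)
eV = 1.602e-19            # joules per electronvolt

m_g = 1e-22 * eV / c**2   # toy graviton mass (kg), ~ the quoted bound
D = 400 * 3.086e22        # source distance: 400 Mpc in metres (illustrative)

f = np.array([30.0, 100.0, 300.0])   # Hz, across the chirp band
E = h_planck * f                     # energy of each Fourier component

# 1 - v/c ~ (m c^2 / E)^2 / 2 to leading order (expanded analytically,
# because the correction is too tiny for direct floating-point subtraction)
eps = 0.5 * (m_g * c**2 / E) ** 2
delay = (D / c) * eps                # extra travel time relative to light
for fi, di in zip(f, delay):
    print("f = %5.0f Hz: extra delay %.2e s" % (fi, di))
```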
Okay, so then, finally, testing radiation reaction — well, this one actually does a little more: it tests radiation reaction and the conservative dynamics of the binary at the same time. The waveform model looks like that: a sum, in frequency space, of powers of frequency, some logarithms, with many coefficients that are computed in post-Newtonian theory — the ones that I showed you — and some coefficients that are actually fit to numerical-relativity waveforms. That entire waveform is parameterized by physical parameters — the masses, the spins, all real things — so if you fix those, there is only one waveform. The idea here was: how about we take those coefficients and, one by one, let them go free? All the others keep the form they should have in GR, but one of them is allowed to vary, and we estimate it together with everything else: we add it to the parameter set. It's a somewhat formal exercise, because doing this doesn't necessarily reproduce any alternative theory of gravity — although for some values you can: some of those coefficients you would get, for instance, with some of the string-motivated modifications of GR that have a scalar field, whatever those are called. Anyway, that's what you get for the individual coefficients, where zero is the general-relativity value, and the violin plots show you how you bound them around it, from both signals — the September signal and the December one, the Christmas signal. The interesting thing there is 0PN: 0PN is the leading-order loss of energy to gravitational waves, the quadrupole formula, and you're testing that to maybe 10 percent. That's not as good as the binary pulsar, because the pulsar has been going for 15 years and is timed very precisely, while this signal is only five or eight cycles. However, 1PN and 1.5PN are also tested, to about 10 percent; those are corrections to GR of order (v/c)² and (v/c)³, tested in a regime of high velocities, and that's a new finding, a new test — for comparison, the plot also shows what you can do with the binary pulsar. Okay, so I got to the end. Thanks so much for bearing with me through this week. I gave you only a small piece of the subject, but I gave you the parts that I like best, so they're probably the most entertaining ones. There are lots of good books, reviews and so on; and I think if you look at the detection paper and its companion papers, they try to cite the best papers and the best references for all the various aspects — there were big fights over what to cite and what not — so they're a very good place to start a bibliographic search if you need one. That's what I do now, actually: when I want to find something, I ask, okay, what did we cite for that in the paper? So enjoy your weekend, your travel back. I'm glad we could be here this week; it was awesome for me.