Hello. Let's start. I will begin with a recap slide from the previous lecture. Last lecture we went through LISA, and now we move to data analysis. I do hope we will have 10 or 15 minutes at the end to go through pulsar timing arrays; at least I want to give you an idea of what it is, what the main attempt is, and how we are trying to detect gravitational waves with a pulsar timing array. On this slide I just want to emphasize the main assumptions which went into the likelihood. I hope I propagated the idea that matched filtering could be the best technique to use to search for a gravitational wave signal of known form. If you know what the signal looks like, you should probably use this information and try to dig the signal out of the noise. So, the likelihood. For the likelihood we make a first assumption: the data contains noise plus the signal. The second assumption goes into the model for the signal. Here we assume that our noise is Gaussian. On one of the slides, yes, I showed you that the noise is not necessarily Gaussian, but on some scales it actually is, and that is the best we can do right now. For the special cases where this does not work, I will come to that as well and show you what can be done. So what the likelihood tells us is: if you subtract the signal exactly from the data, what you should be left with is noise. You do not know the parameters of your signal, and therefore you try to subtract many of what we call templates. I also use different letters: s stands for the signal, but h I am using for the waveform model. It is our model, and I will call it template or waveform, jumping between these two words. The key point here is this inner product. It is a product between the data and the template; it is a correlation. We also introduce weights inside this correlation to reflect that our detector is not equally sensitive at different frequencies. In other words, the variance of the noise at different frequencies can be quite different. In the case of LIGO-Virgo or LISA, it is higher at very low frequencies and at very high frequencies, and the best sensitivity is somewhere in the middle. Yes, please. "There is no coupling between noise frequencies?" It is basically our assumption: there can be coupling between frequencies, but it is usually very weak and we ignore it. Right, so to estimate the parameters, one way is to maximize the likelihood. You vary the parameters of your template, subtract it from the data, and see which parameters maximize the likelihood. This is the frequentist approach to the detection of a signal in noise, and the estimate which maximizes your likelihood is called the maximum likelihood estimator. It is not necessarily equal to the true parameters of the signal which was in the data. The main reason is that the noise corrupts your signal and slightly changes its shape, and therefore what you recover will not necessarily be the true parameters. How close your maximum likelihood estimator is to the true values depends on how strong your signal is, on how much the noise can influence it. The stronger the signal, the closer this estimator is to the true value.
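To make this concrete, here is a minimal numerical sketch of the noise-weighted inner product and the matched-filter idea just described. Everything in it (the toy power spectral density, the unit-amplitude template, the injected amplitude) is an assumption for illustration, not code from any real pipeline.

```python
import numpy as np

def inner_product(a, b, psd, df):
    """Noise-weighted inner product <a, b> = 4 * df * Re sum_f a(f) conj(b(f)) / Sn(f).

    a, b : one-sided frequency-domain arrays (data and template)
    psd  : one-sided noise power spectral density Sn(f)
    df   : frequency resolution in Hz
    """
    return 4.0 * df * np.real(np.sum(a * np.conj(b) / psd))

rng = np.random.default_rng(0)
n, df = 1024, 1.0
freqs = np.arange(1, n + 1) * df
# Assumed toy PSD: worse at low and high frequencies, best in the middle.
psd = 1e-44 * (1.0 + (freqs / 100.0) ** -4 + (freqs / 300.0) ** 4)

template = np.exp(2j * np.pi * freqs * 0.01)                     # hypothetical template
template /= np.sqrt(inner_product(template, template, psd, df))  # normalize <h, h> = 1

# Gaussian noise whose real/imaginary parts have variance Sn(f) / (4 df).
noise = (rng.normal(size=n) + 1j * rng.normal(size=n)) * np.sqrt(psd / (4.0 * df))
data = 3.0 * template + noise                                    # injected signal, SNR ~ 3

print(f"matched-filter SNR ~ {inner_product(data, template, psd, df):.2f}")
```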
The maximum likelihood estimator is also unbiased, in the sense that if you take the same signal but many different realizations of the noise and average the estimator over the noise realizations, it tends to the true value. Of course, quite often in our measurements we do not have many realizations; we have to deal with one, and, well, that is what you get. I want to bring up one more point; this is one of the reasons why I used a different letter. What we are using is not the true gravitational wave signal: it is a model, so it has some inaccuracy. How big or how small is a big question. Our model of the gravitational wave signal is not necessarily the true gravitational wave signal. Even a numerical relativity waveform has some intrinsic error due to the numerical methods. It is the best we can get, but nevertheless it is not necessarily true. So when it is not true, you can ask the question: what are the parameters which maximize the correlation? Or, in this sense, imagine that you have data without noise, just the signal, and you try to maximize the likelihood; in other words, you minimize this quantity. You get the parameters which bring h as close as possible to s. The difference between the true parameters and what you found from your estimate is called the bias. It arises because you have inaccuracy in your model: the best estimate of your parameters might not be the true one, but shifted. And I want to emphasize one point: this bias does not depend on the signal-to-noise ratio, whereas the statistical error does. At some point, for instance for LISA, where signals are very strong, with signal-to-noise ratios in the thousands, this bias might dominate your ability to estimate parameters accurately, or, for instance, to make tests of general relativity, because your model is not accurate enough. Nevertheless, even if you have a bias, it might not prevent you from detecting the gravitational wave signal. That is what is called effectualness: you might create a model or template which looks like the true signal, but at the expense of a bias. You might use different parameters, and at those different parameters the template looks closer to the true signal. This is actually used a lot for searches. If you really want to ask how similar your gravitational wave model, or any model of your signal, is to the signal itself, we introduce faithfulness: you compute your model at the true parameters and compare it with something. Well, we do not have the real signal without noise, but you can compare it, say, to numerical relativity data, to a model obtained by solving the Einstein equations numerically, to the best of our ability. Here you can see I put something in the denominator, and that is what makes the templates normalized. Usually this quantity is also called the overlap; I just want to introduce the jargon. I might not use it, but if you read the papers you will understand that faithfulness is this normalized overlap. The overlap is computed between, say, a signal, or one model, and another model, both normalized, so the overlap varies between minus one and one. One means a perfect match, completely aligned; minus one means you forgot a sign somewhere. And it is usually related to the loss in signal-to-noise ratio: if you do not model the signal very well, you also start losing.
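Here is the corresponding sketch of the overlap/faithfulness, reusing inner_product(), freqs, psd and df from the block above; the two slightly detuned toy waveforms are invented just to show a sub-unity overlap.

```python
import numpy as np

def overlap(h1, h2, psd, df):
    """Normalized overlap <h1,h2> / sqrt(<h1,h1><h2,h2>); lies between -1 and 1."""
    norm = np.sqrt(inner_product(h1, h1, psd, df) * inner_product(h2, h2, psd, df))
    return inner_product(h1, h2, psd, df) / norm

# Two hypothetical 'waveforms' that differ slightly in one parameter:
h_a = np.exp(2j * np.pi * freqs * 0.0100)
h_b = np.exp(2j * np.pi * freqs * 0.0101)
m = overlap(h_a, h_b, psd, df)
print(f"overlap = {m:.3f}, fractional SNR loss ~ {1 - m:.3f}")  # mismatch = 1 - overlap
```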
So if your faithfulness is low, you might of course lose signal-to-noise ratio, so you try to model as well as possible, and a lot of effort goes there. Now, as I said, to get the maximum likelihood estimator of the parameters, what you want to do is maximize your likelihood. Usually you have many parameters, so it is a large parameter space. How to do that? One of the easiest ways is to cover your parameter space with a grid, a mesh of points in N dimensions, and you want all the points to be at equal distances from each other. This grid is like a fishnet, right? If it is too coarse, you will lose fish: you will lose your signal, you will not find it. If your grid is too fine, you start losing efficiency: you need to compute the correlation of your data, the likelihood, at each point of the grid, and if the grid is too fine you are doing far too many calculations, and you might wait weeks, months, years to get an idea of whether there is a signal there or not. So it is an interplay: you have to decide and choose what is a tolerable level of coarseness, how much of the signal you can tolerate to lose. And the distance between the points on this grid is not a coordinate distance, because we have already introduced the inner product, which is a measure of correlation between signals; in our case it is also the measure of correlation between points in the parameter space. So let us introduce this distance, this distance squared (I think it should be squared here, but it does not matter). It is basically a measure of the distance between two normalized templates at nearby points in the parameter space; this is its definition, through the inner product which we introduced earlier. Because the deviation of the parameters is small, you can use a Taylor expansion, and you will get this result. And if you compare it with the interval in geometry, this geometrical approach, you can associate this with a metric on your parameter space. So you want to place your templates at equal distances in this metric, not at equal separations of your parameters, because it is the metric which actually governs the distance. This is, in a way, the proper distance between points in the parameter space, which tells you how strongly different points are correlated. Let us take one point and look at a two-dimensional parameter space; I am not specifying what the parameters are, it is really just a cartoon. We choose a point in this parameter space, and we fix the distance from this point to any other point, in this metric sense, to be some small value. We place all the templates which are at a distance no more than this from the point, and they usually form some kind of ellipse in 2D, or a hyper-ellipsoid in N dimensions. This is what we will call the volume of a template: basically, anything falling in there will have quite a high correlation with this template. So you want to place your ellipses: first you choose what is the tolerable distance between points in your grid, and then those ellipses cover everything. And the shape, size and orientation of each ellipse is a function of the point in the parameter space: an ellipse here will have this form, while an ellipse somewhere over there might have a different shape, a completely different shape.
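In formulas, the distance and metric governing these ellipses is the standard template-space construction; filling in the Taylor-expansion step for normalized templates (where the normalization ⟨ĥ, ĥ⟩ = 1 implies ⟨ĥ, ∂_i ĥ⟩ = 0):

```latex
ds^2 = 1 - \left\langle \hat h(\theta),\, \hat h(\theta + \delta\theta) \right\rangle
     \approx g_{ij}\, \delta\theta^i \delta\theta^j,
\qquad
g_{ij} = \tfrac{1}{2} \left\langle \partial_i \hat h,\, \partial_j \hat h \right\rangle .
```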
Within each ellipse there are two principal directions. One is the direction of strong correlation: to get a template uncorrelated with this one, you need to go quite far along it. In the other direction, even a small shift in the parameter will make your two templates, two waveforms, unrecognizable and uncorrelated, with correlation close to zero. For instance, in the (m1, m2) plane such a parameter is the chirp mass. Chirp mass is a very sensitive parameter; it is what we measure best, and even a small change in chirp mass will modify your gravitational wave signal quite a lot, its shape, you can see it by eye. The other direction could be, for instance, the spin of the black holes: the gravitational wave signal depends rather weakly on the spin, so you need to change the spin quite significantly to get a waveform which looks quite different. And "different" or "similar" we measure using the correlation, this inner product, as we discussed. Here is an example of the grid points used in the analysis, in the (mass 1, mass 2) plane. As you see, there are very densely populated areas at small masses and large mass ratios. Those are signals which have many, many cycles in the frequency band of LIGO-Virgo, and therefore changing a parameter a little changes the number of cycles quite a lot, so you need a very fine grid of points to cover that region. If you go to high masses (this is the equal-mass line, and there is a little circle there: that is actually the template which triggered the detection of the very first gravitational wave signal), you see the number of points is quite small. Those are heavy masses; the signal in the LIGO-Virgo band is only about 15 or 20 cycles before merger, so it is a very short signal. A slight change in the parameters does not change the signal drastically, and therefore you need fewer points in this region. Is that clear? This is the method actually used to search for gravitational wave signals in what we call online mode, with low latency, so that we have a quick look and get candidates for gravitational wave signals, for further investigation later. And it is quite an efficient method: despite there being so many points, it can run very fast. Now imagine that you have a candidate, or you have a gravitational wave detection, and you want to look a bit closer at the data. Here we switch from frequentist analysis to Bayesian analysis. Bayesian analysis is computationally very expensive, and because LIGO-Virgo data is terabytes of data, it is very hard to apply it to each chunk of data separately, or to the whole data set at once; it is impossible. Therefore it is applied to the small parts of the data where we have identified candidates for gravitational wave signals with the previous grid-based analysis: there is something there with high likelihood, and you want to look at this part of the data a bit closer. The key point of Bayesian analysis is a quite different treatment. In the frequentist approach we were saying there is a signal given by God, it is just corrupted, and we are trying to do our best to estimate its parameters. Here the philosophy is quite different. We treat the parameters of our signal as random variables, and we try to estimate the distribution of these parameters based on the data. But before you start looking at the data, you need to ask yourself: what is the prior knowledge you have?
What can you say about these parameters, or about the model, even the model itself? You have to set this prior before you start looking at the data. This pi will always refer to the prior. You can assign a prior to a model, and you might have several models you want to try. They could be models within general relativity: black holes with spins or without, spins orthogonal to the orbital plane or with arbitrary orientation. They could be more general: general relativity versus some other model which introduces a deviation from general relativity. So first you start by assigning some prior to this. Then you look at the data; this is your likelihood. The data updates your knowledge, your prior knowledge, and you get the posterior. So for model selection you want to estimate this thing: the probability of model i given the data. Usually, not always but very commonly, a model is parameterized by some set of parameters; here the index i refers to the model, and the vector notation indicates that it could be a multi-dimensional parameter space. For a given model we can again apply Bayes' theorem: we have a prior attached to the parameters. This is quite an important thing. If you do not know what it should be, you should probably use a non-informative prior. For instance, you do not know where your gravitational wave source is on the sky, so you might put a uniform-on-the-sky prior. But if you have some information, as for neutron stars, for instance, where you know that the masses should be between 1 and roughly 3 solar masses, you should use that information in your search for binary neutron stars. You always have some prior information. It might be a non-informative prior, going from minus infinity to plus infinity in the worst case, but even that is very rare; you always have some bounds coming from somewhere. Yes, your posterior depends on your prior, on your prior knowledge, and you have to deal with this. Some people think it is a weakness of the Bayesian approach; some people think it is a strength. Yes, you are updating your prior information, and my prior information might be different from yours, so we might get different answers. But this is encoded in the theory; this is the idea of the theory, it must be like this. That is what it says. This here is basically the likelihood, and it tells you the likelihood of observing the data d given model Mi and the set of parameters theta within this model. And you get the posterior. The denominator plays the role of a normalization factor here, but it is not only a normalization factor: it also appears there, and it is called the evidence, the evidence of model Mi. You can play this game with different models, and it is a very important quantity: it will basically tell you how likely one or another model is, given the data you are looking at. So what you want to estimate is this probability of the data given model i, and it is basically the posterior marginalized over all the parameters. Sorry, yes: this is your posterior, marginalized over all the parameters. And this, the probability of model i given the data, is given by this formula; basically I have rewritten and combined these two formulas. Quite often we want to answer this question, and it is by far non-trivial, because there is a normalization factor, so it is very hard to get an absolute value for this.
To know P(d), the probability of the data, you would need a complete set of models, and it is very hard to have that. Quite often the set of models is actually continuous, an almost infinite set. So in order to avoid this problem, the computation of P(d), people introduce what is called the odds ratio. It is the ratio of the probabilities of model A over model B given the observational data. We do not know the absolute values, but we can say which model is more likely: the ratio of probabilities. And the ratio of probabilities is given by the ratio of evidences, which is called the Bayes factor, this expression, the evidence of model A over the evidence of model B, times the ratio of your prior odds. If you believe that one model is more likely than another before you even analyze the data, you should put it here. Often it is set to one, saying: why would I prefer one model or the other before analyzing the data? But if you do have some prior information that one model should be preferred, you should use it here. As I said, usually people set this equal to one and concentrate on the Bayes factor. So what do we want to obtain? We want the evidence if we want to do model selection, if we want to test several models and see which of them is better supported by the data. And we also want the posterior. What does the posterior tell us? Do I have it here? Yes, posterior here, evidence there. The posterior tells us, for a given model: you choose a model, you have a set of parameters, and you get the distribution of those parameters. Given your prior knowledge about the distribution of the parameters, what you think they could be, the data will tell you whether you are right or wrong, or it may tell you nothing: your posterior in some parameters could simply reproduce the prior. But in any case, you want this probability distribution function for your parameters given the observations. It tells you everything. If you want the maximum likelihood, the maximum a posteriori value, the mean value, the median value: it is all in the distribution, all the information is contained there. If you want a point estimator, like the mean, you can compute it; if you want the median, you can compute it. Once you have this distribution, it is up to you what to choose.
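As a toy illustration of the evidence and the Bayes factor that follow from these formulas, here is a sketch with a single Gaussian datum and two hypothetical models; all numbers and priors are invented for the example.

```python
import numpy as np

d, sigma = 2.5, 1.0                         # one assumed datum, known noise sigma

def gauss(x, mean, s):
    return np.exp(-0.5 * ((x - mean) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

# Model A: noise only, no free parameters, so the evidence is just the likelihood.
Z_A = gauss(d, 0.0, sigma)

# Model B: d = mu + noise, with an assumed flat prior on mu over [-10, 10].
mu = np.linspace(-10.0, 10.0, 4001)
prior = np.ones_like(mu) / 20.0
likelihood = gauss(d, mu, sigma)
Z_B = np.trapz(likelihood * prior, mu)      # evidence = likelihood marginalized over prior

posterior = likelihood * prior / Z_B        # posterior for mu under model B
print(f"Bayes factor B vs A: {Z_B / Z_A:.2f}")   # > 1 favours the signal model
```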
I will very briefly tell you about a specific way of computing this posterior, the Markov chain Monte Carlo approach, and a specific way of building a Markov chain Monte Carlo. I am not going into details; I am just throwing the idea at you. We want to construct a Markov chain. A Markov chain is a stochastic process where the next point of the chain has knowledge only about the previous point in the chain, not about any others. And we want to construct such a Markov chain that it moves in the direction of maximum likelihood; that is what this formula tells us. We want to introduce transition probabilities, the way of moving from one point to another: if you start at one point, we need to say how to move from this point to the next point in the parameter space. So this is a chain which moves in your parameter space, and we want it to move predominantly in the direction of maximum likelihood. We want this transition probability to satisfy a very specific balance equation. These are our transition probabilities, and there is this p; sorry, I used the same letter, I probably should have used a different one: here it is a conditional probability, here it is not. This p is the distribution you want to sample; in our case it is the posterior, that is the distribution we want to sample. So your transition probability must satisfy this equation. If it does, then a theorem tells us that after some time your chain starts drawing points from the desired distribution p, in our case the posterior. So one of the important points is to construct a transition probability which satisfies this balance equation. One way of doing it was suggested by Metropolis, and Hastings later extended it. What they say is: let us first introduce what is called a proposal distribution. The proposal distribution is completely in your hands; it is completely arbitrary. It is a way of jumping from one point to another. It could be Gaussian, it could be uniform, it could be anything you want. The key point is that your end result does not depend on what you use here. Then we build the chain by introducing the acceptance probability, this alpha: the acceptance probability for going from theta_k to theta_{k+1}, where k is the index along the chain. It is the minimum between one and this ratio, and this ratio is what is called the Metropolis-Hastings ratio. It contains three parts. One is the ratio of priors: if your prior is uniform, it is one; if it is non-uniform, you have to take care of it. The next term is the ratio of the proposals which you have adopted; this is in your hands, and it could be symmetric or not. The probability of going from k to k+1 and from k+1 back to k could be equal; an easy example is a symmetric proposal, and then this ratio is equal to one. But your proposal could be complicated enough that it is not. And of course you want to move in the direction of maximum likelihood, so the ratio of the likelihoods is one of the most important terms in this ratio. So for instance, suppose this is uniform and this is uniform (of course a uniform proposal is not a good proposal, but let us assume it for a second). Then if the next point has higher likelihood, so that this ratio is above one, you always accept the point: the probability of acceptance is one. If it is less than one, you compute this quantity alpha and you accept the point with probability alpha. What does that mean in practice? Imagine this number is less than one. You draw another number, call it beta, uniform between zero and one, and you compare your alpha with beta. If alpha is larger than beta, you accept the point; if alpha is smaller than beta, you reject it. You start with a random point in your parameter space and you build the chain. I mentioned that the end result does not depend on the proposal distribution. Nevertheless, the efficiency of your algorithm depends a lot on this proposal distribution q. Ideally, your q should be equal to your answer, so it is a chicken-and-egg problem: to get a good proposal you need to know the answer, and to get the answer you need to suggest a proposal. If you have a good guess, you should use it; if not, you can try several of them and see which works better or worse.
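Here is a minimal sketch of the Metropolis-Hastings recipe just described, with a symmetric Gaussian proposal (so the proposal ratio is one) and the prior and likelihood folded into a single log-posterior function; the unit-Gaussian target is just an assumed stand-in.

```python
import numpy as np

def metropolis_hastings(log_post, theta0, n_steps, step=1.0, seed=0):
    """Minimal Metropolis-Hastings sampler with a symmetric Gaussian proposal.

    log_post : log of (prior * likelihood) at a point
    step     : proposal standard deviation, the tuning knob discussed below
    """
    rng = np.random.default_rng(seed)
    chain = np.empty(n_steps)
    theta, lp = theta0, log_post(theta0)
    for k in range(n_steps):
        proposal = theta + step * rng.normal()   # symmetric proposal: q-ratio = 1
        lp_new = log_post(proposal)
        # Accept with probability min(1, MH ratio); in logs: compare to log(beta).
        if np.log(rng.uniform()) < lp_new - lp:
            theta, lp = proposal, lp_new
        chain[k] = theta                         # a rejected move repeats the old point
    return chain

# Sample a unit-Gaussian 'posterior' starting far from the peak:
chain = metropolis_hastings(lambda x: -0.5 * x**2, theta0=5.0, n_steps=20000)
print(chain.mean(), chain.std())                 # should approach 0 and 1
```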
I want to give you an example of this, taken from one of the Bayesian statistics books. The chain here is trying to sample a Gaussian distribution with sigma equal to one, I think. And this is a good chain: you start at some random point, not the best one, it wanders a little, and after some point you see it samples the Gaussian distribution quite well. Now, what happens if the proposal is also a Gaussian distribution, but while the target had sigma of one, you choose the width of your Gaussian proposal to be 0.001, very tiny? You will find that almost every point you try is accepted: you try a point, it is accepted; you try a point, it is accepted. You might be happy, saying, great, my chain is moving. But look at it: it does move, but the actual variance is defined by the target, by this one, and it will take a very long time to explore, to go to this edge and then back. In mathematical words: yes, you have a long chain, many, many members in the chain, but all your points are highly correlated, extremely highly correlated. And to check that, you can look at the autocorrelation length. What is the autocorrelation? You take the data and multiply it with itself; this is zero lag, and usually it is normalized to one, this autocorrelation here. Then you shift by one point and see how similar the shifted data is to itself; then two points, three, four, six, et cetera. You keep shifting, and you plot this product of the shifted data with itself as a function of the shift, of the lag. For instance, the black curve says that after only 10 or 20 shifts the data is completely different, while the red curve tells you that even at 200 or 300 points the data still has self-similarity. It means the points in your chain are correlated, quite similar to each other. So the question is: how many independent points do you actually have? You might have a million points, but if the autocorrelation length is 1,000 points, it means only every 1,000th point of your chain is actually uncorrelated and meaningful. You are spending a lot of time computing likelihoods, and it becomes not very efficient. The other example is when you make a proposal with a very large variance: say the target variance is one, but you propose with variance 10 or 20. What happens is that the next point overshoots your posterior and is rejected, rejected at each step, and the chain stays at one point for a long time until, by chance, it finds something better. So again the rejection rate is very high, and the rate of accepted points in your chain is very small. What I want to say here is: yes, your end result does not depend on your proposal distribution, but to get a good chain, a good sampling of your posterior distribution, you might need to wait years. The theorem does work, but it does not tell you how long you have to wait. So, for efficiency, a lot of effort is put into constructing a proposal distribution which gives you a good acceptance rate; well, you see what I am saying.
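The autocorrelation check just described can be sketched like this, reusing metropolis_hastings() from the previous block; the 0.1 threshold for "decorrelated" is an arbitrary choice for illustration.

```python
import numpy as np

def autocorrelation(chain, max_lag):
    """Normalized autocorrelation of a chain as a function of lag (lag 0 -> 1)."""
    x = chain - chain.mean()
    acf = np.array([np.dot(x[: len(x) - lag], x[lag:]) for lag in range(max_lag)])
    return acf / acf[0]

target = lambda x: -0.5 * x**2                   # unit-Gaussian target again
for label, step in [("well-tuned step=1.0", 1.0), ("timid step=0.001", 0.001)]:
    c = metropolis_hastings(target, 0.0, 20000, step=step)
    acf = autocorrelation(c, 2000)
    below = np.flatnonzero(acf < 0.1)            # first lag where correlation drops
    length = below[0] if below.size else 2000
    print(f"{label}: decorrelates after ~{length} lags")
```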
Another problem is when your posterior is multimodal, when there are several maxima. This is also quite problematic, because the chain usually moves toward a maximum and might get stuck in one of them. It has a non-zero probability to go down, cross the big valley and find another maximum; however, for that to happen, you might again have to wait a Hubble time. There are techniques which modify slightly how your chain behaves, like simulated annealing and parallel tempering. I am not going to talk about them at all, but you need to take care: the simple scheme will not work properly if you have a multimodal posterior distribution. That is what I want to say; you need extra tricks, and they are known. So at the end you will have samples from your posterior. It is a numerical probability distribution function, not analytic: you are basically sampling from an unknown distribution given the data. So let us look at a quick movie. It is a "wrong" movie, but it might give you the idea. There is data there, there is a template, and they are trying to fit the template, the waveform, to the data. First there is a fit in the distance; distance gives you the scaling, the amplitude. Then they try to fit the mass; mass tells you how stretched or compressed your signal is. And at the end you adjust the other parameters. It is wrong in the sense that it is not done like that, one direction at a time; it is usually done simultaneously. But it gives you a rough idea that some parameters are responsible for the amplitude and others for the phase. So what you get at the end: if you remember, last lecture I showed you the maximum likelihood estimate of the gravitational wave signal; it was a single line fitting the data. In reality, when you have the posterior distribution, for each sample of your parameter set you can compute such a line, and at the end you get not a single line but a rather broad band, which encapsulates all the possible parameters the signal could have. It has finite width; you can see the finite width here. Or you can draw it directly as the posterior distribution function for your parameters. In particular, two parameters are given here: chi-effective and chi-p. I will not go into the details of each of them; I will just sketch it. If this is L, the orbital angular momentum, this is the orbital plane with the two bodies, and the spins are arbitrary, say m1 with spin S1, then chi-effective corresponds to a combination of the spin components parallel to L, orthogonal to the orbital plane, and chi-p is a combination of the components lying in the orbital plane. The parallel components are usually determined better from gravitational wave data than the in-plane ones, and that is encoded here; that is why I wanted to show you this plot. You see this little green line over there, this thin line: this is the prior, what you used as your prior knowledge. Then you ran your analysis on the data, and for this parameter the data told you: well, actually, I cannot tell you much. The posterior almost reproduces the prior; it means the data has no additional information beyond what you already knew. For this other parameter the data actually gave you a different answer: that was your prior knowledge, and the posterior, you see, is much more confined in the parameter space. Moreover, it tells you that the most likely value of chi-effective is negative, meaning this component of the spin most likely points below the orbital plane. So this is just an example that sometimes the data tells you new information and sometimes it does not.
Well, let's have a look at a few more examples of parameter estimation. There are three here, but I will consider only two signals. One is the very first gravitational wave event. This is a time-frequency representation of the signal; the width corresponds to the uncertainty in the parameters. You can see we have a little bit of inspiral here, there is the merger here, and it all happens within the most sensitive part of the band. The other signal, this one, is longish in time; it also extends further in frequency, and the merger happens where the noise is already rising, so there is not much of the merger we can see. This is the heavier-mass system, this the lighter-mass one. Let us look at the parameters m1 and m2 and how well we can estimate them. For this signal it is this contour there; there is a similar contour for the other signal, which is not shown. What I want to say is that this one is quite symmetric. This line is the equal-mass line, so you can flip the contour over it, a reflection; it is just your convention that m1 is larger than m2. And this one corresponds to this event, and you see a banana shape. Why is that? Why for one signal do we measure a rather nice ellipse, and for the other we see this banana shape? It comes about because for this signal over there we see a lot of inspiral, this part, and a little bit of merger: we see this part of the signal and we hardly see anything here. For the other signal we see a little bit of inspiral and the merger. The key point: the inspiral depends mostly on the best-measured parameter I told you about, the chirp mass. If you do not see the merger, what you are measuring is this inspiral part, so you measure the chirp mass very well. If you also see the merger: the merger depends on the total mass, m1 plus m2, a different combination of the masses. If you measure both of them, you start breaking the degeneracy and you can measure the individual masses, and that is what happens here. Here you see a bit of inspiral and the merger, and you get a regular shape; there you see a lot of inspiral and hardly any merger, and you get the banana shape. An extreme example is the neutron stars: there it is really a line of constant chirp mass, because there you see only the inspiral. So knowledge of your source and of the gravitational wave signal already helps you predict what you could expect in your parameter estimation; this could be used as your prior, for instance. A little bit about neutron stars. So, these are the black holes which were detected; these purple circles are black holes known from X-ray binaries; and I will now talk a little, just a little, about this binary neutron star merger. Those guys all have finite size, and, more than that, they are deformable. So, the first question: yes, the prior on the masses. For neutron stars it is actually between one and three solar masses. It was extended a bit beyond three and below one, but in principle you would expect that below one solar mass you would rather form a white dwarf, and above three you should form a black hole. So for this binary the priors were extended, below one and up to five, I think, in the search, but what you get at the end is support only for what you can see here.
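Coming back to the chirp-mass degeneracy above, here is a tiny numeric sketch: two invented mass pairs with almost the same chirp mass but quite different total mass, which is exactly the direction along the "banana".

```python
def chirp_mass(m1, m2):
    """Chirp mass Mc = (m1*m2)**(3/5) / (m1+m2)**(1/5), masses in solar masses."""
    return (m1 * m2) ** 0.6 / (m1 + m2) ** 0.2

# Two assumed mass pairs lying along the same degeneracy curve:
for m1, m2 in [(15.0, 10.0), (25.9, 6.3)]:
    print(f"m1={m1:5.1f} m2={m2:5.1f}  Mc={chirp_mass(m1, m2):5.2f}  Mtot={m1 + m2:5.1f}")
# The inspiral alone (chirp mass) can hardly tell these apart; the merger (total mass) can.
```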
So for black holes the prior is uniform: uniform in the mass ratio and, in some sense, in chirp mass. You do not put it uniform in the component masses; you usually use chirp mass, or total mass, and mass ratio, and take the prior uniform in those parameters. Now, how well you measure one parameter or another really depends on your gravitational wave signal. For instance, the chirp mass, I think I already mentioned this, enters at leading order, the Newtonian order, in the phasing. We are extremely sensitive to the phase of the gravitational wave signal, and that is why this parameter is measured best. The mass ratio comes in later, and therefore it is not as well determined. We still determine it, but not very well; otherwise you would have exactly a line here, whereas it has a width. With the spins it is the same story: at leading order comes the projection, the spin-orbit term, the inner product between L and S; and the in-plane components come in at a higher order, through the spin-spin coupling. So there is really a hierarchy, and we observe this hierarchy in how the parameters enter the phase of the gravitational wave signal. Of course, when it comes to the merger things change, and you might hope that some of these degeneracies are broken and some effects become stronger; and they do become stronger if they were weak in the beginning, but not that strong, in some sense. And for neutron stars, one of the ideas (I think there was a question about the equation of state during the discussion session): the equation of state enters the gravitational wave signal through the deformability parameters, usually called lambda-1 and lambda-2. From the equation of state you can derive universal relations, the I-Love-Q relations, between the moment of inertia, the Love number and the quadrupole. Basically, what I want to say is that the equation of state enters the waveform through these tidal deformability parameters lambda-1 and lambda-2, and those are the quantities we can measure, or try to measure. They enter at very high post-Newtonian order. Nevertheless, for neutron stars you have many, many cycles, and this tiny effect propagates and accumulates; and, as you see, it is not measured very well. This blue spread tells you how accurately you can measure it. Nevertheless, you can say something: you can constrain from this side, and you can rule out some equations of state for neutron stars. In this direction the neutron star is less compact, in this direction more compact. And this point is a black hole: zero-zero is a black hole; remember this. One of the questions was: can we actually rule out a black hole-black hole system? We say this system is a binary neutron star based only on the masses; we know black holes should be heavier. But nevertheless one could ask: who knows, maybe these were primordial black holes, or something else not formed from stars. Yes, you see the electromagnetic counterpart, but can you tell just from the gravitational wave data? Yes and no. If you assume that these two stars, let us still call them neutron stars, have weak rotation (weak meaning still fast, but not very fast; all the pulsars which we see in such systems are not very rapid rotators), then you can rule out zero: you can rule out the black hole-black hole solution, the black hole-black hole "equation of state". If you allow unusual neutron stars which can rotate very fast, then unfortunately you cannot rule out the zero-zero solution.
That's the story: based on your prior knowledge, you get different answers, and the question is what you believe more. A few words about the spins. The spins could give us a clue as to how these black holes were formed. Very briefly: what is shown here are hemispheres for the heavier body and the lighter body, and the spins are shown as two-dimensional probability distribution functions; the dark spots are the more likely regions. One dimension is the magnitude of the spin: the further from zero, the higher the spin of the black hole. The angular coordinate tells us how well the spin is aligned with the orbital angular momentum: zero here corresponds to perfect alignment, pi there corresponds to anti-alignment, and 90 degrees means the spin lies entirely in the orbital plane. I just want to show the zoo of what we have got so far; there is no uniform answer in what we see. The very first event shows that most likely the spin was weak, and we have no clue how it was oriented. For the Boxing Day event we actually see that the spin is rather significant, so we could exclude values below 0.2. The orientation is a bit harder: there is a dark spot here, but, as you see, there is still a lot of support for the spin being orthogonal to the orbital plane. For this event, the January event, the story is different: there is more support below the plane, so the spin of the heavy body was most likely pointing below it. But saying exactly where is again very hard: it could be completely anti-aligned, or it could be somewhere in this area. So: three completely different systems. Now, non-stationarity. We were lucky to see a binary neutron star; it was a very loud signal, but in one of the detectors, in Livingston, we had this: a huge glitch, and it happened right in the middle of the signal. So what do we do, throw away the data? No, you should not. One easy way is to simply apply a window which cuts out this part of the data, and use the rest: just throw away part of the data. Or you can rethink and reconstruct your likelihood function and say: now my data model is the gravitational wave signal, which you can see here as a chirp (this is frequency versus time, and you see the rising frequency), plus Gaussian noise, plus that thing. So basically you take this non-stationarity into account in your likelihood, in the model of your noise, and you try to model it. Because, if you remember, the likelihood is based on data minus noise model minus h; so if you set your noise to be Gaussian plus glitch, then effectively this part removes the glitch from the data, and you can try to fit it simultaneously with your signal. It was partially done: there was an analysis of the data with Gaussian noise plus a glitch, and the glitch was removed, but it was not done exactly properly; it was done first one step and then the other. You see, first the glitch was removed and then the data was re-analyzed. The problem is that your glitch-removal software will also remove part of the signal; really, you need to fit everything together. What I want to say is that if your data is non-stationary, you probably want to modify the model of your data, the model of your noise, and you can do it properly.
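Here is a minimal sketch of the first, simpler option: windowing out the glitch with smooth tapers. The sample rate, glitch location and taper fraction are all assumed for the example; this shows the idea of gating, not the actual LIGO-Virgo code.

```python
import numpy as np
from scipy.signal.windows import tukey

def gate(data, start, end, taper=0.5):
    """Zero out data[start:end], rolling smoothly to zero at the edges so that
    the cut itself does not inject broadband power into the spectrum."""
    w = np.ones_like(data)
    w[start:end] = 1.0 - tukey(end - start, alpha=taper)  # inverted Tukey window
    return data * w

fs = 4096                                   # assumed sample rate, Hz
rng = np.random.default_rng(1)
data = rng.normal(size=4 * fs)              # 4 s of toy Gaussian noise
data[2 * fs] += 100.0                       # a hypothetical loud glitch at t = 2 s
cleaned = gate(data, 2 * fs - fs // 8, 2 * fs + fs // 8)
```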
Now I want to say a few words about testing general relativity. There were a lot of questions about the massive graviton; I think I have already answered several times. The main effect used to search for a modification of gravity due to a massive graviton is dispersion. We look at a propagation effect: different frequencies of the gravitational wave signal propagate with different speeds, the higher frequencies faster and the lower frequencies slower. You can try to use this effect: you modify your waveform, your model, to take it into account, and then you search for it. And what the data tells you is basically: I do not see much. But because you do not see it, while at some point you should, it gives you an upper limit on the mass of the graviton, or a lower limit on the Compton wavelength associated with it. So this is your posterior: basically, anything above this is so large that we are not sensitive to it, and anything below this we would have been able to see, but we do not see it. The other tests which are performed, actually almost all of the tests which are performed, are consistency tests, because we do not have complete gravitational waveforms in non-GR theories. You can construct phenomenological deviations from general relativity and try to search for them or constrain them; so everything that is done amounts to: let us modify GR in this way, perhaps with some motivation, and search for these deviations. Does the data support non-zero deviations or not? Quite often zero is well within the posterior distribution you get, and then you can say: well, my data is consistent with GR. It does not prove anything about other theories, because in a way we do not have the something else to compare with; but so far we can say the data is consistent with GR. Or, and this is one of the examples, you can split your signal into two parts: this yellow part, which is the inspiral, pre-merger, and the green part, which is merger and post-merger. You can try to estimate the parameters of the gravitational wave signal from each part independently, and then compare them with each other and with the parameters you got from the full signal; that is shown here. The yellow part gives you this dark dashed line, labeled inspiral; the green part of the signal gives you this purple dot-dashed line. You see they have quite a good region of overlap. This is projected onto two parameters: the final mass and final spin of the black hole after it is formed; I think Victor will tell you next week how to compute those things. And it also overlaps with the posterior distribution you get from the full signal. So it is again a consistency test: you are just checking whether different parts of your signal are consistent with each other within GR. That is all I want to say about data analysis. Any other questions? So I have some time; very good, I will switch to pulsar timing. "How do the spins enter?" Yes, the spins enter in two ways. They enter in the phase formula; for instance, the easy example: if the spins are aligned, this configuration, the waveform will be longer and the signal will be stronger. For the same masses, with spins aligned like that, the signal will be stronger and longer; it is encoded in the phase, you basically get additional cycles there. If they are anti-aligned, you can guess that the signal will be shorter and weaker.
That is the easiest example to show that the spins enter the phase: they define, for instance, the number of cycles the gravitational wave signal spends in the band. If you have precession, the additional effect is a modulation of the whole signal. So your signal will not look like the usual one, with this envelope here; it will have oscillations in the envelope in between. This is the second effect through which you can see the spins, especially if there is an in-plane component. Well, there are oscillations here, but this is just the amplitude, how the amplitude behaves. And yes, this modulation depends on the component of the spin in the orbital plane. Pulsar timing? Good. What is the main idea of pulsar timing? If you remember, for LISA I told you that, in principle, when we looked at the single-arm response, theoretically we need only one arm to measure a gravitational wave signal. But we cannot do that, because there is noise, and the main source of noise is the laser frequency noise: the laser is not stable, its frequency fluctuates so much that we would never be able to see a gravitational wave signal, and that is why we need several arms and interferometry, to cancel this noise. Now I come to pulsar timing and say: here we do have one arm. We have a pulsar at one end, and the pulsar works as a clock: tick, tock, tick, tock, sending these ticks and tocks to us at very regular intervals. I will tell you how it does that. Very stable, okay? And these ticks and tocks propagate as electromagnetic waves, not laser light but radio pulses, and they propagate through the gravitational wave field, so they are blue- and red-shifted. At the end we receive the ticks and tocks at irregular intervals. That is the basic idea. And instead of one arm you have many of them: each pulsar can be seen as an arm, there is no laser, as I said, just a radio signal, and it travels in only one direction. So it is a detector for gravitational waves provided by nature. I will go into a little more detail, but the basic principle is this. The range where pulsar timing is sensitive is the nanohertz band, between 10^-9 and 10^-7 hertz. A few words about the pulsars. Those are neutron stars. If you have a star with a mass of more than about seven solar masses, but, say, less than about 20, at the end of its evolution it will explode and create a neutron star. When the neutron star is created, it has a very strong magnetic field and fast rotation. Then it loses its rotation through the emission of various kinds of radiation. A lot of radiation comes from the magnetic poles; for us the important one is the radio emission. There is a radio beam coming from the polar, magnetic polar, regions, and when it sweeps across the line of sight, it is like a lighthouse: you see a pulse, then another pulse, et cetera. Those pulses are shown here; this is the very first pulsar which was discovered. It is really like a lighthouse. Now, what we need are not just normal pulsars; we need millisecond pulsars. Why? Because they have an extremely stable rotation period. They are very good clocks: the stability of a clock based on such a pulsar is better than terrestrial standards on long time scales. On short time scales they lose, but on scales of ten or twenty years they are more stable. Why are they so stable?
Because those are actually old pulsars. This usual diagram is called the P-Pdot diagram: this axis is the period of the pulsar, that one the derivative of the period, and all millisecond pulsars are usually here, the majority of them in binary systems. There are some which are not; if you want, you can ask me later and I will tell you why they are not in binaries. And the mechanism by which they were produced, so old and fast-spinning, is that they were indeed in a binary at some point, maybe still are. They evolved faster than their companion and started accreting mass: they dragged gas from the companion star, forming a little accretion disk, and the infalling gas donated its angular momentum to the neutron star, making it spin faster and faster. And that is why it sits there: we have a very old, very quiet neutron star, with still a significant magnetic field and very fast rotation, and it does not glitch much because it is old and has already settled down. These are the perfect guys for us: very good clocks, and that is what we use. Each individual radio pulse we observe is different in structure, for various reasons: one is the internal structure of the neutron star's emission, another is propagation effects. But if you average all these radio profiles of the pulses, you get something extremely stable: if you average over this hour, the next hour, the next hour, it does not change, even though the individual pulses can be quite different. And this average profile is used to correlate with each individual pulse and to estimate the time of arrival. That is the key point: you want to estimate the time of arrival of each pulse. Now, since we know the period of the pulsar, and we do, we can predict the arrival time of each pulse, and we can compare. Say the red one is the predicted and the blue one the observed, and we look at the difference between what we predicted and what we observe: that is what we call the residuals in the time of arrival. Okay? And there are several reasons why our prediction does not match the observation; I will tell you some of them, and we need to model them. The residuals could look like this, like that, like that. For instance, if you did not estimate the position of the pulsar on the sky very well, all your residuals will be modulated with a one-year periodicity: you think the pulsar is there, but actually it is a little bit off. You can then adjust the position of your pulsar to eliminate this; this is one way of measuring the position of a pulsar on the sky, with even better accuracy than direct astrometry. If you did not take into account that the period of the pulsar actually changes (it emits radio waves and probably other radiation, so it slows down, the energy being extracted from its spin), you need the first and second derivatives of the period; otherwise you get residuals of this form. So you have to take that into account. If you did not take into account the proper motion of the pulsar, that it moves somewhere, somehow, you get residuals of this form. So you need to take many of these factors into account in order to end up with residuals of this form, a bit like white noise.
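Here is a tiny sketch of how such timing-model errors show up in the residuals; the pulsar period, spin-down and error sizes are all invented numbers. A slightly wrong period drifts linearly with time; an ignored period derivative drifts quadratically.

```python
import numpy as np

P, Pdot = 0.005, 1e-19             # assumed 5 ms pulsar with a tiny spin-down
n = np.arange(0.0, 6.0e9, 1.0e6)   # pulse numbers over roughly one year

# Error 1: model period off by one picosecond (no spin-down) -> linear residuals.
res_linear = n * P - n * (P + 1e-12)

# Error 2: true arrival times include spin-down, model ignores Pdot -> quadratic residuals.
t_true = n * P + 0.5 * (Pdot / P) * (n * P) ** 2
res_quadratic = t_true - n * P

print(f"after a year: {res_linear[-1]:.2e} s (linear drift), "
      f"{res_quadratic[-1]:.2e} s (quadratic drift)")
```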
In addition, the radio signal propagates through the interstellar medium, so there is gas between us and the pulsar; moreover, it is not always the same gas. First of all, the Earth and the radio telescopes move around the Sun, so you see the pulsar at different angles, and the radio pulses come through different parts of the interstellar medium. In addition, the pulsar itself moves, we move in the galactic plane, et cetera, et cetera. So there are variations in the dispersion, time-dependent variations. And you construct the model which takes all these effects into account in order to predict when each pulse should arrive. It should take into account the period of the pulsar and its first and second derivatives, how it changes in time. You need to take into account the difference between the clock used at the radio telescope and terrestrial time standards. Then, if you have several radio telescopes, you want to reduce all the observations to the solar system barycenter, taking into account all the relativistic effects. You want to take into account the dispersion measure, which is not simply a constant but actually time-dependent. What else do you need to take into account? Yes: Doppler motion, the relative motion of the pulsar and the radio telescope, plus the gravitational redshift caused by the Sun and the planets, and, as I said, a millisecond pulsar is quite often in a binary system, so there is also a redshift due to its companion. You need to take into account the Shapiro delay, the extra time taken by the propagation of the electromagnetic waves through curved spacetime: again, in a binary system there is a companion, and quite often it is not a negligible companion, at least a solar mass, sometimes larger, so the spacetime is curved and there is a delay due to that. Now you have taken everything into account, and you say: well, now my model is complicated. I have this prediction for the time of arrival, this observed time of arrival, and I take the residuals, which, to the best of our ability, should contain only noise. Of course, there is radiometer noise, there is noise in the pulsar itself, and hopefully very small errors because I cannot determine all these parameters with absolute accuracy, so some errors still exist. And there is the gravitational wave. Is that clear? Now, I have already discussed this parameter omega L. If you remember, we talked about it when I discussed LIGO-Virgo: there it is much less than one, which means the gravitational wavelength is much larger than the size of your instrument; and omega L is comparable to one in the case of LISA. So you can ask a slightly different question: take this epsilon, which is basically omega L over c, with the speed of light restored, and ask at which frequency it equals one. You will see that for LIGO it is about 12 kilohertz, so the long-wavelength limit always holds in band. For LISA it is tens of millihertz, inside the band, so at some frequencies in LISA the wavelength is long and at some it is comparable to the arm. And in the case of PTA this frequency is of the order of 10^-12 hertz, far below the band, so for PTA it is the opposite: the gravitational wavelength is always much shorter, significantly shorter, than the size of your device. The size of your device is basically the distance between the Earth and the pulsar.
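A quick back-of-the-envelope check of these numbers; the arm lengths and pulsar distance are assumed round values, and the exact LISA figure depends on which arm length and convention you take.

```python
import math

c = 3.0e8                       # speed of light, m/s
sizes = {
    "LIGO": 4.0e3,              # 4 km arms
    "LISA": 2.5e9,              # assumed 2.5 Gm arms
    "PTA":  3.1e19,             # ~1 kpc Earth-pulsar distance
}
for name, L in sizes.items():
    f_star = c / (2.0 * math.pi * L)    # frequency where omega * L / c = 1
    print(f"{name}: f* ~ {f_star:.2e} Hz")
# LIGO ~ 1.2e4 Hz (above the band), LISA ~ 1.9e-2 Hz (inside the band),
# PTA  ~ 1.5e-12 Hz (far below the nanohertz band).
```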
For these systems we have to use the formula which we have derived already; it did not assume anything about omega L, and we used it in the case of LISA already. So nothing is new, but instead of delta-nu being the fluctuations in the frequency of your laser, delta-nu is now the fluctuation in the frequency of arrival of your pulses: instead of the frequency of the laser, you have the frequency of the ticks and tocks in your data, roughly speaking. Otherwise it is exactly the same. And delta-h is this expression: it is h, the gravitational wave signal, at the time of emission of the radio signal, minus h at the time of observation on Earth. The big difference is that in the case of LISA the time difference was about eight seconds, while in the case of pulsar timing arrays the distance is a few kiloparsecs, which means the light travel time can be around 10,000 years. In other words, this term is sometimes called the Earth term, because it is evaluated at the time of observation on Earth, and this term is called the pulsar term. First of all, the pulsar term depends on the distance to the pulsar, so it is different for each pulsar. The Earth term is the same for all pulsars; it carries no information about the pulsars, and it is a completely coherent term across the observations of different pulsars: you observe 40 pulsars, and it is the same contribution in each of them. The pulsar term gives a different contribution for each pulsar observation, so it is the incoherent term. There is also this geometrical factor: n is the direction to the pulsar, k is the direction of propagation of the gravitational wave, and if the gravitational wave source is exactly behind the pulsar, the whole response is exactly zero. That is what is sometimes described as the radio pulse surfing the gravitational wave: the blue- and red-shifts basically cancel out. Another way to look at this difference between the Earth term and the pulsar term: let us look at a binary source, where h_ij is this signal, in this cartoon. The Earth term corresponds to this part of the signal, and the pulsar term corresponds to that part: we see two pieces of the gravitational wave signal which are separated by a thousand years of its orbital evolution. These are the radio telescopes; I think we can skip this, the Square Kilometre Array. How do we improve the sensitivity of a pulsar timing array? First, you want more pulsars, good pulsars: the sensitivity scales as the square root of the number of pulsars, just as with n independent measurements. Second, with a better instrument the strength of the recorded radio pulse increases; with a strong pulse you can measure the time of arrival more accurately, which means you are timing much better. So there are two ways: improving the instruments, and having more pulsars. Now I want to say a few words about the sources. Because I like binaries, I have to talk about the binaries, okay? Oh, sorry, I am running over time. Can you let me finish? It is really two slides and I am done. Okay, thank you. Binary sources: these are supermassive black holes in the local universe. It is almost the same as for LISA, but on orbits with periods of years, so very wide orbits. And because the energy release on such wide orbits is tiny, the signal we see is monochromatic: over 20 years of observation it is still monochromatic. But the frequencies of the pulsar term and the Earth term will be different, because they are separated by a thousand years of orbital evolution. You see? So this term will be monochromatic, and this one also, but the signal consists of two parts; you need to understand that.
If I take a physical population of binary supermassive black holes in the local universe, each of them will be a dot here, because it is monochromatic: this axis is frequency, this the amplitude of the gravitational wave signal. I sum them up and I get this blue curve. And this blue curve can be decomposed into two parts. The smooth component is the stochastic gravitational wave signal from the superposition of many, many, many monochromatic signals. And occasionally, if I have a source nearby, I get this spike: a monochromatic signal on top of the stochastic one, from a source which is rather close or more massive. So I might want to search for both a stochastic signal and monochromatic signals. Other sources are, of course, of cosmological origin. Such a signal spans everything from what people are trying to measure with the CMB polarization up to the frequencies of LIGO, and here is the attempt to measure this stochastic gravitational wave signal from the early universe in the nanohertz band. The key point, and I think I will stop after this, is the following. A stochastic gravitational wave signal is a noise-like source: in your residuals it is a stochastic process, your residuals are random. But there is an element of randomness which is common to all pulsars. So if you correlate the data pairwise, the residuals from this pulsar with the residuals from that pulsar, there will be a correlated part due to the gravitational wave signal. And it is not only correlated, it is correlated in a specific way: if your stochastic gravitational wave background is isotropic, then the correlation between pairs of pulsars depends only on the angular distance between the pulsars. So if you correlate the data from this pulsar with that one, then from that one with that one, and you start putting points on this plot, correlation versus angular separation between the pulsars, they should lie near this curve, which is called the Hellings-Downs curve. It is a very distinctive signature of a gravitational wave signal, and that is the signature we are looking for. And let me just say a few words at the end. This is just the beginning of gravitational wave astronomy, okay? I am not that old, but I know it is really yours, because I hope to see only a part of it: for instance, for LISA, proper data taking will be around 2035, and by then I will probably be at retirement age. So it is your mission; it is your data. The gravitational wave signal in PTA is not detected yet, and not because there is no gravitational wave signal; it is definitely there. It is just integration time: the period of the gravitational wave signal is years, so in order to integrate these signals you need 30 or 40 years of observations, and at the moment we have 20. Again, it is yours, so it might be quite an exciting time for you. I hope to see a part of it as well, but nevertheless. And I also want to thank the organizers for inviting me; it is a great opportunity, a great experience, and a great organization. And I want to thank all of you for being here, and for the very helpful and stimulating discussions and your questions. Thank you very much.