Well, thank you for inviting me, Dion. This talk isn't very long. I gave it once, so I know it should be only about 10 or 15 minutes. So if any of you wants to stand up and deliver a five-minute harangue about de Finetti, like the guy at my last talk, then you'll have plenty of time. The talk wasn't originally for a networks conference, so I have this slide explaining networks; I'm just dealing with undirected networks here, and that's all I want to say about that. The problem I was given was this: we have a network, and each node in it has an attitude score between 0 and 10, but we don't know the attitude scores for most of the nodes. One of the difficulties is that I don't know what the data represents, but you could think of it as a network of criminals, with the attitude score measuring how bad each one is, something like that. Since we don't know the attitude scores for many of the nodes, I want to guess what those should be, and that's called imputation. There are various reasons why this is difficult. In this particular case, as I said, I'm not too sure what the data is supposed to represent, because it's secret. I don't know what a typical data set looks like, because I only had one example data set that was declassified enough to play with. For that data set, we have only 395 nodes, and 272 of the attitude scores are missing, so that's quite a lot of missing attitude scores in this network. And most importantly, the client wants to know, for some reason, what the attitude scores should be for the missing nodes, but I don't know why they want to know. Obviously, if you don't have the final aim in mind when you're doing the analysis, it's not clear how you should proceed, which makes it more challenging. So welcome to the cloak-and-dagger world of international tax. Very exciting.
So my approach, rather than fitting to the data, since I didn't have very much of it, was to build a model based on general theory, give it to the client, and see whether it was useful. Certainly, because we have a network here, we should expect that your attitude is somehow related to the attitudes of the people to whom you're connected, so we want to use that in some way. Now, there's a standard model for how an epidemic spreads on a network: it proceeds through time, and at each time step each node in the network is either infected or not infected, so it's just a 0-1 thing. If you're not infected already, your probability of being infected at the next time step is determined by the number of your neighbors who are infected. That's a fairly standard model of how an epidemic spreads on a network. The attitude scores are in the range from 0 to 10, so we could normalize them onto the interval from 0 to 1 and hopefully treat them as something like probabilities of being infected, and maybe use a model like this in some way to work out what the missing attitude scores should be. You can't use the epidemic model directly, because it doesn't make sense in the context of the problem, but I used it as inspiration. The thought was: if we knew how infected everyone was, or which individuals were infected, we could think of the attitude scores as what that process produces when advanced one time step into the future. So what we do is assume we have some measure of degree of infection, with values in the interval from 0 to 1, for each node in the network, and then assume the attitude scores are given by something like this, where your attitude depends both on how infected you are and on how infected your neighbors are.
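The exact formula on the slide isn't reproduced in this transcript, so as an illustration only, here is one plausible concrete form of the model just described: each node's score combines its own latent infection level r_i with the mean of its neighbors' values, rescaled to lie in [0, 10]. The specific weighting (and the `build_B` name) is my assumption; the talk only says the score is linear in the latent values and depends on the node and its neighbors.

```python
import numpy as np

# Hypothetical concrete form of the model sketched in the talk: node i's
# score mixes its own latent value r_i with the mean of its neighbors'
# values, scaled so scores land in [0, 10]. Illustrative assumption only.
def build_B(adj, beta=1.0):
    """Return the matrix B such that attitude scores a = B @ r."""
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    deg = np.where(deg > 0, deg, 1.0)        # isolated nodes: no neighbor term
    neighbor_mean = adj / deg[:, None]       # row i averages over neighbors of i
    return 10.0 * (np.eye(n) + beta * neighbor_mean) / (1.0 + beta)
```

With this choice, every row of B sums to at most 10, so when each r_i lies in [0, 1] the resulting scores automatically land in [0, 10], matching the normalization mentioned next.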
And those have been normalized so they lie in the range from 0 to 10, which is what we wanted. The idea is that you find these r_i, which are a sort of latent information, and then reconstruct the attitude scores from them. This gives you an attitude score for every node, not just the ones you already had. So you choose the r_i in such a way that they generate the attitude scores you do have, and then from the same r_i you reconstruct the attitude scores for the nodes where the attitude is missing. It's a very simple model, because you can express the attitude scores as just a matrix times the vector of r_i, and that matrix is known because it's given by part of the adjacency matrix of the graph. Once we've found the r_i, we use the equation from the previous slide to get the attitude scores for the missing nodes. To find the r_i, we have to solve this problem: we want the matrix B times r to equal a, with the constraint that each r_i lies between 0 and 1. It's quite easy to solve. I didn't realize at first, but it's a convex problem, so you only need to find a local minimum and it will be a global minimum, and you can do that numerically. There's probably a better way of doing it, but I haven't found out about that yet. Then you impute your missing attitude scores like that. What's good about this is that if you start from a random choice of the r_i and then find the minimizer, then, for example, if you had a node that was completely separate from the rest of the network, its r_i would never be constrained by the data. So by using random starting points, you get a feel for how uncertain the results are: if you're completely uncertain about a node, and you repeat the fit maybe 100 or 1,000 times, you'll get a uniform distribution for that node's imputed attitude score. That's quite convenient. Here's an example. It's not real data.
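The talk doesn't say which numerical method was used. As a sketch, the constrained least-squares problem can be solved with projected gradient descent, which is guaranteed to reach the global minimum here precisely because the problem is convex. The function and variable names (`fit_r`, `B_known`, `a_known`) are mine, not from the talk.

```python
import numpy as np

# Sketch of the fitting step: given the rows of B for nodes whose attitude
# score is KNOWN (B_known) and those scores (a_known), find r in [0,1]^n
# minimizing ||B_known @ r - a_known||^2. The objective is convex, so
# projected gradient descent from ANY start reaches a global minimum;
# re-running from random starts shows which r_i the data leave undetermined.
def fit_r(B_known, a_known, r0, steps=2000):
    r = np.clip(r0, 0.0, 1.0)
    lr = 0.5 / np.linalg.norm(B_known, 2) ** 2  # safe step: 1/L, L = 2*sigma_max^2
    for _ in range(steps):
        grad = 2.0 * B_known.T @ (B_known @ r - a_known)
        r = np.clip(r - lr * grad, 0.0, 1.0)    # project back onto the box [0,1]^n
    return r
```

Imputed scores are then the rows of B for the missing nodes times the fitted r; repeating the fit from many random `r0` values and looking at the spread of the imputed scores gives exactly the uncertainty read-out the speaker describes.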
It's just a simulated network where I've randomly put in some attitude scores. Here the red nodes are the ones with high attitude scores, and the very green ones are the ones with low attitude scores. Then I've left out a bunch of attitude scores, and this is the result of the imputation. If you look at it, you should see that most of them look reasonably sensible: for example, this one here is quite brown, and the one next to it is sort of reddish-brown, so that's probably OK. It doesn't perfectly reconstruct the original attitude scores, but it's pretty close. And anyway, we don't really want to measure it using in-sample performance, though obviously we'd like it to match the scores we've already seen. So that's basically how it works. The real problem was how to evaluate the results, because, as I said, we only had one data set, so all I could really do was give it to the client and ask if it was OK. One thing to note is that we're actually fitting all these r_i, one for each node in the network, so there are n of them. That is going to be a massively overfitted model, so just looking at how well it fits the known attitude scores is not going to be very informative, because it will probably fit them quite well. Usually we'd measure the performance of the algorithm using cross-validation in some way: for example, take a network with attitude scores, delete some of them, and see how well we can recover the deleted ones. But in this case that's quite tough, because there's only one example, and I'm not too sure how you should do this with network data, or what you should actually be deleting. So I was a bit confused about that, and I didn't really do it. And then, importantly, we also want to think about how we should evaluate the performance of our imputation method anyway. Like, is it as good as a 7 or not?
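The cross-validation idea just mentioned, deleting some known scores and checking how well they're recovered, can at least be sketched generically, with the imputation method passed in as a function. The names here (`holdout_mae`, `impute_fn`) are mine, and this sidesteps the harder question the speaker raises of what exactly to delete in network data.

```python
import numpy as np

# Sketch of the hold-out check described above: hide a random fraction of
# the KNOWN scores, run any imputation function on the reduced data, and
# report mean absolute error on the hidden ones.
def holdout_mae(scores, known_mask, impute_fn, frac=0.2, seed=0):
    rng = np.random.default_rng(seed)
    known = np.flatnonzero(known_mask)
    n_hide = max(1, int(frac * known.size))
    hidden = rng.choice(known, size=n_hide, replace=False)
    mask = known_mask.copy()
    mask[hidden] = False                           # pretend these scores are missing
    visible = np.where(mask, scores, np.nan)       # imputer never sees hidden values
    imputed = impute_fn(visible, mask)             # returns a full vector of scores
    return float(np.mean(np.abs(imputed[hidden] - scores[hidden])))
```

Even a trivial baseline (always guess 5) is useful here: any serious imputer should beat it, which gives a first answer to "is the method any good?"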
Sum of squared errors isn't very useful as a measure of how well it performs, because with values only between 0 and 10, minimizing squared error just pushes every prediction towards the middle, so everything comes out to be a 5, basically. And sum of absolute errors is problematic for other reasons. Also, maybe the end user doesn't even care: they might just categorize things as small, medium, or big, and not care exactly where a score sits on the scale from 0 to 10. So you have to think about that as well. But all that being said, it was OK. As far as I know, the feedback I got was that it would be of real practical use straight away, so I was quite happy that it turned out to be useful, which was the main thing. Yeah, so that's all I wanted to say. Thanks. [In response to an audience question:] Well, I think it's a social-looking network, sort of scale-free, with a core of highly connected individuals and then some preferential-attachment-type structure, so I'm expecting it will be applied to similar networks. The one I tried to simulate here was sort of like that: I started by simulating a network with preferential attachment, and it didn't look very good, so I started sticking edges in the middle, and then it became a bit of a mess, so I stopped. I wanted to make sure it had some of these tendrils sticking out. I think that's the kind of network it might be applied to. [Audience question:] On attitudes and interactions between agents: have you thought of using information about the connections between your neighbors? Me knowing five bad guys may not affect my attitude as much as me knowing a group of five guys who are all in a gang. Yeah, that's a great idea. The original version, I think, that the other guy was doing used network statistics, and I certainly think we could incorporate that as well and use more than one model. I don't know if it was quite the same as your suggestion, but he was using things like clustering coefficients to try to do the imputation.
[Audience question:] You can imagine two very different explanations, probably both going on, for the attitudes. One is that connections form between similar individuals; the other would be that they're actually emulating each other. Your model seems to allow both interpretations, but have you thought about which way you might want to adjust the model: whether the connections are informative just because individuals with similar characteristics tend to form connections, or because they're actually influencing each other? Well, it's very hard to tell without knowing what kind of data it's going to be applied to; that's the big problem. Certainly, whatever data set it is eventually applied to will be coming from somebody who knows a lot about the subject area, so they might be able to choose a model based on that kind of knowledge. But the way the problem was presented to me, it might not even be about tax; in fact, I think the networks come from the police, but I'm not totally sure. So yes, the lack of domain knowledge is a big problem; I'm sure if we had more of it available, we could find a better model. Yeah, I mean, if you were actually using it to catch criminals... [Audience question about the tuning parameter:] There was a tuning parameter in the formula, but I just set it to one. Yeah? One is a nice number. Sorry? Well, I think if it was zero then it would just be a free-for-all, right? So that parameter measures the extent to which you're influenced by your neighbors. If you made this beta very large, then the influence of the neighbors dominates; I did try it with beta being infinity, actually, by leaving out the r_i term, but that model wasn't so good. I got feedback about both of those: the beta-equals-infinity and beta-equals-one versions were both presented to the client, and we preferred the one with this extra term here.
We could look at values of beta bigger than one, maybe, and think about increasing the influence of the neighbors. [Audience question:] There are a huge number of pieces of information missing from the model. If you were able to ask for one more piece, would you ask for something general, like what this network is describing, even though that doesn't tell you about the process itself? No, I think domain knowledge is probably the most important thing in any statistical problem, so that would be the one thing. [Remaining discussion partly inaudible: something about how hard it is to obtain such data in social network research, and about proprietary restrictions on the problem.] Let's thank Richard again.