Good afternoon, everyone. Thanks for joining us for the third and, as yet, final social network analysis webinar by the UK Data Service. I'm Diarmuid McDonnell, a research associate here at the University of Manchester with the Data Service. Today we're going to focus on some of the core techniques, methods of analysis and measures that we can apply to social network data.

Some of you have probably joined us before, and thank you very much. You'll have noticed we've been running a series of training events, which are part of what we've been calling our new forms of data training series. This is probably more accurately described as our computational social science training series. Currently we've got one more coding demonstration: a live demonstration of how to do text mining in Python, tomorrow afternoon, which is the final session with my colleague Julia Kasmire. We've also got some past webinars where you can view the recording and use all of the other training materials that we provide, including the previous two social network analysis webinars and some of Julia's previous text mining webinars. If you go to the UK Data Service events page, you can look at the past events and gain access to all of those training resources for free.

But today we're going to focus on what we're probably all most interested in when it comes to any form of social inquiry: the analysis. So very quickly for just two or three minutes.
We'll do a refresher of what we mean by social network analysis, just so we're all on the same page when we start implementing some of the methods and measures through a live coding demonstration shortly.

We focus on analyzing social network data. We're not going to focus on analyzing social media data, even though that data is itself structured as a network, but the techniques and measures we cover can be applied to any data set that captures the connections or relational attributes of a group of people, a group of organizations or any set of units of analysis.

We look at the analysis from two perspectives. First, we look at describing the network overall. A network is formed by entities and the connections between them, and that network has properties. Can we measure some of those properties? Some obvious ones would include the size of the network and how many connections exist, and we look at some more intermediate-level analysis also.

Second, we can approach it from the perspective of the nodes. The nodes are the entities that exist in a network, and we can look at describing the nodes that form our network. We can look at how central, how important, some nodes are. We can look at whether nodes play particular roles: whether some are brokers, whether some act as hubs within the network, and whether some nodes are positioned closer to others and occupy a strategic position in the network.

That'll all form a live coding demonstration where we can take our time working through the measures, and I'll show you how you can implement those measures using Python and a social network analysis package in that language. And then we'll come out.
We'll take some questions and I'll point you again to some further learning and resources. I'll address this more at the end, but while this is the final webinar in our social network analysis webinar series, that doesn't mean it's the end of our training provision when it comes to social network analysis. So at the end, if you have some questions, and when we send you the feedback form, it would be good if you could say what further training you would like to see, whether it relates to social network analysis or the Python programming language, or maybe you want to see the same material converted into the R programming language, for example. Please be as honest and as forthright as you want, and we will do our best to develop some new training materials.

So very quickly, why this training series? If you were here for the previous two, you know why we're doing it. Social network analysis offers a lot of potential to social researchers. It's an incredibly rich methodological approach, and a lot of our online lives especially, but even some of our offline lives, are characterized by our relationships and the patterns that these relationships form at an aggregate level.

The problem is that social network analysis derives from graph theory, a branch of mathematics, which informed network theory, which in turn informed social network analysis and theory. What that means is that social network analysis is quite a technical, mathematical and abstract methodological approach. So while it has a lot of concepts that do map onto sociological phenomena and concepts of interest, the language used is very technical.
It can be very off-putting. So our intention with this training series is to demystify and clarify some of these technical terms and algorithms.

So, a very quick refresher on what we mean by social network analysis. As I alluded to, it's a methodological and conceptual toolbox. It's a very broad and rich methodological approach, and it allows you to measure, describe and analyze patterns in relational structures in the social world. In essence, people form connections, and those connections then aggregate up and form networks that we can analyze. And of course, it's not just people. Despite the name social network analysis, you may be interested in how organizations are connected. I've seen some really good studies, which we'll reference later, that looked at animal networks: food-sharing networks among jackdaw crows, for example, are a surprisingly instructive example of social network analysis.

A relation is a distinctive type of connection or tie that exists between two entities. We're probably all familiar with familial connections: siblings share a familial connection, as do parents and children, cousins, uncles and aunts, etc. But of course we can be connected in so many other ways. We can be colleagues, gym buddies, flatmates, spouses, and whatever else you can think of. So a relation is just a distinctive type of connection that forms between two entities, and those relations become the building blocks of networks. A network, then, is an aggregation of all those patterns of connections into some sort of overall coherent structure, and social network analysis is concerned with, and most appropriate for, data that captures these relations between units of analysis. So think back to the previous webinars.
We looked at Twitter data, which by default is structured as network data. By that I mean tweets are liked by different people, tweets are shared by other people, and I can follow other people's accounts on Twitter. So the very nature of the interactions on the Twitter platform gives rise to network data. But as we'll see in today's example, and in previous examples I showed you, traditional data such as administrative data and social survey data will also contain information on how units of analysis are connected. We can restructure that data so that it looks like a network, and then we can analyze it using social network analysis. Again, that's best demonstrated, which we'll do in a couple of moments.

To keep these key terms and concepts in your head as we progress: a network is constructed from two main building blocks. There are the entities that are, or could be, connected in a network. These are the people, organizations, animals, countries, places, or whatever the units of analysis are in your study. And there are the connections that exist, or could exist, between these entities. A network, then, is an aggregation or collection of these entities and the connections formed between them. A very simple example: a family tree is a type of network. It contains individuals, who are our entities, and these individuals are related through some type of familial tie.

This is a real social network that we looked at in the first webinar. It's from my own research area of charitable organizations. These are Manchester-based charities, and these are all the connections that exist between them. We won't be analyzing this network in particular, but something reasonably similar to do with charities. Visualization is insufficient on its own, but it's an interesting first step in the analysis. So just by even
looking at this we can see that there's a central cluster of charities that are densely connected, and then we have some on the outside who only have one or two connections to other organizations.

So let's get stuck straight into some of the analysis. What we have here, if you've joined us before, is something called a Jupyter notebook. A Jupyter notebook is basically like a Word document, if you want to think of it that way, but the document can contain more than just the narrative: it contains code and the output associated with that code. I find them quite a useful computational tool. For a lot of my research I use Stata, the social science statistical software package, and you can have a Jupyter notebook that contains Stata code. You can have a Jupyter notebook, as we'll see today, that contains Python code. It can contain R code, Julia, and lots of other languages. That's just so you have a little bit of context for what we're going to do today.

The first thing we need to do to set up our notebook and Python for our analysis is load in the packages that we need. I won't go through this, but at a later date, if you'd like, we can have a coding demonstration where we actually go through the meaning of each of these lines. In general, what I'm doing is loading in all the different methods and functions that I need to analyze network data. So I'll load in all the packages that I need, and I'll load in the data that I need.

Here's an example of some funding organizations in the UK and the network that forms between them. Down the side here, as the rows, we have different organizations. These are individual funders: one of these will be Children in Need, for example. There'll be the Lloyds TSB Foundation, so the banking organization has a foundation that provides funding to charitable organizations. The Big Lottery will be one of these
funders, etc. We have the same set of organizations along the top here as columns, and each cell in this data set indicates whether these funders are connected to each other. So for example, this funder here and this funder here both fund the same four organizations. Each funder might fund a couple of hundred organizations, but there are four in particular that they both provided funding to. The same organization here has funded thirty-five of the same charities as this funder here, for example. We'll dip into the data a bit more when we produce the analysis; I just wanted to give you a sense of what the data entails.

This is open data. The underlying data set from which I created this network data is available on the GitHub repository, which I'll show you in a moment, so you could recreate this analysis yourself, or you could extend it and do what you like with it. Basically, this data set contains all of the organizations funded by about 83 different funders in response to COVID-19. There are lots of charitable organizations in the UK that, because of the public health emergency, have been targeted with funding by these 83 funders, and these funders are connected if they fund at least one of the same organizations. So there's at least one organization in common that they've both provided funding to.

What we do is load that data into Python, into the NetworkX network analysis package. The first simple thing we want to do, before we dive into specific measures and methods of analysis, is get a sense of the number of nodes in the network, i.e. the number of funders, and the number of edges, i.e.
the number of connections or ties in the network. So we can run the code here. The results from a previous run are already there, as you can see: we have a network containing 83 nodes, and there are 640 connections between these nodes. On average, a funder is connected to about 15 other funder organizations. That seems like quite a lot, but when you consider that there are 83 funders and about 10,000 or 11,000 different organizations that have been funded, there's going to be some overlap in who they funded. You can see already that just from such simple measures we can get a sense that either all of these funders have 15 connections, or there's maybe a couple of funders who are connected to lots of others and a larger number of funders who don't have any connections at all. We're going to explore that as we progress through the analysis.

So let's first look at the network-level measures. What kind of properties does our network of funders have, and how can we analyze and describe these properties?
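The loading step and the three summary figures just described can be sketched as follows. This is only an illustration under stated assumptions: the four funder labels and the co-funding counts are made up, and the notebook's actual loading code is not shown in the talk. The only library calls used are NetworkX's `Graph`, `add_nodes_from`, `add_edge`, `number_of_nodes` and `number_of_edges`.

```python
import networkx as nx

# Hypothetical co-funding matrix (not the webinar's data): entry [i][j] is the
# number of organisations that funder i and funder j have both funded.
funders = ["A", "B", "C", "D"]
shared = [
    [0, 4, 35, 0],
    [4, 0, 2, 0],
    [35, 2, 0, 0],
    [0, 0, 0, 0],
]

G = nx.Graph()
G.add_nodes_from(funders)  # add every funder first so isolates are kept
for i, row in enumerate(shared):
    for j in range(i + 1, len(row)):  # upper triangle only: ties are undirected
        if row[j] > 0:
            G.add_edge(funders[i], funders[j], weight=row[j])

n_nodes = G.number_of_nodes()
n_edges = G.number_of_edges()
# Every edge adds one to the degree of two nodes, hence the factor of 2.
avg_degree = 2 * n_edges / n_nodes

print(n_nodes, n_edges, avg_degree)
```

The same arithmetic applied to the figures quoted in the talk gives 2 × 640 / 83 ≈ 15.4, which is the "about 15" average.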
The first thing we'll do is get a quick overview of what the network looks like, so we'll produce a network visualization. You may have heard these referred to as sociograms in social network analysis, or just network visualizations or network graphs; there are lots of different terms for them. Let me make that a little bit smaller. Excellent.

The first thing we can do is say we want a random layout of this network. Here's how it looks. All of these circles or dots represent the nodes in the network, and the lines represent the edges or connections between these nodes. Visualizing is often insufficient, and frankly often unnecessary, when it comes to analyzing networks, but you can produce one at the exploratory stage and we can still learn a couple of interesting things. For example, we can see that there are at least one, two, three funders who have no connections whatsoever in the network. You can see that there are no lines coming to or from three of these nodes so far. That tells us there are some funders who didn't fund the same organizations as any of the other funders in the network. We would call these isolates: isolated nodes.

We can also get a sense that there are a couple of nodes that are really densely connected: we can see lots of lines going in all sorts of directions from this node, and here. And we can see that there are some positioned on the periphery of the network which have one, two, three, maybe three or four connections as well. So recall our average measure from earlier of 15 connections for each individual funder.
We can see that that's quite heavily skewed by a small number of organizations who have lots of connections, and a much larger group of organizations who have very few or no connections at all.

To highlight why visualization can be quite unrevealing and unnecessary: this is the exact same network, just represented using a different algorithm. The previous one was what's known as a random layout, and in the NetworkX Python package we can say we want a spring layout. You can see that this again shows us some of the isolates, but the scaling is quite off, and it's quite difficult to distinguish what's going on in this clump here. We can use what's called the Kamada-Kawai layout as well to look at the network. This seems like somewhat of an improvement over the random layout, but again, it's not telling us very much that we didn't already know. There are four or five, if not more, visualization algorithms that you can use in the NetworkX package.
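As a rough sketch of switching between layout algorithms, assuming NetworkX with NumPy installed: the graph below is NetworkX's built-in karate club network, used purely as a stand-in because the funder data is not reproduced here, and the `seed` values are arbitrary choices to make the positions repeatable.

```python
import networkx as nx

# Stand-in network (an assumption for illustration, not the funder network).
G = nx.karate_club_graph()

# Each layout function returns a dict mapping node -> (x, y) coordinates.
pos_random = nx.random_layout(G, seed=1)
pos_spring = nx.spring_layout(G, seed=1)
# nx.kamada_kawai_layout(G) works the same way but additionally needs SciPy.

# With matplotlib installed, any of these can be passed to the drawing call:
#   nx.draw(G, pos=pos_spring, node_size=40)
```

The edges are identical in every case; only the coordinates assigned to the nodes change, which is why a different picture does not reveal anything new about the network's properties.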
So hopefully that conveys just how unrevealing visualizing can be. I'm sure we've all seen some excellent infographics containing network visualizations, and they can be quite good as public communication tools. But for the hard graft of understanding the properties of a network, visualization is really insufficient, and in lots of cases unnecessary.

The first bit of analysis we want to do is get a sense of the size of the network, and we saw some of these figures already. We can get the same basic information again: our network has 83 nodes and 640 edges or connections, and on average a funder is connected to 15 other funder organizations. But that's just a really basic stepping stone to more interesting measures.

The next thing we're going to look at is what's called the degree distribution. This is basically a histogram of the number of connections each node has. As I said, we can clearly see that some funders had lots of connections, dozens of them, some funders had none, and a lot of funders had two or three or four or five. So we can plot the degree distribution, which gives us a sense of the distribution of degrees in the network. A node's degree is just its number of connections, ties or edges; it's a different word for the same thing.

We can see here in our visualization, starting on this end, that a very small number of funders, about two or three, have between 50 and 60 connections. So there's a very small number of funders in our network who are intensely connected to other organizations. But typically what we see is that the vast majority of funders have fewer than 20 connections, and that's still a lot.
I think that is a lot of ties or connections for a real-world social network to have. But again, you can see that there are about 26 or 27 organizations who have one connection. So the distribution is heavily skewed: a very small number have a lot of connections and most have reasonably few.

We can then get a sense of how many isolates, that is, nodes that have no connections, exist in the network by calling the number of isolates method in the NetworkX package. There are eight: eight funders in the network who are not connected to anybody else. That leaves us with 75 funders who share some connections with each other.

The next thing we want to look at in terms of the network is its density. We know how many connections exist in total: there's 600 and something. We've then looked at how those connections are distributed: a very small number have lots of connections, and most have few or none. Now we want to take that information and produce a summary statistic known as the density of the network. The question we're asking here is, how cohesive or how dense is this network? Put another way, how many of the possible connections that could exist in this network have actually been realized? There are lots of nodes, and these nodes could all be connected to each other. It could be a fully saturated network, where every funder is connected to every other funder. That's unrealistic, so we want a sense of, compared to that picture, how many connections did we actually see realized?

So again, the NetworkX package is very concise. This is really good.
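The degree distribution and the isolate count covered a moment ago can be sketched on a toy graph. The six nodes and their ties are invented for illustration; `nx.degree_histogram` and `nx.number_of_isolates` are the NetworkX functions assumed here, and the notebook's own plotting code is not shown in the talk.

```python
import networkx as nx

# Toy network (hypothetical): four connected funders plus two with no ties.
G = nx.Graph()
G.add_edges_from([("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")])
G.add_nodes_from(["E", "F"])

degrees = dict(G.degree())            # node -> number of connections
histogram = nx.degree_histogram(G)    # histogram[k] = number of nodes with degree k
n_isolates = nx.number_of_isolates(G)

print(degrees)
print(histogram)
print(n_isolates)
```

Plotting `histogram` as a bar chart gives the degree distribution; the first bar (degree zero) is the isolate count.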
It's a good feature of this Python package. We just call the density function and it gives us our summary statistic. We can see that the network density statistic gives us a figure of 0.19, if we round to two decimal places. We can interpret that as: about 19% of all possible connections that could be formed in the network have actually been realized. And again, you'll probably agree with me that that's quite a lot for real-world data. This suggests that funders are probably deliberately targeting the same organizations, and that makes sense. It's a COVID-19 response fund, a program to support certain organizations, so we would expect to see some overlap, with multiple funders providing grants to the same organization. So density is a really good overall summary. It tells us, of all the possible connections that could exist, how many have actually come to pass.

Taking the same information again, what we want to do now is have a look at how clustered the network is. We know there are lots of connections, and we know what proportion of the possible connections have actually been realized. Now we want to get a sense of the extent to which nodes in the network are clustered together. Put another way, what we want to ask is: do we see groups of funders coming together?
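The density figure can be checked by hand from the node and edge counts quoted earlier. The helper `density_from_counts` below is a hypothetical name introduced for this sketch; it applies the standard formula for an undirected network, edges divided by n(n - 1)/2 possible pairs, and the small toy graph is there only to show that `nx.density` agrees with it.

```python
import networkx as nx

def density_from_counts(n_nodes, n_edges):
    """Share of possible undirected ties that exist (hypothetical helper)."""
    possible = n_nodes * (n_nodes - 1) / 2  # every distinct pair of nodes
    return n_edges / possible

# Figures quoted in the talk: 83 funders, 640 connections.
webinar_density = density_from_counts(83, 640)
print(round(webinar_density, 2))

# Toy graph (made up) to compare the hand calculation with NetworkX.
G = nx.Graph([("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")])
toy_density = nx.density(G)
```

With 83 nodes there are 83 × 82 / 2 = 3,403 possible ties, and 640 / 3,403 ≈ 0.188, which rounds to the 0.19 reported.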
So is our network characterized by pairs of funders connecting to each other, with never more than two funders connected? In that case we wouldn't see any triangles or anything like that: if a connection exists, it's between two funders, and that's as far as it goes. We can get a sense of how clustered the network is using something called the transitivity measure.

To give you a sense of what we mean by transitivity: as I said previously, your network could be characterized by lots of connections between pairs of nodes. Here we have a fictional example. We have three nodes, and we can see that two connections exist between these three nodes, but what's missing is a connection between these two here. If that connection was realized, we would see what's known as a triad in the network: a trio of nodes who are all connected to each other. The clustering measure that we're going to produce just now gives us a sense of the probability that, when we see this situation here, this happens next. So if there's a group of funders and there are connections between some of them, what's the probability that they then say, actually, there's one connection missing, let's form that connection? That's what our transitivity measure gives us. It gives us an idea of how clustered the network is, or the probability of groups forming in our network, and it specifically relates to triads, so groups of three nodes. What's the probability of forming a triangle, of closing those connections?
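The open and closed situations described here can be reproduced with three small made-up graphs. This sketch assumes `nx.transitivity`, which computes three times the number of triangles divided by the number of connected triples (two ties sharing a node).

```python
import networkx as nx

# Two ties among three nodes: the A-C tie is missing (open).
open_triad = nx.Graph([("A", "B"), ("B", "C")])

# All three ties present (closed triangle).
closed_triad = nx.Graph([("A", "B"), ("B", "C"), ("A", "C")])

# One triangle plus a pendant node D hanging off C.
mixed = nx.Graph([("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")])

t_open = nx.transitivity(open_triad)
t_closed = nx.transitivity(closed_triad)
t_mixed = nx.transitivity(mixed)

print(t_open, t_closed, t_mixed)
```

In the mixed graph there are five connected triples (one centred on A, one on B, three on C) and three of them sit inside the single triangle, so the measure is 3/5 = 0.6.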
So you can see our transitivity measure, our clustering measure, is quite high. It's 0.49, if we round it again. What this means is 49 percent of possible triads have been realized. I'll just go back again. When we see this situation here, 49 percent of the time this is what happened. So essentially half of the time we see triangles formed when we have this situation just here. That's quite high, but as I said previously, this is quite a targeted package of support, and if two funders are supporting an organization, there's a reasonable likelihood that a third one would also come in and support that organization. So we can see already that our network is quite dense. A lot of organizations are getting funding from the same funders. It's quite an interesting real-world network so far.

The next thing we want to do: if we think back to our visualization of the network, we could get a sense of, if you were to journey from one side of it to the other, how far is that journey? How long would it take you, essentially, to go from the left-hand side of the network to the right-hand side, or from the top to the bottom? It's not really important how we orientate ourselves; it's more important to think, how many steps does it take to go from one side of the network to the other? So diameter, just like in mathematics, is the length of the network from one end to the other. We can try to calculate the diameter measure for our funder network, and if we do that in this case, you can see we get an error. What happens here is, if we want to say, well, how far is it from a node on the left-hand side to a node on the furthest right-hand side?
The problem arises if we have isolates in the network, and we've seen that we have eight. In graph theory, it's just impossible to calculate how far an isolated node is from a node that's connected on the left-hand side, because if a node doesn't have a connection to another, there's no way of getting to that other node. The distance that exists between them is infinite. So we need a little bit of a workaround when we calculate the diameter of a network: we need to break our network down into a component, and I'll explain what a component is immediately following this measure. Now that we have a component in our network, we can calculate the diameter, and the figure we get back is four.

What this captures is that there are four steps on the longest journey in the network, where each journey takes the shortest available route between its two nodes. From one end to the other, it takes four steps: four connections between a node situated on the outer extreme of the network and a node at the opposite outer extreme. That's relatively few. So if I'm a node at one end and I want to get information to a funder who's over at the other end of the network, it's going to take me four steps; that is, the message passes through three intermediate funders before the fourth step delivers it. When you think of diameter, and we're going to look at this in a little more detail in a moment, think of six degrees of separation: how far apart are nodes in the network? It's basically the furthest distance in the network.
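The error and the workaround can be sketched on a toy graph. The five-node path plus one isolate is an invented example, and restricting the calculation to the largest connected component is an assumption about what the notebook does, based on the description above; `nx.diameter` raising `NetworkXError` on a disconnected graph is standard NetworkX behaviour.

```python
import networkx as nx

# Toy network (hypothetical): a chain 0-1-2-3-4 plus one isolated node.
G = nx.path_graph(5)
G.add_node("isolate")

# On a disconnected graph some distances are infinite, so this raises.
try:
    nx.diameter(G)
    raised = False
except nx.NetworkXError:
    raised = True

# Workaround: keep only the largest connected component, then measure.
largest = max(nx.connected_components(G), key=len)
core = G.subgraph(largest)
diam = nx.diameter(core)  # longest of the shortest paths, counted in edges

print(raised, diam)
```

In this chain the two end nodes are four edges apart, with three nodes in between, which mirrors the diameter of four reported for the funder network.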
So the diameter is four. That's basically the longest journey it takes to go from one node to another.

I just mentioned components in the network when we were measuring the diameter. A component is basically a subset of the network. It's a subset of nodes, and it's a particular subset where every node is connected to every other. Now, this is either directly or indirectly, and we'll see how that looks in just a moment. Networks in the real world tend to have more than one component. So basically there's a handful, if not more, of subgroups within your network whose members are all connected to each other. The key point with a component is that, by definition, it cannot be connected to any other component. Remember, it's a situation where all of the nodes are connected to all of the other nodes in the component, and that means they can't be connected to any nodes in a different component. So these are subgroups that have no interaction. If a network has five components, those five components are distinct, and there's no connection, direct or indirect, between them. They're like islands, if you want to think of it that way.

So let's explore how many components there are in our funding network. We see that there are nine different components in our funder network, but this is actually a bit misleading. This is a little thing you need to watch when you use the NetworkX package in Python: isolates are counted as components when you calculate this measure. As we saw previously, an isolate is a node with no connections, so basically NetworkX treats that isolate as a component in itself. This would make sense in the small number of situations where a node can be connected to itself. So there's something called a self loop in a network.
That's where I'm connected to myself somehow. I can't think of an example right off the top of my head, but, well, think of Twitter: if you're familiar with Twitter, you can retweet and you can like your own tweets. So that's an example of a connection to yourself: you can republish or repost something you've previously written. If you do that, it makes sense to say that you're your own distinct component. But in many other situations, it doesn't really make sense to say, well, I'm connected to myself, therefore I form a component.

We can confirm my suspicion that what we really have is one large component and eight isolates. You can see here that I've taken out all of the components in my graph, converted them into networks themselves, and printed the same information. As you can see, there are nine components, but all of them bar one are isolates. So in my network I have one large component where everybody's connected to everyone else, directly or indirectly, and the exceptions are the eight that have no connections to anybody else.

If you want, we can visualize the largest component. Obviously it looks very much like the overall network, except this time you can see there are no isolates: there's no dot on its own without any lines going to it. Those have been removed, and this is essentially the network. If you were not interested in any of the isolates, this is really the network: we've got 75 funders and the 600 and something connections that exist.

So why is it important to identify components? This is not merely academic. It's actually crucial: as we've seen, a lot of network analysis measures can only be calculated using components, because, as I said, it makes no sense to ask, what's the distance between an isolated node and one that's connected over the other side?
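Counting components, separating out the isolates and extracting the largest component can be sketched as below. The six-node graph is invented, and the exact calls used in the notebook are not shown in the talk; `nx.number_connected_components`, `nx.connected_components` and `Graph.subgraph` are the NetworkX functions assumed.

```python
import networkx as nx

# Toy network (hypothetical): one chain of four funders and two isolates.
G = nx.Graph([("A", "B"), ("B", "C"), ("C", "D")])
G.add_nodes_from(["E", "F"])

# Isolates are counted as components of size one.
n_components = nx.number_connected_components(G)

# Components come back as sets of nodes; sort so the largest is first.
components = sorted(nx.connected_components(G), key=len, reverse=True)
non_trivial = [c for c in components if len(c) > 1]

# Turn the largest component into a network of its own.
giant = G.subgraph(components[0]).copy()

print(n_components, len(non_trivial))
print(giant.number_of_nodes(), giant.number_of_edges())
```

Here the raw count is three, but only one component has more than one node, the same pattern as the nine components (one large, eight isolates) in the funder network.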
It's just infinity, the distance between a non-connected node and another, and there are lots of other measures like that as well. In particular, if you want to make comparisons between networks, usually those comparisons are made between the largest components in each network.

Continuing with our theme of breaking our network down into subgroups, there's a term which you're probably familiar with in everyday language, which is a clique. A clique is very similar to a component in that it represents a subset of the network where every node is connected to every other. But the crucial difference compared to a component is that all of these nodes share direct ties, so each node has a direct link to every other node in the clique. As we saw with the components, everybody is connected, but there's not the same number of connections. We can see that this node is connected to this node, but only by going an indirect route: to here first, then here, here, here. It has to make a journey; it's indirectly connected. A clique, on the other hand, would mean that there has to be a direct connection between these two for them to be part of a clique.

So again, similar to what we did with components, we can say, right, tell me the number of cliques that exist in the network, and there's quite a number. There are 251, which is quite a lot, but a clique has a very low barrier to entry. Two or three nodes on their own, if they were all connected, as we saw in a triangle, well, that's a clique: every node is connected to every other in the triad. Really, with cliques, we want to look at larger groups of nodes.
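A minimal sketch of counting cliques and picking out the largest one, on an invented six-node graph. It assumes `nx.find_cliques`, which enumerates maximal cliques (cliques that cannot be extended by adding another node); the talk does not show which call produced the figure of 251, so treat this as one plausible way to do it.

```python
import networkx as nx

# Toy network (hypothetical): nodes 0-3 are all directly tied to one another,
# with a short tail 3-4-5 attached.
G = nx.complete_graph(4)
G.add_edges_from([(3, 4), (4, 5)])

cliques = list(nx.find_cliques(G))   # each clique is a list of nodes
largest = max(cliques, key=len)
clique_graph = G.subgraph(largest)

print(len(cliques), sorted(largest))
print(clique_graph.number_of_edges())
```

The tail contributes two cliques of just two nodes each, which illustrates the low barrier to entry. Inside the largest clique every one of the k nodes has k - 1 ties, so it contains k(k - 1)/2 edges: 6 for k = 4 here.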
So are there groups of 10, 12, 15 funders who are all connected to each other, so every pair of them funds at least one organization in common? What we'll do now is pick out the largest clique, just like we did for the largest component, and take a look at its summary statistics. In the largest clique, the largest subset of directly connected nodes in the network, there are 12 nodes and 66 connections, and the average number of connections is 11. In this case, because it's a clique, that means each of those 12 nodes has 11 connections. It's not an average that's skewed by some having a lot of connections and some having none. By definition, it wouldn't be a clique unless they were all connected to each other.

I think this is better conveyed visually. As you can see, compared to a component, every single node in the clique is connected to every single other. There are basically 11 lines going from every single node to each of the others in the clique. Using the concept of the clique, we can start talking about certain subgroups in the network that are a lot more densely connected. We can say that those are conditions for very efficient information sharing, or that they form a very coherent subgroup, or maybe a homogeneous group of funders. Maybe these 12 funders all target the relief of poverty or something. There must be some connection linking them. Of course, it could just as well be random chance that we see cliques like this in the network, so we need to keep that in mind as well.

So that's a run-through of some of the most common basic and intermediate network-level measures. What if we now focus on the nodes themselves? We have funders in the network. Can we describe their properties as they exist in the network? So how connected are individual funders in the network?
And we can look at some other measures as well of how far apart individual funders are: are some really close to most others, or are some on the outskirts of the network?

What we want to talk about first is this idea of centrality. You can interpret this in a geographic sense, or in terms of geometry: there's a center point in the network. There's a funder who acts as a hub, who's connected to the vast majority of other funders in the network. Maybe some nodes act as brokers, facilitating relationships between other funders in the network, etc. So it's a measure of how important a node is in a network, and there are different ways of measuring centrality. We're going to look at three different measures. I gave one example: a node may act as a hub, so lots of connections run into that individual node. Some nodes act as brokers, so some funders sit in between the relationships of other funders. We'll see how that looks visually as well. And some nodes may just be proximate, or positioned very closely to most other nodes. Being central in a network usually confers a lot of advantages to a given node.

So the first measure we're going to look at is degree centrality. Remember that when we say degree, we're talking about the number of ties or connections. Earlier we looked at that for the network overall, but now that we're talking about nodes, we're talking about the individual number of ties per node. So for a given funder, does it have one or ten or sixty connections, for example? Degree centrality is a measure of how popular or how well connected a node is in the network. And it's usually standardised as well. Because if you had a network where there were thousands of nodes, and an individual node had a hundred connections, you could say, right, well, that's quite
important. But if your network only had a hundred and ten nodes, and a given node had a hundred connections, clearly that one is more central or important than a node in a different, much larger network with the same number of ties. We don't need to worry too much about that. Just realise that there's some standardisation or normalisation applied to a lot of these measures.

What we do with our Python code is capture the number of connections per funder, then sort them, and I just ask Python to print the top 20. So here are the top 20 funders by the number of connections they have. This is the official ID of the National Lottery Community Fund. This funder has connections to 61 of the other 82 funders in the network, so this is an incredibly well connected funding organisation. The second might be Children in Need, I'm not quite sure. You could look all these up yourself. The original data set, as I said, is in the data folder in the repository, which I'll show you in a moment.

So here we get a sense of who is the most important node in the network. If we use our degree calculation, we can see that it is the National Lottery Community Fund. But as you've noticed, this is the raw number of ties. Degree centrality, as I said, is normalised: the count is divided by the number of other nodes a funder could be connected to. We still get the same ordering of results. So the degree centrality measure here is 0.74, that is, 61 out of 82. Again, the best connected, or the most important, or the most popular node in the network is this one here, the National Lottery Community Fund.

Conceptually, the best connected nodes in the network can be considered as hubs. You can consider them as the most popular, but you can also consider them as hubs where a vast number of the connections run into or through a given node.

There's a different measure of centrality which we alluded to a moment ago, which was this idea of brokerage in
the network. This is the idea of whether there's a node in the network that facilitates indirect connections between other nodes. We know the National Lottery Community Fund is the best connected, but does it also sit in the middle of all the other connections that exist between funders in the network?

Let's consider a very simple example before we calculate the measure. Here we have a very simple three-node network. We've got three individuals: Josie, Jane and John. In this network, Jane acts as the broker. If Jane wasn't in this network, these lines here wouldn't exist. There's no direct line between John and Josie, so if Jane didn't exist, or was removed from the network, this indirect connection between them would disappear. So this is a very simple example of how Jane acts as a broker in the network.

We can apply the same idea to the funders network. We can say, what are the top 20 broker nodes in the network? And probably unsurprisingly, there's a lot of consistency between the degree centrality measure and the betweenness centrality measure. Again, we see the National Lottery Community Fund has the highest betweenness centrality score, which is basically the proportion of times that the lottery funder sits in the middle of these indirect ties, on the shortest path between two other funders. So if you swap out Jane with the National Lottery, it's the proportion of times you see the National Lottery in this role here. We can see that the ordering of funders in the network by the betweenness centrality measure is much the same as by the degree centrality measure.

So we've spoken about who is the best connected, the most popular node in the network. Now we've seen the one that acts as a broker, the one that facilitates the indirect ties that exist in the network. And this is related to something you've probably heard of before, which is the concept of a structural hole. This is a scenario where there's a lack of
direct contact or a tie between two entities, and therefore a broker can fill this gap and ensure a connection forms between the two nodes. I referenced it a moment ago: what would happen if Jane, right here, disappeared? If she disappeared, there would be a structural hole between these two nodes here.

So again, we can ask Python to calculate the structural hole constraint measure, and this tells us which organizations act as brokers. Basically, if this organization here was taken out of the network, we would see quite a few indirect ties disappear. It's a related idea, almost the inverse of what we were talking about with betweenness centrality: the less constrained a node is, the more scope it has to broker. This is where we see the potential for structural holes to occur. If you were to remove these organizations, what would happen? We'd see lots of ties disappear in the network.

Our final measure of centrality is known as closeness, and this captures the idea of proximity in a network. Basically, is there a node in the network that's situated close to most of the others, so that there are very few steps from that node to the other nodes? If one node is very proximate, or very close to many others, then it occupies a strategic position in the network, because that node will have a lot of influence over the nodes it's connected to, and it allows it to diffuse knowledge or materials or resources throughout the network very easily.

Taking a simple example: if you had one node that was at most two steps away from the majority of other nodes, and another node that was on average five steps away, the second would be less close than the first. Because that first node, in just two steps, like a friend of a friend, can get information or materials or resources to those nodes a lot more easily than the other one.

And again, you can see there's thankfully a lot of similarity, both in terms of how we're calculating the different centrality scores and what
they actually mean. So again, unsurprisingly, the National Lottery Community Fund is also closest to most other nodes in the network. That means it's in a position to diffuse or spread resources to a lot of other organizations. But there are two others as well that fulfil this role. So basically there's a high degree of consistency in our network between the best connected funders, which we measure using degree centrality; those nodes that have greater potential for brokering connections, which is our betweenness centrality measure; and the nodes that are most proximate, or situated very close to lots of other nodes, which is our closeness centrality measure.

The final thing we're going to look at as part of our node-level analysis, and it's the last thing we'll do in general, is this idea of distance. We saw one measure earlier when we looked at the network overall: we looked at the diameter, so what's the furthest distance between a node on one side of the network and a node on the other side. We can now calculate this measure for individual nodes themselves: for a given node, how far away is it from every other node in the network? Distance measures basically give a sense of how reachable a given node is.

If we think back to nodes that share a direct tie, so if there's a line drawn between two nodes, we can say they're separated by a distance of one. There's one step between those two nodes. If there are two nodes that possess an indirect tie, then they're separated by a distance of two or more. This is again the idea of a friend of a friend: you're separated by a distance of two. It takes two steps to go from you to your friend, who then transfers the information to their friend. And of course you can have distances of three or four or five, etc.

The measure of distance we're going to look at is something called the geodesic distance. Basically, this represents the shortest or optimal or most efficient path between two nodes
in the network. So basically it's the shortest distance, the smallest number of steps that exist between any given pair of nodes in the network.

So we can look at what a path actually looks like between two nodes. We can consider a starting point up here, let's say that's a funder in our network, and this is a funder that we want to reach. Then we can say, right, this funder needs to go here, so that's step one, step two, step three, four, five, six, seven. So we can see that the geodesic distance between this node here and this one here is seven. There are lots of other ways of getting to that node. Not in this example, but let's say this node was connected to this one, and this one was connected back again. You could say we go from here, and then we go here, and we go back around and back around, etc., and that's obviously adding extra steps. We're not interested in all the different ways you can get between two nodes. We're interested in the smallest number of steps between a pair of nodes in the network.

Say we were interested in two particular nodes. This is one funder in our network, here is another, and here are all the different ways you can get between them. There's no direct line between the two of them, so there's no distance of one between them; they're separated by at least two steps. We're going to calculate what it is exactly in just a moment, but you can get a sense visually of the number of steps. This funder here has about five or six different connections, so you could take this path to here, and then maybe to here and here and here and then out, etc. But among all those different configurations, all those different journeys, there's going to be a shortest, most efficient path between them. And here it is: we can ask NetworkX to calculate the most efficient path between this funder here and this one here. And here's the path actually
written out. We start with this person, well, this organization, then we need to go to this one, then to this one, and finally we reach our destination here. As you probably noticed, I don't even need to actually calculate the geodesic distance. We can see that the shortest path is three: step one, step two, step three.

While that is the shortest path length, there are actually multiple ways of doing it, so there are multiple shortest paths between those two nodes. For example, again, here are our source and target funders. You can go to this one second and this one third, or you could go to this one second and this one third, etc. So there are basically eight paths between these two nodes that we can consider the most efficient. But just in case you didn't believe me, or didn't want to rely on looking at it visually, you can actually ask it to calculate the length of the shortest path. So the geodesic distance between these two nodes is three.

I was going to cover some more advanced techniques, but actually I'm going to take questions instead, because I see we're coming up to the end. There's a couple of techniques that you've probably heard of, such as exponential random graph models, stochastic actor-oriented models, relational event models, etc. In this notebook that I've put together, and we'll be sharing the link with you, and I'll show you just now where the link is, there's a bit of material on what one of these advanced approaches is. It gives you a look at some data and a paper that explains what's going on. But as I said at the beginning, if you are interested in more advanced statistical modelling approaches to social networks, please let us know and we can try to put those training materials together.
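As a footnote to the geodesic distance calculation described above, here is a minimal sketch of how the shortest path length, and every path of that length, can be found with a breadth-first search. It is an illustration only: the five-node network, the node labels and the function names are invented, and the demonstration itself relies on a network analysis package rather than hand-written code like this.

```python
from collections import deque


def build_adjacency(edges):
    """Turn a list of undirected ties into a dict: node -> set of neighbours."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def all_shortest_paths(adjacency, source, target):
    """Return every shortest path from source to target ([] if unreachable)."""
    distance = {source: 0}
    parents = {source: []}  # predecessors that lie on some shortest path
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in distance:
                # First visit: breadth-first order makes this the geodesic distance.
                distance[neighbour] = distance[node] + 1
                parents[neighbour] = [node]
                queue.append(neighbour)
            elif distance[neighbour] == distance[node] + 1:
                # Another equally short route into the same node.
                parents[neighbour].append(node)
    if target not in distance:
        return []

    def walk_back(node):
        if node == source:
            return [[source]]
        return [path + [node] for parent in parents[node] for path in walk_back(parent)]

    return sorted(walk_back(target))


# Toy network (hypothetical): two equally short routes from S to T.
edges = [("S", "A"), ("S", "B"), ("A", "C"), ("B", "C"), ("C", "T")]
adjacency = build_adjacency(edges)

paths = all_shortest_paths(adjacency, "S", "T")
geodesic_distance = len(paths[0]) - 1  # steps = nodes on the path minus one

print("Geodesic distance:", geodesic_distance)
for path in paths:
    print(" -> ".join(path))
```

Longer routes that double back never appear in the result, because the search only records predecessors that are exactly one step closer to the source. Assuming the package used in the demonstration is NetworkX, the corresponding functions are `shortest_path`, `shortest_path_length` and `all_shortest_paths`.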