Okay, good afternoon everyone. We'll begin. Welcome to our social network analysis webinar where we cover the fundamental concepts of this methodology. I'm Dermot McDonald, I'm a research associate here at the UK Data Service in the University of Manchester. I'm joined by my colleague Alice Bloom who's listening in and will be keeping an eye on things for us. Let's get going. So today we'll cover social network analysis. So we're going to set the building blocks for allowing us to manipulate and analyze data. There will be a little bit of analysis and a little bit of a coding demonstration near the end, so that's quite good, but mainly we'll focus on defining and clarifying some of the key concepts and terms associated with social network analysis. So we'll look at some of the key concepts and we'll have a look at how networks are represented in data form: how are they actually stored, and how do we actually visualize network data? That area is actually really interesting and quite distinctive from what you're used to. Excellent. So to give you the why, so why are you actually here for this training series, and this one in particular, I quite like this quote by Scott, which says that "many who have seen the potential offered by network analysis have found it difficult to come to grips with the highly technical and mathematical language that necessarily characterizes much of the discussion in the technical literature." I'm sure maybe some of you have come across social network analysis before, social networks, you may have seen some of the terminology: transitivity, assortativity, nodes, ties, edges, arcs, vertices. There's a lot of technical abstract language and terms that are used. So today in particular, and this entire training series, is about demystifying all of these technical abstract terms and translating them into very practical applied knowledge that you can use in your social research. So that's what you need to hold me to account for.
So I said that a lot of you will probably be aware of social networks and how they look and how they're visualized. Even in the fictional social world, social networks are a feature. I'm not much or any bit of a Star Wars fan, but I think this is quite a good example. Here we've got a social network of characters in the first six movies, I think, not the newer ones. So each of these circles represents one of the characters, and these characters are joined by lines. And these lines exist if two characters speak in the same scene. So you can see that there's definitely a line between Luke and Chewbacca, and it's quite a thick line, which means that Luke and Chewbacca spoke, I don't know, but spoke in quite a lot of the same scenes together and to each other as well. You can kind of see on the periphery of the social network that there are lots of characters who are not really worth naming and are only connected to one other character, et cetera. So even in such a trivial example, and apologies, Star Wars fans, we can see some of the core features of social networks. We can see that there are entities that form the network, and then there are patterns in the connections between these entities also. So for the rest of this, we're going to use some real data, some real social networks, but we're also going to explore many of the same concepts that apply to fictional and real social networks alike. You name it. So let's go through some of the key aspects of social network analysis. So what is it? So how would we define it? So it's essentially a methodology. It's quite a broad, rich methodology. It's best described as a toolbox. I quite like that. It's a very practical methodology. It allows you to measure and describe and to analyze patterns in social structures.
So basically, if we think of the way in which individuals are embedded in societies or embedded in organizations, embedded in events, we can analyze those connections between people to other people, people to those organizations, and as a result, to other organizations. So if we think of the social world as a network of connections, then we can apply social network analysis and the measures and algorithms and the methods that it provides to understand these patterns. A relation is a distinctive type of connection or a tie between two entities. A very simple example we can all think of is a familial relation. So a brother and sister share a sibling relation. Cousins share a cousin relation. Friends, a friendship tie, colleagues, a collegial relation, etc. So if you hear a relation, think of a type of connection or tie that exists between two entities. And then these relations that constitute the building blocks of networks. So if networks are constituted of relations between entities, social network analysis concerns itself with the analysis of those relations and the patterns that form between our units of analysis. So why should you consider SNA? It's not quite text mining or it's not quite web scraping in that it's not the most popular zeitgeist method out there, but it is increasingly popular. It has a long running history in the social sciences going back to the 1920s, 1930s. But why should you consider it for your research right now? Those of you who are quantitative researchers will recognize this incredibly simple statistical model. So we have an outcome here, which we call Y. And we can explain or predict Y using an initial guess, an explanatory factor and a little term that captures the fact that we can't perfectly predict an outcome. There's always random luck or random chance. But in essence, we've got an outcome, something we're interested in explaining. 
And we've got some kind of factor or we've got some kind of variable that we think explains that outcome. So how does this apply to social network analysis? Well, maybe the social network is the thing or the phenomenon that you are trying to describe and explain. So it's your Y variable in the framework that I've just outlined. So for example, you might be interested in political networks on Twitter. So there may be different Twitter accounts that tweet certain political messages on Twitter and they retweet other accounts and all that Twitter activity forms a network. So the thing itself is a network. You're not trying to coerce it to fit a network structure. If you're a social scientist interested in urban planning, for example, you might be interested in the London Underground network. That is literally a network of train stations and rail lines, for example. COVID-19, which of course we do need to mention to some extent, we see with kids going back to school that they're forming bubbles. So kind of units of kids that can't really interact with other year groups. So now schooling has become a network. So who you interact with, which teachers, which pupils, they form individual networks, which sit within an individual school, which sits within, you know, a local authority. So lots of social phenomena nowadays are just networks. They're defined as networks and they operate as networks. From another perspective, maybe it's features and properties of social networks that you're interested in. And if you know how well somebody is connected within a network, that then helps you explain a Y variable. So it helps to explain an outcome. So about 12 years ago, there was a really interesting literature review done by some public health scholars, I think at Harvard. And they were looking at the impact of a person's social network on a range of health outcomes, mental health, physical health outcomes.
And they concluded that it is vitally important, you know, how well connected somebody is, how isolated somebody is, to a range of health outcomes. Now that sounds very obvious if we talk about mental health: if you have a wider network of friends, maybe you're more active, more stimulation, and you've better mental health outcomes. But they even found that your social network characteristics actually predicted the spread of biological negative health outcomes. So obviously very obvious things like catching diseases. But there were actually, you know, physical negative outcomes that arose from people's social network characteristics. So if someone was particularly isolated, that tended to be linked with really poor physical and mental health outcomes. So I've gone on about that, but it was a really fascinating study about 12 years ago. And I just couldn't believe how important, you know, some aspects of a person's social network were for their physical health, it was just incredible. And the co-authors, you know, summarize it here that an enormous range of physical and mental health issues are really linked to the number of connections a person has and the size of their social network. So that was fascinating. So why SNA? Maybe you have something that is a network and you want the appropriate tools to analyze it. Or maybe you've got an outcome and you think it's important to know how well connected somebody is. And therefore, you need measures and calculations that tell you something about a person's social network. And you can use those variables and factors to explain something else. So when should you use it? And so I'm not really talking temporally here, I'm talking about identifying the opportunity and the appropriate conditions to use it. So very simply, if you are dealing with relational data, social network analysis is appropriate. What do we mean by that?
So we just mean data or a data set that captures relationships and connections between your units of analysis. So your units of analysis are just the things and the entities that you're interested in analyzing. So for me, I'm a charity researcher, the units of analysis and my studies tend to be individual organizations. They'll be the people you interview, they'll be the communities you do some ethnography in, etc. So if you want information on how your units of analysis are connected and related. So this is in contrast to what most of us are used to, which is attributional data. So you've got units of analysis, and then you capture characteristics, demographics about those units of analysis. So let's take a quick look at very simple fictional example. Here we have some attributional data. Our units of analysis are individuals. So we have five here. And then we capture information on their attributes, their sex, their age, their employment status. If we had relational data on the same units of analysis, this is what our data set would look like. So we can see here, we still have the same units of analysis, every row is a person. But now every column is a person also. And the values for each column and row, tell us how these people are connected. So John and Joan are friends. John and Jenny are colleagues. And John doesn't know Juliet or Jack. And you can read down the way or across the way. And it doesn't matter in this particular example. But you can see how the data structure, and so we're talking about two spreadsheets here, if that's how you want to think about it. But the type of information we're collecting is different. On the top left, attributional data on the bottom right, relational data. And in the next webinar, I'll show you how to convert this top left one to this bottom right. And finally, what are the typical kind of steps in a social network analysis? 
So what you typically tend to do is you try to identify and visualize patterns of relations between units of analysis. So like the very silly Star Wars example earlier, that's the kind of thing you could do, and typically do do, in a social network analysis. You also want to examine the structural properties and characteristics of these relations. By this I mean, you're interested in measures of how well connected somebody is in a network, you're interested in how many strong ties there are in a network versus how many weak ties there are in the network. And so on, lots of different measures capturing the patterns and the relations in a social network. And then as I said, you might want to take some of these measures. So you want to, you know, describe your network. And are some of these network characteristics associated with outcomes experienced by your units of analysis? And so we'll go through an example later where we look at how charities are connected to each other. Are the best connected charities, you know, the ones that raise the most money from the public, for example? These are questions we can begin to ask with some social network data. So as a result of all this, so mainly if we have relational data instead of attributional data, then SNA requires distinctive data structures, methods of analysis and data visualization techniques. So this is going back to why we're doing this training series: some of the data may look familiar, some of the visualizations may look familiar. But the way you organize your data, the way you analyze it and the way you visualize it are quite distinctive. And there are new terms and abstract concepts that require some explanation. And how do we implement it? So just a very basic framework for doing this type of social network analysis: we always begin with a carefully articulated research question. And that research question either focuses on explaining a social network for its own purposes.
So again, it's the thing we're trying to explain, or we want to understand the social network. So we can use that understanding to predict or explain something else. So then we want to say, okay, we want some network data. And what units of analysis and what types of relations are we interested in? So who is connected in the network? And which relationships matter? Because people are connected to each other in multiple ways. If you and I work together, we may also be friends, we may have gone to the same university, our spouses may be friends, etc. So we can be connected in multiple ways. And so at the beginning, you think, who are we interested in? And how are they connected? Knowing that we want to find or we want to create a data set that provides this relational information on the units of analysis. As I say, we've got, you know, lots of social networks that are opening up their data, Twitter in particular, Instagram and Facebook have tightened up a wee bit, Spotify is a good one that provides lots of open access data, it's really, really good. So then we need to collect some data. And then finally, we want to summarize the network and its key features. So how big is the network? How dense it is? Is it how cohesive? Are there holes in the network? Are there certain people playing certain roles? Social network analysis just provides a plethora of methods and measures, which you'll see over the next two webinars. It's crazy. You can, if there's something you can think of analyzing about a network, there is a method and there is some Python code that allows you to do that. So let's now define some of our key concepts and get these locked in in our heads. So if you were only to take away one single slide in your head from the session, it would be this one. So a network, whether we're talking about a social, physical, biological network, whatever it is, you can think of an information network, it's constructed from two main building blocks. 
There are the entities that are or could be connected in a network. So these are the things, the people, the countries, the events, the computers, the train stations, whatever, these are the things that can be connected. And then there are the connections that exist or could exist between these entities. So you'll notice I've put in the qualifier entities that are or could be connected and connections that could exist. And that's important to make that distinction. We're not just interested in all the people who are connected, but we're also obviously interested in the people in a network who are on their own, but who could potentially be connected to others through the network. So then a network really is an aggregation or it's a collection of all these entities, and the connections that exist between them. So for example, a family tree is a network, very simple one. It contains individuals, though maybe if you're that type of person, your dog is included in the family tree, that's up to you. And these individuals are related through some type of familial tie. So the sum connection that joins these people together. This is something I found on the BBC website quite recently. It's the blood and marriage connections of the UK royal family. And a family tree tends to be hierarchical. That's a logical way of, of organizing it. But just like the Star Wars example we saw earlier, you could arrange this, you know, in a star shaped or circular shape, because it's the same components, right? We've got entities, members of the royal family, and we've got connections between them, parents to their kids, kids, their grandparents. So we've got a queen up here is obviously connected to her four kids, but through one of them to these grandkids and through both grandkids to different sets of great grandkids. So a family tree is a very recognizable, hierarchical type of network. But let's think broader than the royal family, because there are lots of different entities. 
So to use some kind of formal language, we're going to refer to the entities in a social network as nodes. So nodes is a term that comes from network theory, which social network analysis draws heavily upon. And you may have also heard nodes referred to as actors or agents. So that's a very specific social network analysis term. And vertices, vertex, that might be a term you've heard, that's from geometry, or you may have heard of points. So that's from graph theory and mathematics, and points and lines, you know, connected together. For consistency, we're going to say nodes. So the things that could be and are joined together in a network are known as nodes. And these nodes, as I said, it depends on your research study: individuals, organizations, countries, animals, events, computers, yeah, you name it. If you can define it sensibly, any type of entity can be a node in your network analysis. I read a really interesting paper recently by someone who was at Manchester, actually, and it looked at the food sharing network of, I think, jackdaws, so a type of crow. And it was fascinating. So it showed the connections between certain crows. And it had the timing. So it showed which crow fed the other one first, and then how long it took for that crow to reciprocate. An incredibly complex, rich network from something like 20 crows observed over a week. It was absolutely astonishing. So there's a lot of potential. And if you can correctly define your nodes and apply social network analysis, it is really interesting. So there are different types of nodes in a network. If you have a particular interest in a single node, that tends to be referred to as the ego or the focal node. Now, that focal node may emerge from the analysis. So maybe you have no idea who the main player is in a network, and you do some analysis, and you visualize it. And suddenly, you're like, Oh, that person is clearly connected to everybody else or to most other people.
That's the main person. But what you tend to find is that a priori, you just define who the ego is. So you may be interested in the 100 largest companies on the UK stock market, for example. So you can say the ego node is the company ranked number one. And then you're interested in how that company is connected to others. Or you could say I'm interested in company number two, for example. So it's up to you to define. So if you have an ego or a focal node, then all the other nodes that are immediately around that node are known as alters. So the ego is our main focus. Alters are the secondary focus. And I'll show you a quick example in a second. And then if we take into account the different ways in which nodes can be connected, then we get different types and different flavors of nodes. So if you've got two nodes that are or could be connected, we call those a dyad. And if there are three, a triad, and I'm sure it's now becoming obvious what these terms mean. If you have four, it's a tetrad, five, a pentad, six, a hexad, and then I run out. I can't remember what seven is. So let's take a quick look at what an ego network looks like. And you've probably seen visualizations like this: you've got a key node in the middle. And then the alters are the secondary nodes that you're interested in, spread around it. So here we have, let's say this is just the CEO of a company or something or, you know, the matriarch in a family or just somebody who is the center of a network. And then you can see who that person is connected to. And most interestingly as well in an ego network is you're interested in how the alters are connected to each other as well. So you can see that most alters are only connected to the ego, but some of them do form little networks of their own. Also, a dyad is very simple, you've got two nodes and they're directly connected. A triad, like a triangle, three nodes, three connections between these people.
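To make the ego and alter idea concrete, here is a minimal sketch in plain Python. The node names and ties are made up for illustration, and `ego_network` is a hypothetical helper rather than a function from any library. It picks out the alters (nodes directly tied to the ego) and then the ties that exist among those alters.

```python
from itertools import combinations

# Hypothetical undirected ties, stored as pairs of node names.
ties = [
    ("Ego", "A"), ("Ego", "B"), ("Ego", "C"), ("Ego", "D"),
    ("A", "B"),   # two alters who are also tied to each other
    ("D", "E"),   # E is two steps away from Ego, so E is not an alter
]

def ego_network(ties, ego):
    """Return the ego's alters and the ties that exist among those alters."""
    tie_set = {frozenset(t) for t in ties}
    # Alters: every node that shares a direct tie with the ego.
    alters = sorted({n for t in tie_set if ego in t for n in t if n != ego})
    # Alter-alter ties: pairs of alters that are themselves directly tied.
    alter_ties = sorted(
        (a, b) for a, b in combinations(alters, 2) if frozenset((a, b)) in tie_set
    )
    return alters, alter_ties

alters, alter_ties = ego_network(ties, "Ego")
print(alters)      # ['A', 'B', 'C', 'D']
print(alter_ties)  # [('A', 'B')]
```

In this made-up example most alters are connected only to the ego, and one pair of alters (A and B) forms a little network of its own: together with the ego, they make a triad.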
And that brings us neatly on to our second major building block in a network, which are the connections. So connections are relations and we're going to formally call ties in a network. You may have also heard ties called links or lines or edges. And we'll see the term edge used as well in Python. But we're going to use tie ties a bit of a broader term to use and it goes quite well with nodes. So we have nodes and ties in a network. There is a multitude of different types of ties. So just like we can have lots of different nodes, and those nodes can be connected in whatever ways you can think of in the social world, people can be related through blood, friendships, attending the same, let's say nightclub to use a COVID-19 example again, people are part of the same gym, living the same halls, work for the same companies, etc. And it's also possible for two entities or two nodes to be connected in multiple different ways. So as I said before, you know, two co workers may also be married and may have gone to the same university and go to the same gym and yeah, whatever. So people can be quite densely connected to each other. So the key point is that it's really important to acknowledge that your data will almost certainly only capture a sample of all the possible ties that exist between your nodes. So again, Twitter data, very rich, we'll explore that in two weeks time, but that's still only captures the certain ways in which people interact and Twitter accounts interact on Twitter. So two Twitter accounts might, you know, retweet each other's content all the time and share links on all this, but they may also be really densely connected outside of Twitter, maybe those two people actually meet up in the real world, work together, etc. So a data set will not do everything, it'll only show you how people are connected in a limited number of ways. So ties then have two dimensions that help us distinguish between them. 
The first we'll call enumeration, or the strength of the tie. So firstly, a connection between two nodes can be binary. This simply means people are friends or not, people are married or not. And companies are, you know, in the same industry or not. So we're not interested in how strongly people are connected. Just simply: are two people connected? Yes, they are. They have a binary tie. But if we are interested in the strength of the connection, we can have a valued tie. So maybe two people are friends, but maybe compared to two other friends, the first pair contacted each other 20 times last week, while the other pair of friends only spoke once very briefly on the phone. So we would say the first pair of friends are more strongly connected than the second, for example. And I realize my words are not visual, but I will show you an example of those types of ties. So basically a valued tie can be assigned a numerical weight or a value to show how strong that tie is. And secondly, we have a directionality dimension. So ties can be undirected. So I'm married to my wife. That's a binary tie, but it's also undirected. It's not like I married her first, or she married me first. There's no flow. There's no direction of the tie. The tie is just simply there. It didn't originate with me. It didn't originate with her. And the tie is undirected. But of course, you can have directed ties. So if you donate money to a charity, for example, you're connected to that charity, but the connection began with you donating money. Or in another way, maybe the charity contacted you first. And as a result, you then gave money. So then you'd have the direction going both ways. So there's a tie that exists between you and the organization. And there's a direction, either going from you because you donated first, or they solicited a donation, so therefore the tie originates with them. So that's a bit abstract. Best thing: let's look at some examples. So let's say we have four people.
The four people are friends, but some friends contact each other more often than others. So, for example, let's say we have some undirected and binary ties. So we have our four people, John, Josie, Jane and Jim. And in the previous week, John spoke to Josie, indicated by the undirected and binary tie here. John also spoke to Jane, but he didn't speak to Jim. Similarly, Jim has spoken to Josie and Jane, etc. So there's no indication of how strong the ties are. And there's no direction. We're not saying that Jane tends to contact John first. But we can introduce those elements as well. So let's stick with undirected. There's no source of the friendship. And two people are connected. But let's try and distinguish the strength of those connections. So these numbers refer to how many times those people contacted each other in the previous week. So Jane spoke to Jim 12 times. Josie spoke to Jim 20 times. Hence why that line is shaded thicker. John spoke to Josie 10 times and John spoke to Jane five times. So again, there's no direction. There's no source of the contact as yet. Two people are connected. But now we can try and distinguish between how strong those connections are. So let's introduce some directionality. So again, let's just ask: are two people connected? But let's also say which person tends to initiate contact. So John tends to be the one who initiates contact with Jane. And he's the one who tends to initiate contact with Josie. In Jane's case, she tends to initiate contact with Jim. While Jim doesn't initiate contact with anyone. You can see there's no arrow going from Jim. So again, these people are connected. But there's some directionality. John initiates the contact. And again, we can combine these dimensions to create these four types of ties. Here we have a directed and valued tie. So again, who contacts whom first? But this time, weighted by how many times they've contacted each other. So Jane tends to contact Jim 12 times, for example.
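The four combinations of tie type described above can be sketched in plain Python. This uses the contact counts and directions mentioned in the talk, with one assumption: the direction of the Josie and Jim tie is inferred from the remark that Jim initiates no contact. The data structures (dictionaries and sets keyed by pairs of names) are just one illustrative choice, not how any particular package stores ties.

```python
# Directed, valued ties: (initiator, receiver) -> contacts in the previous week.
directed_valued = {
    ("John", "Josie"): 10,
    ("John", "Jane"): 5,
    ("Jane", "Jim"): 12,
    ("Josie", "Jim"): 20,  # assumption: direction inferred, Jim initiates nothing
}

# Directed, binary: keep who initiates, drop how often.
directed_binary = set(directed_valued)

# Undirected, valued: drop who initiates, keep how often.
undirected_valued = {frozenset(pair): w for pair, w in directed_valued.items()}

# Undirected, binary: keep only the fact that a tie exists.
undirected_binary = set(undirected_valued)

people = ["John", "Josie", "Jane", "Jim"]
# Number of ties each person initiates (only meaningful for directed ties).
out_ties = {p: sum(1 for source, _ in directed_binary if source == p) for p in people}
# The strongest tie (only meaningful for valued ties).
strongest = max(undirected_valued, key=undirected_valued.get)

print(out_ties)           # {'John': 2, 'Josie': 1, 'Jane': 1, 'Jim': 0}
print(sorted(strongest))  # ['Jim', 'Josie']
```

The richest form (directed and valued) can always be simplified into the other three, but not the other way around, which is a reason to record direction and strength at collection time if they might matter later.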
So far, when we've spoken about connections and ties, we've implicitly meant direct ties. Okay, so two people are directly connected, directly share a relation to some extent. So like here, Jane and John are friends. But there are also indirect ties. So Josie is directly connected to Jane. Jane is directly connected to John. But we can say that John is indirectly connected to Josie through Jane. So in, you know, common kind of parlance, we could say that Josie is a friend of a friend. So John is friends with Jane, Josie's friends with Jane. Therefore, John and Josie, you can say, are mutual acquaintances or share a mutual acquaintance, or they're friends of friends. I do really regret using all the J words for names right now. But as a rule of thumb, when you're reading books about social network analysis, you're reading papers, you're coming across it: if you see the word connection or tie, you implicitly assume it's a direct connection, people directly share some kind of relation, unless otherwise stated. And if it is otherwise stated, then it's an indirect tie. So people can be friends of friends in the network. And this is what Granovetter in 1973 called, you know, weak ties. So kind of mutual acquaintances, people you kind of know through somebody else, those connections can matter also. So we've got direct ties and indirect ties as well. So how are networks represented? So how do we actually store the information? And how do we, you know, visualize the relational data that we've collected? So networks can be represented using three formats. So we have matrices, we have what are known as edge lists, and we have graphs, which we've seen previously. And graphs are also known as sociograms in social network analysis. These are the visualizations that we've seen previously. So the first and probably most common type of representation is a matrix, very simple. A matrix is an arrangement of elements into rows and columns.
That sounds a bit abstract. But social networks can be represented as matrices also. So every row is a node, every column is a node, and then every value indicates whether a tie exists between two nodes or not. So let's make these kind of abstract data structures concrete. So let's take a small but real social network as an example. So my wife is part of a book sharing network with some of her family members. She can send some books to other people in the network. She can do that unprompted, or maybe someone sends her a book first, and then she reciprocates. So this is an example of a directed binary network. So the book originates with somebody. Somebody initiates the contact by sending a book. But at the moment, we're just interested in the binary ties. So if my wife sends somebody a book, they're connected. We're not interested in how many books were sent, just the fact that one book was sent means these people are connected. So here we can represent this network using a matrix. And again, a social survey, administrative data, data that has rows and columns and cells, which has values for those rows and columns: that's what a matrix is, very simply. So if you open up Microsoft Excel, or you download one of the large scale social surveys, you know, the data is arranged in matrix format, so don't be put off by the word matrix whatsoever. So this is a real network, but I can't say for certain that these are real ties, but I've tried to be as accurate as possible. So here we have my wife. And if we read across the rows, these are the people she sent books to. So my wife is the source of the book. And these people are the receivers. And I'm not counting my wife buying a book for herself as a connection. But there are certain instances where people can be connected to themselves. And we'll explore examples later. So my wife didn't send her aunt a book. She sent her cousin one, didn't send her.
Look at my cousin, you can see she sent my wife four books and sent her grants and books, etc. So if you wanted to calculate how many books my wife sent to others in total, it's really easy if the data are in matrix format. We can just find my wife on the row and sum across and we can see that in total my wife sent three books. If we wanted to know how many she received again, the receivers are the targets of the ties are on the column. So if we read down, you can see that my wife received 10 books. So she sent three and received 10. So storing network data in a matrix, it's not just useful for storage or, you know, it's recognizable. It's necessary for actually doing lots of calculations in social network analysis. And again, this is an adjacency matrix. Again, an edge list is a very simple kind of relation or extension of a matrix. It takes all the information we have in the matrix and it just has a list of all the connections that exist. So we can see from previously that my wife sent her cousin two books. So that tie exists. So my wife was a source, the cousin was the target, and the weight is two. So my wife sent the cousin two books. My wife sent my sister, et cetera, and we can read down. So in an edge list, it's simply a list of all the ties that exist in the network. And the ties are represented as pairs of nodes. And again, this data exists in a spreadsheet. And you'll see that when we go through some of the data examples later on and in the rest of the lessons. And third and finally, before we get to our analysis, we can also represent networks as graphs or sociograms. Graph is a very formal thing in graph theory. So in mathematics, a graph is a set of lines connecting points. So that's why in social network visualizations that the nodes are represented as circles. So it's just, you know, it goes back to graph theory, and points are connected by lines. 
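The matrix and edge list representations described above can be sketched in a few lines of plain Python. The values below are illustrative rather than the actual figures from the slide: they are chosen only so that the totals match the ones mentioned in the talk (three books sent and ten received by the wife), and the grandmother is left out for brevity. Rows are senders (sources) and columns are receivers (targets).

```python
# Illustrative book-sharing network; cell values are assumed, not the slide's data.
nodes = ["Wife", "Cousin", "Sister", "Aunt"]
matrix = [
    # Wife Cousin Sister Aunt   <- receiver (column)
    [0,    2,     1,     0],    # sent by Wife
    [4,    0,     0,     1],    # sent by Cousin
    [3,    0,     0,     0],    # sent by Sister
    [3,    0,     0,     0],    # sent by Aunt
]

i = nodes.index("Wife")
sent = sum(matrix[i])                      # sum across the wife's row
received = sum(row[i] for row in matrix)   # sum down the wife's column
print(sent, received)  # 3 10

# Edge list: one (source, target, weight) entry per tie that exists.
edge_list = [
    (nodes[r], nodes[c], weight)
    for r, row in enumerate(matrix)
    for c, weight in enumerate(row)
    if weight > 0
]
for edge in edge_list:
    print(edge)
```

The edge list holds the same information as the matrix but skips the empty cells, which is why it tends to be more compact for large networks in which most pairs of nodes are not connected.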
Hanneman and Riddle provide some really good advice and clarity on using graphs and sociograms for networks. So you represent the nodes as circles, you represent the ties as lines, and you use arrowheads if the tie is directed. And then you can use things like colors and shapes and sizes to differentiate nodes by their attributes or their network characteristics. So if you think back to the Star Wars example we had, some of the characters' circles were bigger. The circles were bigger if that character appeared in lots of scenes, and the circles were small if the character appeared in, say, one scene across six movies. And you can also use the same techniques to differentiate the ties themselves. So if you think back to the four examples I gave of the friendship network, you can make the lines thicker to show that some ties between individuals are stronger. You can use different color lines if there are different types of ties between two people, et cetera. So this is an example of how we can visualize a social network. This is a real network, but these are fictional ties. So for any music fans, this is Bruce Springsteen's band, the E Street Band. These are the current members. And I've just made up the fact that some people like each other and some don't. So I've made it up that Bruce likes Roy, Bruce likes Max, who I think is the keyboardist, Bruce likes his wife, which is obvious, and Bruce likes Stevie Van Zandt, for example. But you can see there's no connection between Bruce and the saxophonist, for example. So we've got nodes represented by circles. They're all the same type of node, a member of the band, so I haven't colored the nodes. But I could use color coding to differentiate between different age groups or different sexes, for example. And I've made up that poor Gary here isn't connected to anybody, nobody likes him. So this is an example of how we can very sensibly visualize a social network.
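As a rough sketch of how those drawing conventions translate into data, the plain Python below stores the made-up band ties as directed pairs, counts each node's ties, and derives a circle size from that count. The tie from the saxophonist to Stevie and the size formula are assumptions added for illustration; no plotting library is used, so this only prepares the values a drawing tool would need.

```python
nodes = ["Bruce", "Roy", "Max", "wife", "Stevie", "saxophonist", "Gary"]

# Directed "likes" ties as (source, target); an arrowhead would point at target.
likes = [
    ("Bruce", "Roy"),
    ("Bruce", "Max"),
    ("Bruce", "wife"),
    ("Bruce", "Stevie"),
    ("saxophonist", "Stevie"),  # hypothetical tie, not mentioned in the talk
]

# Count how many ties touch each node, ignoring direction.
degree = {name: 0 for name in nodes}
for source, target in likes:
    degree[source] += 1
    degree[target] += 1

# Nodes with no ties at all are isolates.
isolates = [name for name in nodes if degree[name] == 0]

# Assumed styling rule: bigger circles for better connected nodes.
node_size = {name: 100 + 50 * degree[name] for name in nodes}

for name in nodes:
    print(f"{name:<12} ties={degree[name]} size={node_size[name]}")
print("isolates:", isolates)
```

The same pattern works for ties: a weight per pair could be mapped to line thickness, and a tie type could be mapped to line color.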
Or if we think back to the real network that my wife is part of, these are the ties that exist. So you can see there's a reciprocal directed tie between my wife and my sister, because they both sent each other books. And you can see that my aunt has sent my wife a book, so a connection exists. But my wife hasn't sent one back in return, hence there's no arrowhead here. So visualization can be really appealing, it can be really powerful. But you've probably noticed that unless it's a very, very small network, visualization gets very, very messy, as we're about to see. So if you're going to do social network analysis, focus on the numbers, focus on producing numeric measures and calculations of network properties, and leave visualization either to the very beginning or to the very end. It's not a crucial, it's not a necessary activity in social network analysis. But it is appealing and it can be quite nice. So let's end with a quick analysis for five minutes. So I have a research question: what degree of board interlock occurs in the UK charity sector? Board interlock is a long running phenomenon of interest in organizational sociology. It's the degree to which organizations are connected through shared board members. So if I act as the director of company A and the director of company B, we can say that company A and company B are connected. So if you think back to our approach, next I need to define the nodes and connections. I'm interested in registered charities. These are organizations that have charity status and are regulated by the Charity Commission. And the connection I'm interested in is that they share a trustee in common. Obviously charities can be connected in lots of different ways. They can share office space. They can lobby the government together. But I'm just interested in one way, which is sharing board members.
Is there a data set I can get my hands on? Yes there is. There's data on the current trustees of charities that are headquartered in Manchester. So there's open data I can get. And then in terms of analysis: how big is the network? How cohesive is the network? And which charities are the best connected in that network? So let's switch to the live code demonstration. Some of you have probably seen this before if you're a returnee. I'll explain what a Jupyter notebook is later on or at a different time, but it allows us to mix code, narrative and results all in the same document. So let's go to our example. We want to do this analysis in Python, so there's a couple of preliminary steps. This notebook is available to you, and it explains exactly what's going on. So for now, I'm just going to focus on the output. I won't spend time explaining the individual lines of code, but I can do that maybe either later or as part of a different training series, if you'd like. So we begin with data on the current trustees of a group of Manchester charities. You can see what that looks like. We have a data set with 2700 individuals. Well, 2700 observations and four variables. Here's an example. So we've got a person here. Again, this is real data and it's open data, hence why I can use the person's name. So Robbie here is on the board of three charities. And here's the unique ID of each organization. So this is what we would call attribute data. It's a data set of trustees, and for each trustee we capture some basic information: their name, how many trusteeships they hold and which charities they're connected to. Because we have this variable here, that provides relational data on charities in Manchester. So because this person is on the board of these three organizations, we can see that charity 101, etc., this one and this one are all connected. So these ones are connected through their board.
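The step from trustee records to connected charities can be sketched like this. It is a minimal illustration in plain Python, not the notebook's actual code, and the trustee names, field names and charity IDs are all hypothetical.

```python
from itertools import combinations

# Hypothetical attribute data: one record per trustee, listing the
# charities whose boards they currently sit on.
trustees = [
    {"name": "Trustee A", "charities": [101, 202, 303]},
    {"name": "Trustee B", "charities": [202, 404]},
    {"name": "Trustee C", "charities": [505]},
]

# Every pair of charities sharing a trustee becomes one undirected tie.
# Sorting each pair means (101, 202) and (202, 101) are stored only once.
ties = set()
for person in trustees:
    boards = sorted(set(person["charities"]))
    for charity_a, charity_b in combinations(boards, 2):
        ties.add((charity_a, charity_b))

for tie in sorted(ties):
    print(tie)
```

A trustee on three boards yields three ties, a trustee on two boards yields one, and a trustee on a single board yields none, which is why Trustee C contributes nothing to the relational data here.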
So already we have three charities that we know are connected to each other through a common trustee. So the first task in any of these social network analysis workflows is to extract that relational information. And at the end we'll get an adjacency matrix. As you saw before, every row is a node and every column is a node as well. That's why we call it a node by node matrix. And the ties that exist between charities are undirected and binary. So simply, does a charity share a trustee with another? That's all we're interested in. We're not interested in how many trustees, just simply that there is a connection. And I don't think it makes much sense to say that the connection originates with one charity. If we had better information then we would know when a trustee first joined the board of a charity, and that might allow us to introduce a direction. So I start on the board of company A and then two years later I join the board of company B, for example. But we're ignoring that, we're keeping it simple. We're interested in the binary undirected ties. So I want a network of charities. I've got 1123 organizations, so I'll have 1123 rows and 1123 columns. So we can use some clever Python code to create our adjacency matrix. And as you can see here, every row is a charity and every column is a charity also. And the values in the cells of this matrix tell us whether these charities are connected. And again, we don't count a charity being connected to itself. That doesn't make any sense in this context. For this first charity here, we can see they're not connected, not connected, etc. But quite a few of them are. So as you could probably predict, there's lots of zeros. Most charities are not connected to most other charities. But actually in this network, they're all connected to at least one other, which is quite interesting.
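Building the node by node matrix from a list of ties might look something like the sketch below. It uses plain Python lists rather than whatever the notebook uses, and the charity IDs and ties are hypothetical.

```python
# Hypothetical charity IDs and undirected ties between them.
charities = [101, 202, 303, 404]
ties = [(101, 202), (101, 303), (202, 303), (202, 404)]

# Map each charity ID to its row/column position.
index = {charity: i for i, charity in enumerate(charities)}

# Start with all zeros: most charities are not connected to most others.
size = len(charities)
matrix = [[0] * size for _ in range(size)]

for charity_a, charity_b in ties:
    if charity_a == charity_b:
        continue  # a charity is not counted as connected to itself
    i, j = index[charity_a], index[charity_b]
    # Undirected tie: fill both cells so the matrix is symmetric.
    matrix[i][j] = 1
    matrix[j][i] = 1

for charity, row in zip(charities, matrix):
    print(charity, row)
```

Because the ties are undirected, the matrix mirrors itself across the diagonal, so reading across a row or down the matching column gives the same set of connections.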
So as I showed earlier, you can sum the rows or you can sum the columns to find out how many connections a given charity has. So if we execute this code here, we can see that on average, using the mean, a charity is connected to three other organizations. If we use the median, typically a charity is connected to two other organizations. But there's at least one charity that's connected to 23 others. And that's quite interesting, given that these are current trustees, these are not historical data. So there is some charity in Manchester connected to 23 others, and we're going to find out who that is. So the final thing we do before I produce some results is we take that matrix and we plug it into a module called NetworkX in Python. NetworkX is basically a library of methods and measures and calculations we can use to work with network data. So basically we take our charity matrix and we put it into a network object. We can forget all that for now, we can learn that at a different time. So let's start summarizing the network. Let's get a sense of how big the network is in terms of nodes and ties. So we can print some information about the network. We can see that there are 1123 nodes. We knew that from previously. But now we know how many connections there are. Python calls them edges, but we are going to call them ties or connections. So these 1100 charities have about 1500 connections between them. So not a lot of connections in this network. I won't do the maths in front of you, but there's the potential for around 630,000 connections. So if every charity was connected to every other, we'd expect there to be about 630,000 ties, but there aren't. So only a proportion of about 0.003 of the possible ties, well under one percent, has actually been realized in the network. And on average, a charity has three connections, as we saw previously. So the first question we can ask is how cohesive or dense is this network?
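The arithmetic behind those summary figures can be sketched in a few lines of plain Python. The four-node matrix is a made-up toy, and the helper names `possible_ties` and `density` are my own; only the node count of 1123 and the average of three connections come from the talk.

```python
from statistics import mean, median


def possible_ties(n_nodes):
    """Number of distinct pairs in an undirected network with no self-ties."""
    return n_nodes * (n_nodes - 1) // 2


def density(n_nodes, n_ties):
    """Share of possible ties that are actually present (0 to 1)."""
    return n_ties / possible_ties(n_nodes)


# Toy symmetric binary adjacency matrix (hypothetical).
matrix = [
    [0, 1, 1, 0],
    [1, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
]

# Row sums give each node's number of connections.
degrees = [sum(row) for row in matrix]
n_ties = sum(degrees) // 2  # each undirected tie appears in two rows

print("mean:", mean(degrees), "median:", median(degrees), "max:", max(degrees))
print("toy density:", round(density(len(matrix), n_ties), 3))

# Scaling up to the size quoted in the talk.
print("possible ties for 1123 nodes:", possible_ties(1123))  # 630003
# With an average of about three connections per charity, density is
# roughly average degree divided by (n - 1).
print("approximate density:", round(3 / (1123 - 1), 4))
```

The number of possible ties grows with the square of the number of nodes, which is why large real networks almost always have densities very close to zero.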
So how many of the possible connections between charities have been realized? So we've got a measure that ranges between one, meaning every single connection between every single charity is present, and zero, meaning no charity manages to connect to anybody else. The real answer obviously falls between zero and one in the real world, and often very, very close to zero. So in our network, as I said, the density is about 0.003, well under one percent. So very few charities actually managed to form connections in the network. We can also ask ourselves about clustering. So to what extent do nodes form little groups together? Put another way, do nodes tend to form triads? Do we get groups of three charities all connecting to each other? So do we get lots of local clusters, basically, if that makes sense. Transitivity is one such measure. Basically, we take the potential for triads, and we see how many triads actually formed. It's much easier to just show you this. So a potential triad is a situation that looks like this. We've got three nodes and two connections exist between the nodes, but this one is missing right here. So these two people have a mutual acquaintance, but they don't actually know each other. So that's a potential triad. What we really want to see is a realized, so an actual, triad. And basically, the transitivity measure tells us the ratio of the realized triads to potential triads. So we can calculate that really quickly. We get a figure here. So we get 61 percent. So basically, when there are three nodes and there are two connections between those nodes, about 60 percent of the time, those other two nodes managed to connect to each other. So transitivity is a good measure of a friend of a friend actually becoming a friend, if that makes sense. I'll explain all these in webinar three. This is just to give you a taste, don't worry. So finally, we have some node level measures as well.
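To show what the transitivity ratio is counting, here is a small plain Python sketch. It treats every pair of neighbours around a node as a potential triad and checks whether that pair is itself connected, which I believe matches the usual definition of transitivity, though the notebook itself presumably relies on a library call instead. The toy network is hypothetical.

```python
from itertools import combinations


def transitivity(adjacency):
    """Share of potential triads (two ties around a node) that are closed."""
    potential = 0
    realized = 0
    for node, neighbours in adjacency.items():
        # Each pair of neighbours shares `node` as a mutual acquaintance.
        for first, second in combinations(sorted(neighbours), 2):
            potential += 1
            if second in adjacency[first]:
                realized += 1  # the missing third tie is present
    return realized / potential if potential else 0.0


# Hypothetical undirected network: A, B and C all know each other,
# and D knows only C.
toy = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
print(transitivity(toy))

# Two ties and a missing third: a potential triad that was never realized.
open_triad = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
print(transitivity(open_triad))
```

In the toy network there are five potential triads and three of them are closed, so the ratio works out at 0.6, while the open triad on its own scores zero.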
So which nodes possess the most connections? That's an interesting question. That's called centrality. So which charities have the most connections in the network? If we take a charity, not quite at random, but we'll take charity 225116, it has 12 connections. So it's connected to 12 other charities. We could look at the top 20. So this charity here is the one that has 23 connections. This one has 22, and there's a big group that have 12 connections also. We can visualize those connections using a histogram also. So the vast majority of charities in our network have one or two connections, about 100 have two or three connections, about 50 have more than five connections, et cetera. And finally, because I know you'll want to see this, and it'll prove my point that visualizing a network is not essential, here is the charity and trustee network. As you can see, there is a big kind of cluster in the middle, lots of organizations around it, and then on the periphery, lots of charities that don't have many connections. To prove my point, here's the exact same data using just a different drawing. This is the exact same network, but this time I've just arranged the nodes in a circle around the side. And here are all the connections between the nodes. So as you can see, unless it's an incredibly small and simple network, visualization just really isn't that revealing.
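Ranking nodes by their number of connections and tabulating the spread of those counts can be sketched as below, again in plain Python with a hypothetical edge list rather than the Manchester data.

```python
from collections import Counter

# Hypothetical undirected ties between charity IDs.
edges = [
    (101, 202),
    (101, 303),
    (101, 404),
    (202, 303),
    (404, 505),
    (606, 707),
]

# Degree: how many ties each charity is part of.
degree = Counter()
for charity_a, charity_b in edges:
    degree[charity_a] += 1
    degree[charity_b] += 1

# Rank by degree, highest first, breaking ties by charity ID.
ranking = sorted(degree.items(), key=lambda item: (-item[1], item[0]))
top_three = ranking[:3]
print("best connected:", top_three)

# Degree distribution: how many charities have 1, 2, 3... connections.
distribution = Counter(degree.values())
for n_connections in sorted(distribution):
    count = distribution[n_connections]
    print(f"{n_connections} connection(s): {'#' * count} ({count})")
```

The printed rows are a crude text histogram, which is often enough to see the typical pattern of many poorly connected nodes and a handful of very well connected ones.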