Good afternoon, everyone. Welcome to the second webinar in our social network analysis series. I'm Dermot McDonald, and thank you for joining us, and thank you as well to people who have joined us before. So this training is part of our new forms of data training series, which is also part of our computational social science training series as well. So we have some upcoming webinars you may be interested in. The third and final webinar of the social network analysis series is on in two weeks' time, where we focus on analysis. So we take the fundamental concepts we learned two weeks ago, take some of the data we're going to collect and explore today, and start producing some basic and intermediate level analysis of a social network data set. Alongside this, we've got our continuously running coding demonstrations series. So this month we're focusing on text mining. My colleague Dr. Julia Kazmier has been running this, and you can learn how to do some basic, intermediate and advanced text mining in Python. So that's very interesting, and free as well. And we've got some past webinars. You can find all these on our events page. These are all on YouTube, and there are some further coding and learning resources you can access for free also. But back to today: what we're going to focus on is the collection, the cleaning and the repurposing or transforming of data so we can conduct some social network analysis. So very briefly, I'll just do a quick refresher on the key concepts. We'll talk about APIs very briefly. So these are the data platforms through which we get social network information and data. Then we're going to look at an example. So I'll talk you through the main steps of getting data from Twitter, through their API, as well as do a quick coding demonstration of making some requests for data and cleaning it up and saving it for later use.
Then we look at social network data from a different perspective, which is looking at data about social networks and connections that exist in a traditional social survey or a traditional administrative data set. So trying to extract relational network information from data that we're probably very familiar with using in our research and teaching. And then we'll take some questions at the end. You've probably noticed by now, if you've joined us a number of times, that you can submit questions. Unfortunately, you can't see the questions other people submit, so I'll go through as many as possible at the end, and for the ones I don't, we'll type up a frequently asked questions document and post that as well on our publicly available GitHub repository. So to be concise, why are we doing this training series? Social network analysis is an enormously interesting and rich approach to social science research. But those of you who have some experience of it have probably found that it is quite a technical and mathematical methodology. So it can be quite an unwelcoming, quite an abstract field of social science methodology. And the purpose of this training series is to try and simplify and demystify social network analysis as a methodology. Because once you get over the technical language and nomenclature, a lot of the analysis and a lot of the data structures are very intuitive, very familiar and very, very rich. So it's a good area to get interested in once we get over that initial technical hump. So a brief refresher on what we're talking about when we say social network analysis. What it is, as I've probably mentioned, is a methodology. So it's a methodological and conceptual toolbox, and it's quite broad. You can approach the analysis of network data from a lot of different perspectives using lots of different analytical methods, measures and algorithms, for example.
And the purpose of it is to allow you to describe and to measure and to analyze networks, and networks are composed of patterns in the relational structures of the social world. Relational structures are simply the connections that exist between entities and how these connections add up to form an overall network. A relation itself is a distinctive type of connection, or a tie as we've been calling it, between two entities. A very common relation, of course, is a familial relation: brothers and sisters, parents and children, aunts and uncles, etc. So that's a particular type of connection. But you can be colleagues with somebody, you can be spouses, you can be familiar associates or acquaintances, two companies can share the same office, etc. There are lots of different types of relations in the social world that we're interested in. And these relations or connections are thus the building blocks of networks. And hence social network analysis is concerned with, and most appropriate for, the analysis of data capturing relations between your units of analysis. And we've defined units of analysis very broadly. It's social network analysis, so there needs to be some connection to sociology or the social world or social research. But social network analysis comes from network analysis more broadly, so your entities can be people, animals, countries, organizations, computers, planets, you name it: whatever you think can be connected and can form part of your analysis. So the key point is we're interested in how people or things are connected, how these connections form patterns, and how these patterns form overall networks or aggregations that we can analyze. So to describe it very briefly, a network, whether we're talking about social, physical, biological, etc., comes from two main building blocks: the entities, which are the things that are or could be connected in a network, and the connections that exist or could exist between these entities.
So a network is this aggregation or collection of these entities and the connections that exist between them. So again, a family tree is a type of network containing individuals that are then connected or related through some type of familial tie. If you joined us for the previous webinar, we had a look at a real network. So these are organizations in the UK, specifically charities, and these are the connections that exist between them. So these are Manchester-registered charities in the UK. There were about 1,100 of them, and they're connected if they have somebody who serves as a trustee for both organizations. And typically, this is the type of visual depiction of a network that is most common in social network analysis. We've got a big clump of charities here in the middle who are quite densely connected, and we've got some charities on the periphery who are only connected to one other organization. So that's a very brief refresher of social networks. We have the previous webinar if you'd like to go through that. But today we're going to focus on getting social network data from two perspectives: one, from a social media platform, i.e. Twitter, and two, extracting network data from traditional social survey or administrative data sets. And we'll return to this charity example later in the webinar. But first you need to understand the data infrastructure or the platforms through which you get data about social networks, particularly from social media platforms. And these are typically known as application programming interfaces, or for short, simply APIs. So the technical definition of an API is a set of functions and procedures allowing the creation of applications that access the features or data of an operating system, application, or other service. That's a very, I think, unfriendly definition. It's technically correct. In essence, an API is an intermediary between software applications.
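The two building blocks just described, entities and connections, can be sketched in a few lines of plain Python. This is a minimal illustration of the charity trustee network idea; the charity names are made up, and it isn't code from the webinar notebook.

```python
# Entities (nodes) and connections (edges) as the two building blocks of a
# network. Charity names are invented purely for illustration.
nodes = {"Charity A", "Charity B", "Charity C", "Charity D"}

# An edge means the two charities share at least one trustee.
edges = [
    ("Charity A", "Charity B"),
    ("Charity B", "Charity C"),
]

# Build an adjacency mapping: which charities is each charity connected to?
adjacency = {node: set() for node in nodes}
for left, right in edges:
    adjacency[left].add(right)
    adjacency[right].add(left)

print(sorted(adjacency["Charity B"]))  # → ['Charity A', 'Charity C']
print(adjacency["Charity D"])          # → set() (a peripheral, isolated node)
```

"Charity B" plays the role of the densely connected core, while "Charity D" sits on the periphery with no connections at all, just like the isolated organizations in the Manchester charities visualization.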
So if I design a smartphone application and I need real-time traffic data from Transport Scotland, for example, I could do a lot of hard work of writing programming code in the same language as the database that holds the traffic information. Or, if it exists, I can use an intermediary that takes a very simplified set of instructions and requests and that itself will go and get the data for me. So APIs are these kind of middlemen or intermediaries or translators between one software application or a programming script on one side and a database or some other software application on the other. If the technical definitions aren't too clear, then conceptually an API, in terms of data, is a socket. So basically you plug in your programming script or your smartphone application or your website or whatever it is you have that needs data, you plug that in to the API, and the API returns the data to you. So you don't need to know very much about the database: what language the database is written in, what infrastructure it uses. That doesn't matter. The API knows how to get the data, and you simply ask the API for the data that you're interested in. And that is a socket in one of my rooms that I painted, so hence it's quite a messy job. That's not where my skills lie. So a lot of data about social networks are from social media platforms and come from APIs. So you need to learn how to interact with them in order to get your data. So we're going to take an example today. We're going to use Twitter. I think it's interesting for its own sake as a source of social network data. Of course, there are many others. Facebook does allow access to its data through an API, but it's become more restrictive recently, so it's a bit more difficult to use. The same for Instagram; that's really tightened up. But there are some that are friendlier than Twitter also. Spotify is quite a good one to get social network data from as well. And various newspapers as well.
So in a previous webinar, we looked at the Guardian API, for example, and you can look at connections between articles in terms of topics or resharing or hashtags, etc. But today we'll focus on Twitter. It's had a recent upgrade to its API, so it's worth delving into some of the details about that. So if you're unfamiliar with what Twitter is itself, it's one of the world's most popular social networking, or micro-blogging, platforms. So it's about the sharing of brief, concise pieces of text content. But of course, it's much, much broader now: you can share videos and images and clips, etc. So it's got a user base certainly in the hundreds of millions, but it is notoriously tricky to pin down the exact usage of the platform. Twitter does prevent you from trying to reverse engineer details about the platform, such as how many people use it. But we can safely say it's somewhere in the hundreds of millions of unique users per month. So it generates a lot of data about social networks. In terms of usage, users can post, which is known as tweeting, their own content. They can repost or retweet the content of others. They can like the content of others. They can follow other users. And there's lots of other functionality you can use on the Twitter platform. And Twitter then allows restricted access to the data it holds on the above activities through its API. So Twitter makes voluminous information available, but relative to the proportion of information that it holds, it allows restricted access. So you can't just connect to the Twitter API and get all tweets for one account or all tweets for all accounts that you're interested in. It's not quite that complete, but what you can get is incredibly detailed, and it's almost certainly more than enough for research purposes. So what do we know about the Twitter API itself? It allows programmatic access to the Twitter platform.
So in one way, that allows you to use Twitter yourself: to tweet and to retweet and to follow other accounts automatically or in a scheduled way. And you've probably heard these referred to as bots, so Twitter bots. We've heard of the nefarious use of Twitter bots in election campaigns and election interference, unfortunately, in recent years. In essence, you connect to the Twitter API with some information and you say, any time somebody follows me, could you automatically reply to them saying, thank you very much for following my account. So that would be an example of you connecting to the Twitter API and giving it some instructions to act on your behalf. So that just shows that APIs are much broader than just accessing data: they allow you to access functionality. But today we're going to be interested, as we usually are for research purposes, in getting our hands on data. So we can issue instructions or requests to the Twitter API to get data on certain topics, about certain people, from certain tweets, etc. There are different levels of access, known as tiers, to the API. In general, there's a standard level of access. This is free, it's for individual use, and this usage carries with it the most restrictive conditions. So there will be a certain number of requests you can make per 15 minutes; there will be, I think, 500,000 tweets you can request every 30-day period; there are restrictions on the sharing of the data that you download, etc. If your research project carries some funding, or you can pay for it yourself obviously, you can also upgrade to the premium tier. This is still kind of geared towards individual or researcher use, but you'll get much more generous limits. So I don't know exactly, but it could be one million tweets every 30 days instead of 500,000, and maybe different limits per 15 minutes, etc.
And then there's an enterprise version, again, if you're an organization and you need access to Twitter data or functionality as a core component of a service offering or a product that you sell. And that's probably something I can't imagine you're interested in as researchers, but it's just good to be aware of the different levels of access. And in terms of getting your hands on data itself, the API provides what are known as endpoints. Endpoints are simply the data tables or the information that you want to request. So think of a spreadsheet: a spreadsheet can have multiple sheets within it. So maybe sheet one contains user information, sheet two contains tweet information, sheet three contains retweet information, etc. Basically, one endpoint would be sheet one, a second endpoint would be sheet two, etc. So an endpoint is the API's technical term for the spreadsheet or the table or the information that you're interested in. And endpoints, if you request them manually, are basically web addresses. So it'll be something like https://twitter.api/tweets. That's not literally what it is, but that's an example of what an endpoint looks like. And if you were to request that web address, hopefully, if successful, it would return the data that exists at that endpoint to you. But we'll see how that actually works in a moment with the coding demonstration. Just get it into your head that if you ever hear endpoint, it means the data table or the database or the spreadsheet of information that you're interested in. So how do we actually use or interact with the Twitter API? There are a number of general steps that you go through before you can start requesting data. So there's a registration stage.
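To make the "endpoints are basically web addresses" idea concrete, here is a small stdlib sketch of how a request URL is typically assembled from an endpoint plus query parameters. The endpoint address and the parameter names here are entirely hypothetical, not Twitter's real ones; it just shows the general shape of an API request URL.

```python
from urllib.parse import urlencode

# Hypothetical endpoint for illustration only -- not a real Twitter address.
endpoint = "https://api.example.com/2/tweets/search"

# Query parameters narrow down what the endpoint returns.
params = {"query": "charity donations", "max_results": 10}

# The full request is just the endpoint address plus encoded parameters.
request_url = endpoint + "?" + urlencode(params)
print(request_url)
# → https://api.example.com/2/tweets/search?query=charity+donations&max_results=10
```

Requesting that address (with valid credentials) is what a library like Tweepy does on your behalf, so you rarely build these URLs by hand.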
So basically, you need to set yourself up as a user on the Twitter developer platform, or developer portal as it's also called. For this, you need a Twitter account and some verified details. So for me, I need my Twitter handle, so my @ username, plus a verified phone number. And once I have those two things, I can start to register on the developer portal. When you use APIs, again, the language can be quite off-putting. APIs are traditionally aimed at software developers who need access to the data or functionality of another platform. So the API assumes you're a developer who's creating a smartphone application or a website or a digital product or something like that. But the developer platform can also be used by researchers. It just means that you register yourself with the API, so Twitter knows who you are and what you're using the API for. So the API will ask you to explicitly create a project and what are known as applications. So again, if I was a software developer, I would create a project name, whatever the project could be, and then under that project I might have three or four different smartphone applications that are related. For you as a researcher, you might create a project which is the name of your paper or your research study, and then the application might just be something specific, like getting tweets to do with Brexit, for example. You have a lot of freedom in terms of what you write. Twitter is not looking for certain projects or for certain keywords. But you do need to be explicit, and you do need to be upfront about why you want to use the Twitter API. Then once you get accepted as a developer and your project gets approved... this can take, I've heard nightmare stories, two or three weeks. For me, recently, it took about eight hours to be approved to do some requesting of data from the Twitter API for research purposes.
So if you want to do things this afternoon, you've probably left it too late, but surely if you give it a day or so, Twitter will accept your application. And once you're accepted, basically Twitter says, OK, here's your username, here's your password, and in some cases, here are some secret keys that you need to provide when requesting certain types of data. So these are known as authentication keys or credentials, and these basically allow the Twitter API to recognize that it is you using the API at a given time. Then you select your level of access: do you want the standard version, or do you want to pay for the premium or the enterprise version? And then, thankfully, the more interesting bit: you can start requesting data. These general steps apply to lots of other social media platforms and to APIs in general. So it's worth recognizing that you can't immediately use a lot of APIs; there's a registration process where you tell them explicitly what the use case of your project is, you get some username and password credentials, and you select what level of access you would like. So I have a little video here that we can just move through. This is what the Twitter developer portal looks like. So you go to the Twitter developer portal and you click register for an account. It asks you what type of user you are: are you a professional developing software products, are you an academic, et cetera. You can see then it tries to pick up my information. So this is my Twitter account that I have anyway, I have a verified phone number, my email address, where do I live, et cetera. And we can move on from that. And then we just give them some information. And this is the key section here. So here is where we're registering our use case: what do we actually want to use Twitter for? Basically, Twitter asks for about a paragraph's worth of information on why you want to use the Twitter API.
So I don't try to fudge this or lie or hide what I'm doing; I'm quite explicit. My research interests are to do with UK charities, as some of you probably realize by now. So I'm saying I want some Twitter API data so I can track the solicitation of donations by charitable organizations. Then we've got a set of specific questions. So, are you just hoarding data, or are you going to analyze it? I say, yes, I'm going to classify responses to calls for donations as positive, neutral, or negative, for example. And then there are some questions about whether you need to use any of the functionality. So am I only downloading data, or do I want to actually create a bot myself? So after I download data, I do some analysis, and then my account starts actually posting the findings of my research, for example. So I write an answer to that. I fill all these fields out; there are four fields, it's not too onerous, and it takes about five minutes to fill out. And then: do you plan to share data, or do you plan on releasing data that you collect? Very often this is not the case, though maybe sometimes, if a journal wants the actual underlying data itself, you have cause to share it. In this example, I decided to play it safe. I said, look, when I do the aggregate analysis of tweets, I'll probably be sharing that data set, and I'll obviously be reporting the findings in some academic papers. So you get a chance to look over your answers very quickly, and you sign up to the developer policy. So these are the restrictions and the fair use of the Twitter API. It's usually standard digital product terms and conditions. But things to look out for, if you read the developer policy, would be the sharing of information. So you're not allowed to share data you download with government departments, for example.
That shouldn't be a problem, but I have had someone contact me asking, well, do universities count? They don't. But there may be certain institutions. Maybe you're seconded somewhere and you do some research using Twitter data and they want it. But in general, the developer policy is uncontroversial for research purposes. So we've done the registration, we've gotten set up, and the Twitter API knows who I am. So let's actually dig into the data itself. Some of you will be familiar, if you've joined us before, that we tend to use these Python notebooks where we can mix some text and some programming code. This is available in the repository link that we sent you previously, and it'll be sent to you again in a couple of days. So for now, you don't need to follow along; I'll just be doing a very brief coding demonstration. But if you'd like a more in-depth coding demonstration, we do have these continuously running series. So in the feedback to today, if you feel like a social network coding demonstration series would be good, then please let us know; it's something we'd love to do. So let's take a look. I've registered with the Twitter API recently. That's why I recorded that video, because I don't need to do that process again: I'm an approved, quote-unquote, developer on the Twitter API platform. So let's take a little look through what we can do now that we are registered. So any time we use Python, we need to configure it correctly. So we need these Python modules or Python packages here. So we run this code, that loads in everything we need for Python, and now we can get going with the requesting of data. So when I registered for the Twitter API, it gave me a unique username, a unique password and a couple of secret keys, which sometimes are necessary for requesting certain types of data. Maybe for more sensitive or more confidential data you need to also provide a secret key.
A kind of childish but actually useful analogy is to think of having a treehouse with a username and a password and a secret word to get in. Surprisingly, it worked for me. So just to show you what your credentials tend to look like: these are fake credentials, because obviously these are specific to you and are actually reasonably confidential. If someone got your details and started making very rapid or spurious or malicious attempts at requesting data, you would be kicked off the Twitter platform. So we won't actually be sharing any of our details today. But when you do eventually register yourself, then you will have your own credentials and you can swap them in here. So for example, these are my credentials. I've made this up: my username is this and my password is that. And there are some extra secret keys that sometimes I might need to use. It's good practice when you're doing this to keep your credentials in a separate file, and that way you load in the details you need. So I'm never writing my username as a string in here, for example; it's always protected. So I'm going to load in my own credentials. And again, if you're going to use this code yourself at a later date, you must supply your own credentials. I unfortunately can't share mine with you. So we've gone and we've taken our credentials from wherever they are on our laptop or on our Dropbox or iCloud or what have you. Now we're going to connect to the API, and the first step is to tell it who we are. So we're going to use a Python package or module called Tweepy. I'm not sure how it's pronounced; I'm guessing it's Tweepy, and I prefer Tweepy. This basically provides a very simplified and easily understood way of using the Twitter API. So basically, Tweepy saves you writing lots of code yourself. So when you want a certain data set, there's an easy way of getting it using Tweepy, for example.
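The "keep your credentials in a separate file" practice can be sketched with the standard library alone. The key names and values below are made up (real Twitter credentials have their own names and formats); the point is simply that the script loads keys from a file rather than hard-coding them as strings.

```python
import json
import os
import tempfile

# Fake credentials for illustration only -- never commit real keys to a
# script or a shared repository.
fake_credentials = {
    "api_key": "xxxx",
    "api_key_secret": "xxxx",
    "access_token": "xxxx",
    "access_token_secret": "xxxx",
}

# In practice this file lives outside your project folder; a temp file
# stands in for it here.
path = os.path.join(tempfile.gettempdir(), "twitter-credentials.json")
with open(path, "w") as f:
    json.dump(fake_credentials, f)

# Your analysis script then loads the keys at run time.
with open(path) as f:
    credentials = json.load(f)

print(sorted(credentials))
# → ['access_token', 'access_token_secret', 'api_key', 'api_key_secret']
```

Because the credentials never appear as literal strings in the notebook, you can share the notebook itself without leaking your keys.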
So for actually telling the API who we are, we can use a method here. So we tell Tweepy, OK, this is my username, this is my password, and then connect to the API using these credentials. So we do that. We haven't asked for any output; that's why you're not seeing any. Basically, if you don't get an error, you've successfully connected to the Twitter API. To check if it's actually working, we can make an example request to the API. So we're going to do a search request and we're going to look for my first name, Dermot, and we're going to find the first 10 results, and then we're going to look at the content of the tweets themselves. So we can run this in real time. You can see I get, for some reason, lots of Japanese results, which is quite interesting. I think there's an anime or a manga character who has my first name, which is quite strange because I have a very Irish first name, but apparently it's a character also. So these are the first 10 results, and you can see the results are done in real time. Maybe they don't change that quickly, but any time there's a new tweet mentioning my name, it'll appear here in this list. OK, so we've gotten the credentials and we've passed those on to the Twitter API. So let's actually get stuck into getting some data. One of the first requests you'll tend to make is to get the metadata associated with an account. As you can see, you can get quite a lot of metadata just to do with my Twitter account. So basically it gives you a list of my recent tweets and my biography. You can see my screen name here, you can see where I've said I'm based, you can see my job title that I've put up there, I've got some links to some web pages, how many people follow me, and a friends count. I'm not sure what friends count means, actually; I don't think friend is a term Twitter uses much elsewhere. But anyway, it's a metadata field that we can get. You can see what I've done recently.
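To show what working with that account metadata looks like without needing live credentials, here is an offline sketch using a pared-down sample of the kind of user record the API returns. The values are invented, and only a handful of the many real fields are shown; the field names match the ones mentioned above (screen name, location, description, follower and friend counts).

```python
# A simplified, made-up sample of account metadata. In Twitter's own
# vocabulary, "friends" are the accounts a user follows.
user = {
    "screen_name": "ExampleResearcher",
    "location": "Manchester, UK",
    "description": "Researching UK charities.",
    "followers_count": 250,
    "friends_count": 180,
}

# Pick out just the fields we want for a short profile summary.
summary = {key: user[key] for key in ("screen_name", "followers_count", "friends_count")}
print(summary)
# → {'screen_name': 'ExampleResearcher', 'followers_count': 250, 'friends_count': 180}
```

The same pattern, request a rich record, then keep only the fields you need, applies whether the account is your own, a politician's, or a charity's.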
So there's something I did today at quarter to 11: I retweeted someone else, unfortunately about Brexit, which is exercising me at the moment. So a very easy, quick request you can make to the Twitter API is for your own information. But you can request any publicly available account. So here's Boris Johnson, the Prime Minister of the United Kingdom's, Twitter account. And again, we can pick out the same information: how he describes himself, and his recent activity should be down here as well. So at quarter past three, he tweeted something about criminal sentencing. So even just a simple request like this for the metadata associated with an account captures some of the content and captures interesting descriptions about the account itself. And there are lots of different methods for requesting data that Tweepy provides. There are links elsewhere in this notebook to the full list of data and things that you can request through the API. Let's give ourselves some focus just now. Let's pick a UK charity, take its Twitter account, get as many tweets as we can, and then save them for future analysis, which would be looking at solicitations for donations and how people respond to them. So today we'll focus on the Royal National Lifeboat Institution, commonly known as the RNLI. Very valuable in terms of the work it does. It's a UK charity, it does some excellent international work as well, and it focuses on saving lives around British and international coasts. So in my opinion, a very, very worthy charitable organization. So one of the methods I can use for getting data is the user timeline method from Tweepy, and that gives me the 20 most recent tweets or retweets associated with the RNLI account. So I can run this now in real time. You probably just saw previously there were older results from about an hour and a half ago when I previously ran this, but we can run it in real time as well.
So it gives us quite a lot of fairly difficult-to-parse results. But actually, what we're really interested in is a field called _json, and pretty much everything inside of this is the content and the metadata that we want. So its most recent tweet was at quarter past three this afternoon. Here's the unique ID of that tweet and here's what they posted: what would you say to rescuers that saved your life? Here's the moment that Amanda, et cetera, et cetera, and then a link associated with the tweet. There's also information about any hashtags associated with that tweet. You can see that there were no hashtags included with this tweet just here, but there usually are, and that information is captured also. And there are some fields, it's difficult to see, but there are some fields capturing if users were tagged in the tweet. So if the tweet was deliberately posted to somebody else or included somebody else, we can see who it was aimed at. Now we can start to see the social network aspect. We're not just asking for text back from Twitter; we're also getting a sense of who's interacting with the text. Who is the tweet or the content directed at? Who responds? We can start building up the network using this information. We can pick out the fields that we're interested in, so we can reduce the mess and the difficult-to-parse results. So for the 20 most recent tweets by this charity: tell me what date they posted the tweet and give me the actual content of the tweet itself. So here, there we go, we're picking up a tweet from half two today. This is the one I just read out. Actually, maybe that's been retweeted by somebody. Here we've got one from 15 minutes before that, and again, here's the content of the tweet. So now we're seeing how we can pull out really interesting information from the Twitter API.
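Picking fields out of that _json payload can be illustrated offline with a simplified, made-up sample tweet. Only a few of the real fields are reproduced here (date, ID, text, hashtags, tagged users), and the values are invented; it is a sketch of the field extraction, not the notebook's actual output.

```python
# A simplified, invented sample of a tweet's _json payload.
tweet = {
    "created_at": "Tue Apr 27 15:15:00 +0000 2021",
    "id": 123456789,
    "text": "What would you say to the rescuers that saved your life?",
    "entities": {
        "hashtags": [],                              # none on this tweet
        "user_mentions": [{"screen_name": "RNLI"}],  # accounts tagged in it
    },
}

# Reduce the messy payload to the fields we care about. The user_mentions
# are where the social network information starts to appear: they tell us
# who the content is directed at.
record = {
    "date": tweet["created_at"],
    "id": tweet["id"],
    "text": tweet["text"],
    "mentions": [m["screen_name"] for m in tweet["entities"]["user_mentions"]],
}
print(record["mentions"])  # → ['RNLI']
```

Collect enough of these records and the mentions give you an edge list: tweeting account on one side, mentioned account on the other.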
So one issue is that the Twitter API, as most APIs do, restricts the number of results that are returned. So the user timeline method gives you the 20 most recent. This is to stop you saying, right, give me all tweets by this account. If we take somebody like the president of the United States, who tweets, I'm not sure, but dozens and dozens of times per day, that can be hundreds of thousands of tweets. And if all of us were working on this project now and we were all sending the same request for the tweets from the same account, that would put a lot of strain on the Twitter API. So APIs in general try to restrict the number of results. That doesn't mean you can't get more than 20; it just means each individual request only gives you back 20 at a time. So Tweepy gives you a way of overcoming this using a cursor method. We don't have to worry too much about how that works now, but in future we can do more technical demonstrations. But using this approach, and it'll take probably about 10 or 20 seconds, this will go back in time and not just give me the 20 most recent tweets by the RNLI; it'll actually go back, I think the restriction is about 3,200 tweets, that it can recover. And that hard limit of 3,200, I think, maybe extends if you pay for the premium version, and certainly if you had the enterprise version, that would probably greatly increase the number of tweets that you can recover. But this shows how you need to start thinking strategically about your data collection. So if you had a programming script that collected the 3,200 most recent tweets and you had that run every month, then you could see how, over a year, you probably wouldn't miss out on much information. Yeah, so if we look at the... there's a slight typo there. So if we look at the results, we can see that there are 3,200 tweets that we've managed to recover.
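The idea behind cursoring can be sketched without touching the API at all. This toy simulation is not Tweepy's actual implementation; `fetch_page` is a made-up stand-in for one API request, and the numbers mirror the limits mentioned above (20 results per request, roughly 3,200 in total).

```python
PAGE_SIZE = 20    # results returned per request
HARD_LIMIT = 3200 # approximate timeline cap discussed above

def fetch_page(account, page_number):
    """Stand-in for one API request: returns the next 20 fake tweet IDs."""
    start = page_number * PAGE_SIZE
    return [f"{account}-tweet-{i}" for i in range(start, start + PAGE_SIZE)]

def cursor(account, limit=HARD_LIMIT):
    """Keep requesting the next page until `limit` tweets are collected,
    which is the essence of what a cursor method does for you."""
    collected = []
    page_number = 0
    while len(collected) < limit:
        collected.extend(fetch_page(account, page_number))
        page_number += 1
    return collected[:limit]

tweets = cursor("RNLI")
print(len(tweets))  # → 3200
```

Each loop iteration is one request, which is also why real cursoring has to respect the per-15-minute rate limits: 160 pages of 20 is 160 separate requests.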
And if we look at the first tweet that we recovered, again, here's the one from half past two. We've seen that a number of times. Here's its ID. Here's the text. Any hashtags; there weren't any links shared in the tweet we've captured. And then we've got metadata about the account itself. So Twitter gives you back a volume of information. So there really is no excuse for not having rich enough data. You'll definitely get lots of good data. So I ran this earlier. You can see I bumped into a slight mistake. So let's... I'd like to call that a deliberate mistake. Unfortunately, it wasn't. Basically, I'm working through the full list of results that I produced, and I just want three fields. I want the date the tweet was posted, the unique ID of the tweet, and the content of the tweet itself. And I can store that into a separate list of results. And here I now have, for each tweet, its date, its ID, and the content it contains. So that's really good. What I'd like to do is save all of those results so we can analyze them later. So I can take the data that I've requested and spit that out to a file on my machine. If you think the data structure that we've seen, which is squiggly brackets, and a field name, then a colon and the field value, is a bit unfriendly, we can convert the data to something we're more familiar with: what we would call a data frame in R, or just a variable by case matrix, or just a regular data set where every row is an observation, every column is a variable and every cell is a value for that variable. So we can sample from that. We can see tweet number 1733 was posted on the 18th of April this year. Its unique ID is here, and the content of the tweet. And then we can save that as well. So we can save to a CSV file with a more traditional data structure. So that's a very quick look at requesting data from the Twitter API. You can work through this notebook yourself.
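That save-out step can be sketched with pandas. The cleaned results here are placeholder values standing in for whatever you pulled from the API, and `tweets.csv` is a hypothetical file name.

```python
import pandas as pd

# Hypothetical cleaned results: one dict per tweet.
results = [
    {"date": "2021-04-18", "id": 1001, "text": "Example tweet one"},
    {"date": "2021-04-19", "id": 1002, "text": "Example tweet two"},
]

# Convert to a variable-by-case data frame: every row is a tweet,
# every column is a variable, every cell is a value.
df = pd.DataFrame(results)

# Save with a traditional data structure for later analysis.
df.to_csv("tweets.csv", index=False)
```

Once the data is in this rectangular shape, sampling, inspecting and re-loading it later all work exactly as with any conventional data set.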
There's a bit more information and context to what we're doing that you can read through, and you can execute this code online yourself as well if you have your own Twitter credentials. So in summary, again, there are general steps and approaches that we can extract from what we've just done. So let's take a look at those major steps that you can employ for your own use of the Twitter API. You need to register for a Twitter developer account. And once you do, as I was showing, you create a project, or an application as it's called, that requires use of the API. Again, unless you're doing something very, as I say, I'll use the word dodgy, something suspect that you're trying to pass off as academic work. I mean, if you just honestly describe your research paper, your research project and what you need the data for, it'll almost certainly get approved. The process just looks a lot more intimidating than it is. Twitter wants people using their API; that's the point of it. Then we can use the Tweepy Python package for interacting with the Twitter API. This is just a very simplified, friendly set of methods and approaches for interacting with the Twitter API, and it saves you writing a lot of Python code yourself. Similar packages exist if you use the R programming language. I'm totally agnostic whether you use R or Python. I don't think one is better than the other; it's just your personal preference. But of course, you can still use Python to collect data and R to analyze data. I think that's a sensible division of the workflow. But you could do the whole thing in R, or the whole thing in Python; it's completely up to you. So once you have your account and you have your credentials, you can use Tweepy to connect to the Twitter API, and we saw how to do that. We can make various requests for data, we can clean the data that comes back, and we can save it out for later analysis as well.
So that's a look at how we would interact with the Twitter API. Of course, relational data or social network data can exist in other data sets also. So this very quick example is going to show you some traditional administrative data that happens to have information about how charities are connected, and I'll demonstrate a couple of clever, neat tricks for converting that into social network data. So again, we just load in the packages that Python needs to work with data and to work with network data. Then I load in some data that I have about the trustees of charities in Manchester. This is all publicly available information, hence why I'm happy to show individual names and actual company numbers and charity numbers right here. So if we take the first individual, this person here is on the board of three different charities, and here are the unique IDs. So if we conceive of a connection between charities as sharing a trustee, we can see that this, this and this charity are all connected through this one individual. So this data set contains relational information on how charities in Manchester are connected. Now our challenge is to extract this information so we have a data set that contains information about the connections between these organizations. So how do we do that? How do we take this information out of the data set? Well, we start by defining what we want at the end. If you joined us two weeks ago, we defined something called an adjacency matrix. This is simply a data set where every row is an entity, or a node as it's called. Every column in the data set is also that same set of nodes. And then every cell in the data set, or in the matrix, basically tells you whether those two nodes are connected or not. So to give that a bit more context, in our example, every row in this data set represents a charity. Every column also represents the same set of charities.
And then in each cell would be a one or a zero indicating whether those two charities were connected. And we'll see what that looks like in just a short moment. So how do we create an adjacency matrix, as it's called? Well, the way to conceptualize what we're doing is that we basically want a cross tabulation of all those charity numbers that we saw. A cross tabulation is just how many times we see two categories occur at the same time. How many people are both male and employed, for example? So we can cross tabulate our charity numbers as well to see how many times charity A occurs with charity B. So how many times are these two charities connected? The data trick, basically, is to merge the data set with itself. And having done that, we can see what the results look like. So on the left is our original data set. Here we can see the first trustee that we have and the three charities that this person is a trustee of. And when we merge the data set with itself, what you see is we're basically replicating the charity numbers each time. So we're looking at every possible combination of these three charity numbers for this individual. So we can see that we have this charity here. It's connected to itself, which is obvious, but also not very revealing. This charity is also connected to this one and this one. And again, this charity is connected to this, et cetera. So the cross tabulation of this information basically gives us our adjacency matrix. So now we have a new data set. This data set has, for its rows, every charity number. Every column represents the same set of charities. And each cell represents how many times those two charities are connected. So this charity is connected to itself twice, which means two of its trustees appear in our data. Again, it doesn't really make sense to say a charity is connected to itself, and we'll deal with that shortly.
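Here's a minimal sketch of that merge-and-cross-tabulate trick with pandas. The trustee and charity identifiers are made up for illustration; the real data would have actual names and charity numbers.

```python
import pandas as pd

# Long-format trustee data: one row per (trustee, charity) pair.
# T1 sits on three boards, T2 on two of the same ones.
trustees = pd.DataFrame({
    "trustee": ["T1", "T1", "T1", "T2", "T2"],
    "charity": ["C100", "C200", "C300", "C100", "C200"],
})

# Merge the data set with itself on the trustee column: this yields
# every pairwise combination of charities for each trustee.
pairs = trustees.merge(trustees, on="trustee")

# Cross-tabulate the two charity columns: cell (A, B) counts how
# many trustees charities A and B share.
adjacency = pd.crosstab(pairs["charity_x"], pairs["charity_y"])
```

In the result, the cell for (C100, C200) is 2, because both T1 and T2 sit on both boards; the diagonal still counts each charity's own trustees, which is the self-loop issue dealt with next.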
But if we take this charity here and this one here, you can see that they share one trustee in common. So charity A maybe has 10 people on the board. Charity B maybe has 15. And of those total 25 people, one person is on the board of both organizations. And again, how this is done exactly in Python, if you'd like, we can do more involved coding demonstrations. But I'm just trying to get you to think of the general trick, or the general process, of going from a traditional data set which has some relational information into a network data set, and what that actually looks like. There are a couple of other things we can look at. So as I said previously, this charity and this one are connected, and they're connected through two different trustees. So charity X and charity Y basically have two trustees in common. So before I show the final result and we move back to taking some questions, I want to remove what are known as self-loops. It makes no sense to say a charity shares trustees with itself; that is self-evident. So we want to get rid of those self-loops, as they're known in social network analysis. And I want to convert to what are known as binary relations. So I'm not interested in how many trustees are shared in common. I just want to know, do they share trustees in common? Yes or no. And there are two steps to doing that. Basically, I fill the diagonal of the new data set: all of the values on the diagonal I set to zero. These are all the self-loop connections. And then anywhere I find a value that's greater than or equal to one, I replace it with the value one. So if two charities had 10 trustees in common, once I make this change, that'll just appear as a one: they do share at least one trustee in common. That's a lot of talking. It's a lot easier just to look at the results that are produced. So again, I've removed all the self-loops.
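Those two steps, zeroing the diagonal and binarising, can be sketched with NumPy on a toy count matrix. The counts here are invented, standing in for the cross-tabulation produced above.

```python
import numpy as np

# Weighted adjacency matrix: counts of shared trustees (toy values).
counts = np.array([
    [2, 1, 0],
    [1, 3, 2],
    [0, 2, 1],
])

# Step 1: remove self-loops by setting the diagonal to zero.
np.fill_diagonal(counts, 0)

# Step 2: binarise - any value >= 1 becomes 1 (they share at least
# one trustee), everything else stays 0.
binary = (counts >= 1).astype(int)
```

After this, a pair of charities with ten shared trustees and a pair with one shared trustee both show a 1: connected, yes or no, which is all we need for the analysis.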
So if you work down through this diagonal, that captures how charities are connected to themselves. That used to have the value two; I've now replaced it with the value zero, so no connection exists between a charity and itself. You'll remember as well that this value used to be two, because these two charities share two trustees in common. But since we're not interested in the amount, just yes or no, are they connected, that now appears as a one. And once I do that, I've got my network data set, known as an adjacency matrix. I can put that into my network analysis software in Python and then I can start doing some analysis. So there are 1,100 charities, there are 1,500 connections between them, and on average a charity is connected to three others. So then you can see how we can go forward. We can do more in-depth analysis, which is what we're going to cover in two weeks' time.
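Those summary figures (number of charities, number of connections, average connections per charity) can be computed straight from a binary adjacency matrix. Here's a sketch on a toy four-charity network; the real data set would simply be a much larger matrix.

```python
import numpy as np

# Binary adjacency matrix for a toy network of four charities:
# a 1 means the pair shares at least one trustee.
A = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
])

n_nodes = A.shape[0]               # number of charities
n_edges = int(A.sum()) // 2        # each undirected tie is counted twice
avg_degree = A.sum(axis=1).mean()  # average connections per charity
```

Dedicated network packages will compute these and much more, but seeing them as simple matrix operations shows there's nothing mysterious about the basic descriptive statistics.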