Welcome to the second of our live coding demonstrations with the UK Data Service. My name is Diarmuid McDonnell. I'm a research fellow at the University of Birmingham, but formerly of the UK Data Service. And with my colleague Julia Kasmire, who will be helping me facilitate today, we will go through some of the materials we developed for our social network analysis webinar series. This is a series of three webinars, so some lectures, some live demonstrations, some further teaching materials, et cetera. The purpose of these two coding demonstrations (the previous one was two weeks ago, if you were unable to join us) is to go a bit more hands-on with the actual coding involved in doing social network analysis. So we're going to take our time today, for about half an hour, 35 minutes, and get stuck into using Python to do some social network analysis. If you're completely new to Python or completely new to social network analysis, then today is kind of aimed at you. We won't teach you Python from scratch, but we'll show you some basic commands and some basic coding that you can use. And today we want to focus on getting some social network data, specifically Twitter data. That's what we're going to do today. So again, thank you very much. If you have questions as we go along today, then please type them into YouTube if you're signed in. I encourage you to sign in just so you can leave comments, but of course it's not a necessity. Today you can just watch what I'm doing, and you can execute the exact same code at the same time as we go through the session. My brilliant assistant, Julia, should be posting the link to the materials that we're going to use today. As I say, you can execute the exact same code that I'm using. We're all going to use an online version that's openly available to anybody.
And where you can find it, if you're unfamiliar with what we've done before, is that the UK Data Service has a profile on a platform called GitHub. This is a coding repository, so people can share their code and work on it collaboratively. The Data Service has its own profile, and underneath we've got lots of different training series that we've done before. So we've done some text mining, we've done an introduction to Python, code demos, we've done some web scraping, et cetera. Today we're going to do some social network analysis. You don't need to worry about finding the link; Julia has posted it in the chat box, so you can just go directly to that. We've got a lot of different materials: some advice on installing Python on your machine, reading lists, some webinars you could watch if you want to get into some of the theory of social network analysis. But today we're just going to get straight into the actual coding itself. So basically we're going to launch what are called Jupyter notebooks. If you're unfamiliar with what these are, you can think of them like regular paper notebooks where you can write in whatever language you want, whatever language you write or speak in. Jupyter notebooks are electronic notebooks where you can write in different programming languages. Today we're going to use Python, but if you're more of an R user, you could write R code in this notebook. If you're a quote-unquote proper programmer, you could write in C#, Visual Basic, Julia; lots of different languages you can use as well. So if you click the link, you should see something very similar to this. Basically I've just made the repository live, so now it's a live coding repository and we can start executing our Python code. In the previous session, if you joined us, we looked at some fundamental concepts and techniques of analysis in social network analysis as a field.
Today, I realise we probably advertised this session as doing the analysis, but since we did that two weeks ago, today we're going to focus on getting some social network data, again specifically Twitter data. So what we want to do, when we get this view here, is open the UKDS SNA "getting data" notebook. We just want to click on this notebook here, and that should launch the code that I want to use today. Excellent. So this is a Jupyter notebook. This is what we use to develop our training series. As you can see, they're openly available and they're interactive, so not only will you be able to execute the code today, but you can change it however you like and the original will always remain unedited, unchanged. Any changes you make today, you can go to File, then Save As, and download your own copy with your own changes. But the version we have right in front of us now will always remain unedited on the repository, and you can launch it whenever you want. If you make mistakes and delete everything, you can just go back, relaunch it using the link we've given you, and go again. Now, there's lots of material on background and concepts, but today we just want to get stuck into the code. So in the table of contents, just at the beginning of the notebook, we want to go to section four, "Collecting social network data". If you've got this notebook launched, then again, feel free to work through it yourself and execute code in sync with me. If you haven't got it launched, then don't worry too much about the link. As I said, it's all there for you to practice later, and you can just follow what I'm doing for now. Excellent, so we want to go to section four. I've just been told there's a little bit of echo on my end. It's to do with the completely bare room that I'm in: no carpet, no furniture. So I don't think there's much I can do about that at the moment. Apologies.
Excellent, so we're in section four of the notebook, and this is how we interact with it. A notebook is made up of three different types of cells. If you click on this heading here, you can see that the cell is highlighted. If we double-click on it, we can see that it's basically a cell for typing in text. So I can type "this is a text cell". If you want to execute it, you can go to the Run button here, and that basically just executes the cell, so it prints the text in a slightly formatted manner. If I double-click again, I can just delete the text that I put in. And once again, I can click Run, or if you've got a keyboard in front of you, you can do Ctrl+Enter (Ctrl and the carriage return or Enter button), and that executes the cell also. We're not going to do any typing of text today; we're more interested in the code. What you can see here is this kind of greyed-out cell with an "In" and square brackets. That indicates that it's a code cell. "In" stands for input, so the notebook wants you to type some Python code into it. That's what we're going to begin with now. If you're using Python to do social network analysis, whether that's doing the actual analysis itself, collecting data, et cetera, then anytime you launch Python, no matter how you use it, you basically have to import all the functions and commands that you need for the session into your Python instance. Today we're using a Jupyter notebook, and all of these different commands and packages are already installed on the machine or online; we basically have to tell Python, hey, bring these into this session, I need to execute these commands today. A good thing about Python is that it's quite English-language based: it uses simple, legible terms to describe commands. So the import command imports the functions that you need for today. And just like the text cell, we can click anywhere on the code cell.
You can see it's sometimes highlighted in blue and sometimes in green, which means you're in editing mode. So again, I can click the Run button, or Ctrl+Enter on the keyboard, and what we've done is ask the notebook to execute the Python code stored in that cell. If we just get a number in the square brackets, that means the code executed successfully. If, for example, we make a typo and then try to execute the code, you can see an error message underneath the code cell. In this instance there's no Python module, no set of code or commands, called "twpy", the typo that I made. So I'll correct my typo and execute the code again. And now Python has everything it needs to begin collecting Twitter data. As you've probably noticed, one of the key modules or packages we need is something specifically written for Twitter data called tweepy (you might pronounce it "twee-py" or "twep-y", but I'm going to say tweepy). And we've got a package for handling a particular data format called JSON. We've got a package for working with datasets or data frames in Python called pandas; you've probably come across that before. And we've got a basic module as well for working with dates and times. Apologies if this is teaching you very rudimentary things that you may already know, but it's important to think of doing any analysis in Python in very sequential, logical terms. You have to get it set up correctly to begin with, and then we can start doing more interesting things. So what we want to do is connect to Twitter's database, otherwise known as an API, an application programming interface. An API is basically like a middleman between you wanting to request data and the database itself, which actually stores the information you want. It's kind of like a translator or an intermediary. I've also seen an API described as a plug socket.
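As a rough sketch, the setup cell described above might look like this. The module names are the ones mentioned in the session; since tweepy may not be installed in every environment, this version degrades gracefully if it is missing:

```python
# Typical setup cell for this session: import the packages the
# notebook relies on before doing anything else.
import json          # for handling Twitter's JSON responses
import datetime      # for working with tweet dates and times
import pandas as pd  # for building clean datasets / data frames

try:
    import tweepy    # the Twitter-specific package
except ImportError:  # not installed? the live demo would need it
    tweepy = None    # install with: pip install tweepy

print("tweepy available:", tweepy is not None)
```

A typo in any of these names (like "twpy") produces exactly the ModuleNotFoundError shown above.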
So the electricity is provided to your home through the wires; you've got a device on your desk, and it's through the socket that you connect to the electricity supply. An API is like that, an API is like a socket. I've got my machine here that wants Twitter data; Twitter has its database that stores Twitter data, and we want to connect to it. That's a very whistle-stop tour of what an API is; again, we cover it in more detail in our webinar. This is all to say that basically we're connecting to a database. Some databases are completely open, but most restrict access to verified users, and the Twitter API is no different. Basically we have to tell it who we are in order to request data. We're going to use real Twitter data, so we're going to use my credentials. They're going to be destroyed at the end of the session, so if you've got your phone out now, please don't try anything crazy. Registering is reasonably quick; it's something we cover in the webinar, how to register to use the Twitter API. It can take two to three, maybe even five days for you to be verified and granted access. So if you're in a rush to do your research, then factor in that Twitter does actually check the application you make to use this data, but it very rarely stands in the way of academic usage. So for example, there's a file on the repository here, "Twitter API fake credentials"; it's just a little text file with some variables storing the access token, the secret, et cetera. What that basically means is that when you want to connect to the Twitter database, you give it your username and you give it your password, and that verifies who you are. So we're going to execute these commands here. Basically I'm opening the file that I just showed you, loading in the data, and storing it in this variable, or this Python object, called tokens. My username, or my consumer key, is stored in a field called, usefully, consumer key, et cetera.
Then I have a little command here that just prints the information we've read from the file. So these are fake credentials: this is my ID, here's my secret, et cetera. Now we're going to load in my real credentials so we can actually access Twitter data. So we're going to execute this code cell here: with the actual, proper, genuine file open, let's load in the data. And this is what we call, in the education business, a deliberate mistake. Oops. As you may have noticed, if we go back to the real file, my ID is stored in a field called consumer key and my password is in a field called consumer key secret. And you can see here that I'm looking for a field called API key, which doesn't exist. So I'm just going to update that to consumer key, and the secret field is correct. If I load that in and execute it again, now we don't get any error. And again, just to show what's happening: if you want to insert your own text or code cell, you go to the Insert option up here, and you can insert a cell above or below the one you've currently selected. So I'm just going to type something very quick to print information back to the screen: print consumer key. Okay, so I have made... exactly. Okay, just a moment. So I said that was a deliberate mistake, but I've now compounded it with an unintentional mistake. I need to put in my real credentials very quickly, and I'll do that just... there we go. As you can see, we edited this slightly earlier, but I didn't follow through, so I'll just double-check that it works. I promise, once you see what we're actually able to do with Twitter, you'll be very happy that I got this working, so just indulge me very slightly. No... correct, I didn't actually save my details. Let me correct that. We'll do this very, very quickly. Hold on... excellent.
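To make the credentials step concrete, here is a minimal sketch. I'm assuming a JSON-formatted credentials file here (the actual file format isn't shown in the session); the field names consumer_key and consumer_key_secret mirror the ones described above, and the values are obviously fake:

```python
import json

# Write a hypothetical fake-credentials file, like the one on the repository
fake = {"consumer_key": "FAKE_KEY_12345",
        "consumer_key_secret": "FAKE_SECRET_67890"}
with open("twitter_api_fake_credentials.json", "w") as f:
    json.dump(fake, f)

# Load the credentials back in and store them in a variable called tokens
with open("twitter_api_fake_credentials.json") as f:
    tokens = json.load(f)

consumer_key = tokens["consumer_key"]  # the correct field name
# tokens["api_key"] would raise a KeyError: the deliberate mistake above
print(consumer_key)
```

Asking for a field that doesn't exist, like "api_key", raises a KeyError, which is exactly the deliberate mistake being demonstrated.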
Actually, this is somewhat useful: it shows you that once you register to use the Twitter database, you can see your settings for what you're allowed to do. Basically I set up an application called Charity Tweets, because that's what I'm interested in as a researcher. I need to get my API key and my secret password so that Twitter knows who I am when I request data. And it's quite secure: you can see it regenerates my user ID and my password each time, so it acts as a bit of a safety mechanism against people getting hold of your details. So you don't have to worry about any of this, because I'll re-upload this file. That's what happens, I guess; they say never work with live code or children. Okay, so once more I'll just upload my actual working Twitter credentials. Excellent, okay. Now, because we opened this Jupyter notebook before I made those changes, you'll probably have to launch it again. Clicking on the URL bar at the top and pressing Enter should work. We'll go back to section four. Actually, the easiest thing is to relaunch, and everything ought to work. Yes, the original link that Julia posted into the YouTube chat: if you just click that again, that basically rebuilds all the Jupyter notebooks from scratch, because we've made a change, and it should have the proper Twitter credentials file. And, yeah, okay. So I'll bore you a little bit about what's going on here, since we've got a tiny bit of time. What we're using is a thing called Binder. Binder is basically an online software service: if you've got some code, it builds a little mini computer online that can execute that code. If you're working on your own machine, you may have installed lots of different packages and programs, et cetera. You basically bundle all of that up, you host it on Binder, and Binder is able to recreate your computer, basically.
Not your entire machine, not everything you've ever installed or downloaded, but anything that's specific to a piece of work like this you can recreate online for other people to use for free, which is quite good. So that should be us. Hopefully this is all useful for your own sake, for when you inevitably bump into issues, but I do apologise; that's probably my fault for trying to keep things half secret and half not secret. Yes, so if we go back to section four: API key should be consumer key, and then we can reload everything in and hopefully double-check that this now works. So what is the consumer key? We're just asking Python to tell us what it is. Okay, now that's much better; that looks like an actual consumer key. Yeah, and that looks okay. So now we've got my genuine Twitter credentials, we can connect to the API, the database, and we can start getting some Twitter data. Fantastic. So we've got my credentials. The next thing we want to do is pass those credentials to the Twitter database itself, and this is where the tweepy module in Python comes in. It basically saves you writing a lot of code directly against the Twitter API. tweepy has been specifically written to make it much faster, much more efficient, much easier to connect to Twitter and request data. It hides a lot of the more detailed, intermediate-level Python code you would need to write, and wraps all of that into really simple functions and methods that you can use yourself. So for example, if I want to authenticate or validate my access to the Twitter API, what I do here is call the tweepy package in Python and use this method called the application authentication handler, or AppAuthHandler for short. I pass it my user ID and my password, and I store the result in a variable called auth, A-U-T-H. So basically I take my credentials here and pass them into the Twitter API itself.
So the best thing we can do here, in case we're not sure whether the connection we've made actually allows us to request data, is to try and request some data. I've written a really simple command here. I'm basically saying: connect to the Twitter API, use the search method, and the query I want to search for is my first name, which is Diarmuid; then return the first 20 results, and for each search result, show me the text of the tweet. So let's see if this works. Excellent. I should say that this is live on Twitter at the moment, so there may be swear words or things that shouldn't be in a live demonstration, but it's a live piece of code that we're running, so apologies if you do see something. Basically I just asked Twitter to search through the most recent tweets and find my first name. And it's case insensitive, as you can see. So there's a Twitter user here, not me, tweeting about Jeremy Clarkson, saying something to him, and you can see there are some Japanese tweets: apparently there's a manga or anime character who shares my first name, which is quite interesting, I think. And given that I'm Irish, as you can probably tell, there are some Irish-language tweets that mention my name. Because this is live, if we were to rerun this code it might produce some different results (I don't have a very popular or common name, so it doesn't just now); maybe in a minute we can come back to it, just to show that I'm not storing these results somewhere and recalling them now, and that in real time we are actually querying the Twitter API.
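Since the live call needs valid credentials, here is a mocked sketch of the same pattern. In tweepy the search returns a list of status objects, each with a .text attribute, and we loop over them printing the text; the results below are invented stand-ins, not real tweets:

```python
from types import SimpleNamespace

# Hypothetical stand-ins for what a call like
# api.search(q="diarmuid", count=20) would return:
# status objects, each carrying a .text attribute
mock_results = [SimpleNamespace(text=f"tweet {i} mentioning the name")
                for i in range(20)]

# Print the text of each search result, as in the demo
for status in mock_results:
    print(status.text)
```

Changing the count from 20 to 30 would simply extend the list, as shown in the next step of the session.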
So that's one of the simplest things we can do: say, here's a piece of text we're interested in; Twitter, go through recent tweets and give me the first 20 search results that mention my name. For example, let's extend it and say give me the 30 most recent, so it gives us the 20 we've just seen plus a longer list here as well. Excellent. So let's just confirm what we've done to connect to the Twitter API. Very simply: we get our credentials, we store those in a file (I think that's the safest thing to do), then we open the file, read it, get our credentials, and pass them to the Twitter API. That way we can then start using the Twitter API and the search and analysis methods that are most interesting. Right, so let's get to the good part: we want to request some more data from the Twitter API. We've seen we can do a very general search of the text contained in tweets for a certain search term; let's be more specific and say, give me all the information you have about a certain user. And I'm on Twitter. So remember, earlier I created this Python variable or object called api, and that's basically a variable that stores all of the information needed to connect to the Twitter API. It's a nice shorthand: I don't have to write any of this out ever again, which is quite long; anytime I want to use the Twitter API, I just call this variable or object here, plus whatever method I'm interested in. As I said, tweepy is designed to make it a lot easier and faster to request data, and one way is by providing a method called get_user. As the name suggests: get me all the information you have about a given Twitter ID. So I pass it my own Twitter handle, and basically I'm asking the Twitter API, tell me everything you know about me.
So this is all my publicly available information, so there shouldn't be any secrets. To begin with, what you can see is some metadata about the request to the API we just made; what we're interested in is the actual content, the data itself, and that's all in a field called _json. At the moment this is quite difficult to read, the way Twitter returns the information. We'll see in a moment how we can take out the information that's of interest and put it into a nice clean dataset or data frame. But even if we just visually scan the information that's returned, you can see the name I gave on Twitter, my screen name or profile name, where I currently am, my former role (which needs updating), whether I have a personal website, et cetera. When was my account created? 2010, interesting. How many times have I liked something? How many times have I tweeted? Tweets are known as statuses in Twitter. And you can see what I have most recently done: at five to eight this morning was the last time I tweeted, and I was having an argument about public health, which everyone thinks they're an expert in at the moment, including myself. So you can see there are reams of information returned. Most of it is about my recent tweets, but some of it is quite interesting metadata about me as a user, and there are some interesting things about each tweet as well. You can see whether I posted from my phone or my laptop, or where I was when I posted it. So there's some quite good information you can get just about users to begin with. As social scientists, which most of you probably are, if you're doing a case study on a given organisation, it can be quite useful to pull this kind of metadata about users.
If we take someone a bit more important than myself at the moment: we can see Boris Johnson, the Prime Minister of the UK, has a Twitter account as well, so we can get the same information in turn for him. His screen name is his full name without a space in between. You can see he doesn't tell you where he is, but I guess we know where he is, which is number 10, Downing Street. There's the description of him, who he's a Member of Parliament for, and then we can see what he's most recently tweeted. So there's a field called status, and within that, the tweet he's most recently posted. You can see the last time he tweeted was about 10 a.m. Here's the unique ID of the tweet that he posted, and here's the text he actually posted. It's Holocaust Memorial Day, so his most recent tweet was in honour and in memory of that, but we can keep scrolling back and see other tweets that he's made. Also, you can see that there are URLs he's posted in his tweets as well, which can be quite interesting and allow a quick look-up too. So that's a very, very quick look at requesting basic search information and some basic user data and user metadata about people on Twitter. And as long as someone's profile is public (to the best of my knowledge of Twitter, there are private accounts, but most accounts worth anything, public figures for example, are public), you can get access to pretty much everything they've said or everything they've shared, retweeted or liked. So you can do a lot of really powerful stuff with Twitter data. We're going to run through a very quick example from my research field: I'm interested in charities and nonprofits, and there's a really venerable and effective UK charity called the Royal National Lifeboat Institution, the RNLI for short.
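The user-level metadata described above lives in the _json payload of what get_user returns. Here is a sketch of picking out a few fields, using a hand-made stand-in dictionary rather than a live response; the field names are real Twitter user-object fields, but all the values are invented:

```python
# Hand-made stand-in for the ._json dictionary of a user object
mock_user_json = {
    "screen_name": "BorisJohnson",
    "name": "Boris Johnson",
    "location": "",                                  # he doesn't say where he is
    "created_at": "Thu Apr 30 00:00:00 +0000 2015",  # invented value
    "statuses_count": 3500,                          # invented value
    "favourites_count": 120,                         # note the British spelling
}

# Pick out the metadata fields of interest for a case study
fields = ["screen_name", "created_at", "statuses_count"]
profile = {f: mock_user_json[f] for f in fields}
print(profile)
```

In a live session you would replace mock_user_json with something like api.get_user("BorisJohnson")._json.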
It has a very active Twitter account, which I'm quite interested in. What I want to do very quickly is pull in as many of its most recent tweets as possible, extract certain fields from those tweets (certain bits of information that are really interesting), spit all that information out into a nice clean dataset, and save all of that to a file. It sounds maybe reasonably complicated, but it's actually quite easy to do in Python. So what I want to do is get the full user timeline. That's not everything this Twitter account has ever posted, but it's the couple of thousand most recent tweets it's made. Just because Twitter has a database doesn't mean you can access every single scrap of information Twitter has ever collected about every user. I know none of us are that naive, but you would be surprised sometimes by the restrictions that are put on the data you can access. So I can't get everything this charity has ever posted, but I can certainly get quite a lot of its most recent Twitter activity. Again, there's a method called user_timeline. I pass it a Twitter ID (its profile name on Twitter), and all I'm doing then is saving all of those results into a variable called rnli_recent_timeline. Then I just call that variable and say, right, show me what you've actually saved. What you can see is we get so many results that we get a little output window here with quite an elongated scroll. When I ask for the recent timeline, I'm getting reams and reams of information back, but as I said, not everything it's ever posted. And again, as you can see, the actual data that's returned is in a field called _json. So we get some metadata about the most recent tweets: they tweeted at twenty to twelve earlier today, and here's the unique ID of the tweet.
What did they actually say? A special New Year's message, et cetera, et cetera; the president of the charity said X, Y and Z. So you can see that the request we made to the Twitter API was successful, but we've got back almost too much information to really read, so we need to handle it a bit better. Basically, we've asked for the recent timeline, and the recent timeline is basically a list of all the recent tweets the account has made. What we want to do now is employ some programming logic: we want to loop over the list of results that are returned, pick out the relevant fields from each result, and just print what those fields contain. So basically I'm saying: for every status in the variable I've just created (in the big list of recent tweets), for every tweet, find the field called created_at and the field called text, and print those back to the screen so we can read them. This is just looping over the big list of results we just produced, only this time pulling out more readable, more interesting information. We saw that the most recent tweet was just before midday today, UK time, and we can see they posted about an hour and 15 minutes before that, which was in response to someone else. Even in such a very simple thing as we've done here, we're already beginning to see the network aspect of Twitter data. We can see that they're replying to a particular Twitter account, and they're asking this account to email the charity directly. So not only have we got two Twitter accounts talking to each other, we've got the possibility that somebody has then actually contacted the charity through a different mechanism, through email. Maybe I'm reading more into it than there is, but I'm just trying to show you how we're beginning to see the network and the web of connections simply through one Twitter profile.
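That loop can be sketched with mocked status objects; in the live session these would come from the user_timeline call, and the attribute names created_at and text match the fields discussed above, but the two statuses here are invented stand-ins:

```python
from types import SimpleNamespace

# Invented stand-ins for tweepy status objects on the timeline
rnli_recent_timeline = [
    SimpleNamespace(created_at="Wed Jan 27 11:40:00 +0000 2021",
                    text="A special New Year message from our president ..."),
    SimpleNamespace(created_at="Wed Jan 27 10:25:00 +0000 2021",
                    text="@supporter Please email the charity directly ..."),
]

# For every status in the timeline, print when it was posted and what it said
for status in rnli_recent_timeline:
    print(status.created_at, status.text)
```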
As you can see, they've been reasonably prolific on Twitter, probably about four, five, six tweets per day; not always the case, but probably on average. I'd say probably half their tweets are them posting content themselves, and the other half of the time it's replying to people who contact them on Twitter, and they retweet as well, which is quite interesting. Let me just pick one out, reasonably at random. On the 22nd of January, somebody called Emma, from this Twitter account, seems to have done some fundraising for this charity by, I guess, cutting off her hair. So again, from one simple act there, we can see a connection between charity and individual, and a couple of other fundraising campaigns as well. This is where my interest comes in: the connections between fundraising campaigns, charities and donors. But I won't wax on about my research area; we'll wrap this coding example up shortly. Yes, so we've created a variable called rnli_recent_timeline, and that contains, as I said, about three and a half thousand of its most recent tweets; we'll get the exact number shortly. Basically, I want to filter through that list and extract the information I'm interested in, because as you can tell, there's a lot of metadata, a lot of stuff I'm not particularly interested in just now, though you may be interested in other bits of information also. So what I want to do is create a new list. I'm creating a blank list, which is what I'm doing here: square brackets in Python mean I'm creating a new variable of the list type, and it's going to store comma-separated values, basically. And then, into this list, I want to take out the JSON field.
As I said before, when we make a request to the Twitter API, there's a little bit of metadata returned, but the actual content of what you're asking for is in a field called _json. So that's all I'm doing here: creating a blank list, and for every tweet in the user timeline, basically saying, give me the _json field. Now, executing that code will probably take, I don't know, 25, maybe 30 seconds. When you're using Jupyter notebooks, if you look up at the browser tab in which you're running the notebook, you may see a little hourglass; if there's an hourglass figure or symbol in the tab, that means the notebook is still executing code. So if you ever execute a cell and think, nothing's happening, it's frozen: odds are it's not frozen at all, it's just that some commands take a little longer than others. The other way you can tell is that in the cell you just executed, in the square brackets there will be an asterisk, meaning it's still processing. As soon as that's replaced by a number, and there's no error message, you know it's executed successfully. And what the number means is the order of the code cells you've run; so this is the 14th code cell I've run today. So how do we know that worked? As I said, I've created a new list, and a good thing about a list is that you can ask it how long it is, or how many elements it contains. In Python I use the len function and ask: how long is the list? And as I said before, there are about 3,230 recent tweets that it's found for that charity I'm interested in. Another way to look at the list is that every element in the list is numbered, and the numbering begins at zero, which may be somewhat confusing: counting would usually start from one, but in Python you count from zero. So the first element in the list is element zero, and that's why I have a zero in this field here.
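A sketch of that step: build a blank list, fill it with each tweet's _json payload, then check the length and the zero-indexed first element. The timeline here is mocked with invented values, standing in for the real user_timeline results:

```python
from types import SimpleNamespace

# Mocked timeline of five status objects, each carrying a _json dict
mock_timeline = [SimpleNamespace(_json={"created_at": f"day {i}",
                                        "id": 1000 + i,
                                        "text": f"tweet {i}"})
                 for i in range(5)]

rnli_tweets = []                    # a blank list: square brackets
for tweet in mock_timeline:
    rnli_tweets.append(tweet._json)  # keep just the _json field

print(len(rnli_tweets))   # how many elements the list holds
print(rnli_tweets[0])     # Python counts from zero: the first element
```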
And you can see what I've actually populated my list with: it's all the information in that JSON field. So here's the tweet, here's the date it was created at. Again, this is information that we've seen previously, but as you can tell, we've now stripped away all the metadata and the stuff I'm not particularly interested in; I'm just focused on information about each tweet. Again, I would advise you to play around with the results that are returned, because as you can tell, there's so much, and a lot of it is very intuitive if you use Twitter. For example, we've got information on every tweet: this is the unique ID of the tweet, here is when it was posted, and here is how many times it was retweeted. So that's a measure of engagement, or again of networking, of who's retweeting. And here's how many times that tweet was liked. It occurs to me that it's probably worth pulling this up in real time on Twitter itself just very quickly. So I look for the RNLI, just to prove that we're not making this up and that we're really connecting. Here's the first tweet that we keep seeing: a special New Year's message to the president. So it's been retweeted 18 times and liked 62 times. Our information is obviously slightly out of date because we made our request five, six, seven minutes ago. All of that is to show you that we're getting real-time information here from the Twitter database, which is really interesting. So finally, now that we're done with this, we're going to create a new list. And this time, I just want to put certain variables, certain bits of information, in this list. So very simply, for every tweet, all I want is the created_at field, so I know when it was posted.
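A minimal illustration of picking individual fields out of one tweet's JSON dictionary. The field names (created_at, id, text, retweet_count, favorite_count) are the standard Twitter v1.1 API names mentioned above; the values here are made up for demonstration.

```python
# One tweet's JSON, as a plain Python dictionary (values are illustrative).
tweet = {
    "created_at": "Fri Jan 15 10:30:00 +0000 2021",
    "id": 1350000000000000000,
    "text": "A special New Year's message...",
    "retweet_count": 18,   # engagement: how often it was retweeted
    "favorite_count": 62,  # how often it was liked
}

print(tweet["retweet_count"])
print(tweet["favorite_count"])
```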
I want the unique ID because, with the unique ID, there are other Twitter API methods where I can say: give me all the information for that specific tweet, for example. That's important because when Twitter data gets shared, say for an academic paper, there are obviously some personal-data concerns with just sharing all of the information that we've requested here ourselves. What tends to happen is that if you write an academic paper based on Twitter data, what you end up sharing is a list of all the tweet IDs that underpin your analysis. The whole point there is that you're not technically sharing personal, confidential or sensitive data. What you're doing is saying: okay, here's the genuine list of all the tweets underpinning my analysis, and it's up to you to go and get the underlying information yourself. Then it's your responsibility, because you're the one who has downloaded the personal data, and it's up to you to manage it correctly. It's a technique called rehydrating Twitter data: you get a list of unique IDs of users or tweets, and it's up to you to rehydrate or refresh that information yourself. So sorry, a bit of a digression, but it's some terminology you might come across. And then I want the actual content of the tweet, which is in the text field. So I'm just storing those three fields in a new kind of row, or new observation, called status_info, and then I append that row to the blank list that I created. Again, the proof is in the pudding; it's easier to see when I actually show you the results. So I have a new kind of dataset called rnli_data, and I'm asking for the first five rows, the first five elements, of that dataset. And you can see row one has the date field, the ID field and the content, and here's the actual information for that tweet.
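The filtering loop described above might look like this. The names rnli_data and status_info, and the three fields kept (created_at, id, text), follow the demo; the input tweets are hand-made dictionaries standing in for the real JSON.

```python
# Simulated tweet JSON dictionaries (real ones have many more fields).
tweets_json = [
    {"created_at": "Fri Jan 15 2021", "id": 101, "text": "First tweet",
     "retweet_count": 3},
    {"created_at": "Sat Jan 16 2021", "id": 102, "text": "Second tweet",
     "retweet_count": 7},
]

# For every tweet, keep just the three fields of interest, bundle them
# into a row (status_info), and append the row to a blank list.
rnli_data = []
for tweet in tweets_json:
    status_info = [tweet["created_at"], tweet["id"], tweet["text"]]
    rnli_data.append(status_info)

print(rnli_data[0:5])  # slice the first five rows (here we only have two)
```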
Here's the second row, the second observation, with the three fields that we want, et cetera. Again, remember Python counts from zero, so that's element zero, or row one. We could ask for ten rows instead; it doesn't really matter, it's just to show you how it works. So what I'm going to do here is show how you would save the information to your machine. Because we're working with the online version of the notebook, we don't have permission to save to the repository online, as it's protected: it's the original set of files. That's the one thing about working online here. In reality, when you're doing this work, you would have Python, Jupyter Notebooks and Tweepy installed on your machine, and then you could start saving all the information you downloaded to a file on your machine. So this bit of code will run and will create a file with this name on the online server that we're using, but unfortunately there's no way of downloading it to your machine just now. So bear that in mind: we're using this online service for teaching purposes, but when you're doing this yourself, you'll have everything installed on your machine, so you can save and manipulate files much more easily. So that's just how you would save the information that we've downloaded. We can also put it into a more familiar data format. At the moment we've got a list, so that's square brackets, and each element of the list is itself three separate fields, which are analogous to rows or observations. We can just make that explicit: we can reshape the data into what we're typically familiar with, where every column is a variable and every row is an observation. This is where we use the pandas Python module, which in shorthand we've called pd.
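On your own machine, saving the collected rows might look like this: one JSON line per tweet row. The filename is illustrative, and the rows here are sample data rather than a real download.

```python
import json

# Sample rows in the [created_at, id, text] shape built earlier.
rnli_data = [
    ["Fri Jan 15 2021", 101, "First tweet"],
    ["Sat Jan 16 2021", 102, "Second tweet"],
]

# Write each row out as one JSON-encoded line.
with open("rnli_tweets.json", "w", encoding="utf-8") as f:
    for row in rnli_data:
        f.write(json.dumps(row) + "\n")
```

On the online teaching server this would still run, but as noted above you couldn't then download the resulting file.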
So using the pandas module, we use the DataFrame method, which creates a dataset containing the rnli_data we created earlier. We call the dataset df; this is kind of standard Python terminology, but you could call it whatever you want. And then: give me a random sample of five observations from the dataset. And this is how it looks. Now we've turned what's quite abstract or probably unfamiliar data formatting into something we're quite familiar with, but it's the exact same information that we've downloaded; we've just reformatted it. So we've asked for a random sample, and here is row 63, which contains a tweet from Friday, January 15th, 2021, and then we've got a preview of the content field (there's quite a lot of content). And again, if we were doing this on our own machine and we had a folder called data, we could create a CSV file with this name and write the dataset out to that file. Excellent. So thank you for bearing with me through the IT issues, unfortunately. That's a reasonably quick tour of the Twitter API. As I said, there's a webinar, again about an hour long, where we actually show you how you can sign up to the Twitter API, and I run through this example, though not in as much detail, and cover some next steps and some further learning. But hopefully today was worth it for going reasonably line by line, taking our time, showing what's happening at every stage, how the data looks when you ask Twitter for data, and how we can then repackage it into something a bit more familiar. As I said, this online notebook is available whenever you need it using the link that we sent, or, as I showed earlier, we've got the UK Data Service GitHub repository containing all the open-source, free materials that we've developed for other courses. So again, the social network analysis one has all the code that we used today.
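The pandas step described above can be sketched as follows: turn the list of rows into a DataFrame with named columns, peek at a random sample, and write it out as a CSV file. The column names and filename are my own labels for illustration; the demo's exact names may differ.

```python
import pandas as pd

# Sample rows in the [created_at, id, text] shape built earlier.
rnli_data = [
    ["Fri Jan 15 2021", 101, "First tweet"],
    ["Sat Jan 16 2021", 102, "Second tweet"],
]

# One column per variable, one row per observation.
df = pd.DataFrame(rnli_data, columns=["created_at", "id", "text"])

print(df.sample(2))  # a random sample of rows (the demo uses 5)

# Write the dataset out to a CSV file on your machine.
df.to_csv("rnli_tweets.csv", index=False)
```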
It's got a reading list as well, if you want to do some further reading, and a lot of these books are actually open access (some are, some aren't), which is quite good, plus some suggestions for learning Python, et cetera. So I will finish and take questions if you post them to YouTube. What I will do, just to prove that we're in real time once more, is run our search again for my first name, this time around 20 minutes later. Yeah, so you can see that we've got some new search results, because I think this was the first tweet before. So now you can see there have been a lot of tweets containing my first name, none of them about me, thankfully. But again, lots of Japanese tweets about that anime or manga character; I need to ask somebody about that, as I never checked it out. So that was our whistle-stop coding demo of the Twitter API. Thank you very much; hopefully you found it useful, even with the troubleshooting I needed to do. So I'll take some questions just now if that's okay, and I'll keep a look at the YouTube chat. Going forward, you'll be able to contact the UK Data Service: we've got a help email account, which I'm sure Julia can post for you just now. And obviously you've been in contact with us, so you'll have the UK Data Service's details. I know I've left, so technically it's easy for me to say this, but you can contact the team; I'm sure they're very helpful and very willing to talk to you about your individual project. I'm going to apologise, having just looked at some of the questions: it did look like I was speaking a bit quietly, and I do have a baby who was asleep for some of this, so apologies; I can turn up the sound next time. So I'll keep an eye on the YouTube chat just now if you want to post a question. If you don't, as I say, the UK Data Service will be more than happy to field your queries about social network analysis and some ideas for further training as well.
They're very keen on progressing work in this area. Yes, Julia, you did all get the weird volume because I actually turned on the live stream on my own machine, which caused this weird inception echo thing, because I was broadcasting from the machine and watching it at the same time. And also, I prefer crunchy peanut butter. Thank you, Julia. Though if it's a question of smooth or crunchy peanut butter versus something else, I would choose something else, if that's a better question. So again, if there are no questions, hopefully you found it very useful; critical comments are just as welcome as positive comments. So please don't be shy, because, if I'm right, Julia, we've got another year or possibly even two years of working in this area, so there's plenty of good training coming up. So please let Julia and the team know about your needs, your ideas and your critical comments. That would be fantastic. So that's all from me just now. Thank you very much, and hopefully see you again.