Hi everybody, and welcome to Working with Twitter Data. My name is Joseph Allen and I am a research associate with the UK Data Service. The UK Data Service is the UK's largest collection of social and economic data and more; a lot of that census data we've been talking about recently is on there, so feel free to check it out. If you do have any questions about this talk or want to contact me, the best way to do that is probably on Twitter at Joseph Allen 1234, or you can send me an email at joseph alan at manchester.ac.uk.
But yeah, let's get on with it. Just as a little warning: we are about to scrape some random tweets from Twitter. Obviously I am slightly responsible for what we grab there, but we might get some really awful sexist, racist stuff, because that's what happens on Twitter. So apologies if I read anything out that is offensive in any way, but know that it's very likely not to be my sentiment.
To summarize what we're going to talk about: we're going to talk about why social media in general is useful. We're going to talk about some project ideas, some stuff you can do with WhatsApp data, Twitter data and similar. We're going to look at Twitter's built-in analytics. We're going to talk about something called TwArχiv, which is a third-party tool that does some more advanced analytics. We're going to look at something called Pipedream and some basic automation. And then finally we're going to look at what we can do with Python.
So there's no programming until that very end bit.
So to start with, let's talk about why it might be worth our time to scrape any social media. What... weird. So when we try to analyze an individual, we usually only have access to what I'm calling their facticity. This is the set of facts that makes up who they are today, at a particular time. We know people change all the time; people think they know what they're going to change into, and things like that. So we're really looking at just a snapshot of this individual. We might have access to a data set of somebody's age, sex, family details, salary, all sorts of stuff, and this can be useful to tell us, at the time of collection, what kind of person this is. But that means our analysis can already be dated by the next day.
We make a lot of assumptions at this point, right? We're not only assuming that this data is recent enough to still be relevant, we're also assuming that this individual didn't lie, and that's not necessarily malicious. Sometimes people lie to themselves as well. For example, somebody might claim to be exercising every single day, or they might be upset that they don't exercise every day, and they might downplay how good they are at sticking to a healthy diet, and all sorts of things like that. We also assume events outside of this data set can be ignored, including the history this individual has. There might be traumas particular to this individual; there might be societal pressures that drive their behavior in unseen ways. So we're making a big assumption when we do any sort of analytics, especially with social media data.
Just as an example, we can look at salary. We might have a measurement of somebody's current salary at a particular time; it might be 25,000 pounds a year. This might imply that this person had a bunch of previous jobs: maybe in their first job they made 12 grand a year, then 15, then 20, and now they're on 25. And they might have just
jumped up to this 25 from a recent job change. Alternatively, this could be somebody who worked at some large corporation making hundreds of thousands of pounds, realized that career wasn't really for them, and dropped down to this 25,000-pound salary. So the fact here is that just having a snapshot at any particular time doesn't really indicate a past or future. There are some assumptions we can make, but they're not always true.
This person might, for example, have the label "vegan" in the data set, maybe for their diet, maybe lifestyle. We might not really understand what that label means, and we can ask all these questions. Does this encompass that they are always vegan in every aspect? Do they have a cheat day on Christmas Day when they feel some pressure from their family? Do they wear vegan shoes? Do they have fake leather clothing? Are they vegan because they're worried about their health? Are they vegan because they're worried about the planet? Did they transition to becoming a vegan, or was it an instantaneous change after watching a documentary? And finally, will they continue to be a vegan? We don't really know. If somebody says "I'm vegan" today, that doesn't mean they are tomorrow, or it might mean that they are for the rest of their lives. This is just one label, and it doesn't really answer any of these questions for us.
(Well, I hope that's recording. Sorry, there's some weird stuff happening on my screen.) So when we look at the future of this individual, we can ask this question: will they even remain vegan? This is not an easy question to answer. Is this even a quantitative title, right? Will they transition to only eating animal products on weekends? Will they stay vegan full-time for the rest of their lives? Will they inspire their children to be vegan as well?
We can make these predictions based on their age, salary and job title, but they might make it easier for us. So enter the tweets. Maybe instead of all this talk about machine learning and prediction, we could use a more human skill, right? We could ask this individual: how are you feeling about veganism? How are you feeling about society's perceptions of veganism? And we could read their words, their body language, their intonation, and we'd kind of understand whether they were exhausted, whether they were grateful, how they were really feeling about that.
While traditionally the qualitative side of this analysis has been tricky, we now have individuals who will willingly, publicly state their views over a period of time. Not only is this collection done for us, but it comes with a modern web API allowing for easy scraping and easy searching. It's restricted to 140 characters (that's a typo, I believe), so content is usually succinct and very high in topic, high in sentiment for us to analyze; that does make things a bit easier.
So when our user Vegan Boy tweets "I'm getting pretty tired of this whole #vegan thing", we could be detecting early signs about this individual's future. We can also pick up that Vegan Boy is implying some aspect of this vegan diet is exhausting him. If we continue to dig into his tweets, we might find more: complaints about the government, lack of availability at restaurants, all sorts of stuff like that. Alternatively, we might see a tweet like this. It says "It has never been easier to be vegan and it's finally getting cheap." You might infer that Vegan Boy is feeling positive about veganism. Vegan Boy might also be smart enough to keep some of these opinions to himself, but there are always millions of other Twitter users that we can look at and analyze instead. So you've got to remember, Twitter makes this easy for us.
Okay, Twitter is a marketing platform. They want the users to find it valuable, they want the users to find it easy to use, and they encourage you to share how you feel, even if that might be an entirely meaningless exercise. So this is great news from a social science perspective, but maybe not such good news from a social perspective.
So it's realistic that with a little bit of Python we could target an individual by name; sample all of their historic tweets, or even collect all of them if there aren't that many; search for keywords in those tweets associated with veganism or any other topic; and analyze the sentiment of these tweets over time. With some smoothing we can visualize somebody's growing opinion of something, or even just growing discussion of that thing. Especially with veganism, there's a huge increase in the number of tweets over time, because it's trending upward at the moment.
So, I talked about sentiment analysis just then, but I realize you might not know what that is. So let's briefly look at sentiment analysis and a naive implementation of it. I'm going to have a drink of water before I get into that, apologies. Okay. So it's usually easy for us as humans to look at a sentence like this and determine: is it positive or negative?
We can determine which parts of it are positive or negative. Looking at this, "I love food": obviously "love" is a positive-sentiment word, and it's attached here to the word "food", so we know they're talking about food. But in the very next breath they can say they hate the government, and we know immediately that's negative sentiment, and that negative sentiment is associated with the government, not with the food.
There are a few ways sentiment analysis can work, but one example is this. To start with, we need a training set. So we might have thousands of sentences that have been labeled positive or negative. Things like "I love cats" and "I'm having an amazing day" are clearly positive to us as people who understand the English language. Things like "I hate the weather" and "I struggle to sleep" are obviously negative to people who understand the English language. With this labeled data set, our model can begin to understand what words are associated with those positive and negative labels.
But we also have some more nuanced sentences; English isn't so easy. We have things like "I love the government taxing my hard-earned money." To us that's maybe clearly sarcasm; to some people it's not clearly sarcasm, but it uses strong positive text like the word "love". Again, you might have something said playfully to a friend or a partner: "I hate you, you're so silly." We know it's positive as humans, but our model probably won't know it's positive. As long as these cases are rarer than the usual cases, our model should adapt to dealing with these, though.
So our model will begin to recognize that certain words show up frequently with particular sentiments. We get the positive words like "love". Maybe "cats" shows up a lot: even though it's not a positive word, it might be associated with positivity. "Amazing", "great", "love" are all good examples of positive words.
You also get some negative words: "hate", "struggle", "stub". But then we might get some topics associated with that negativity as well, like "weather" or "sleep". And then outside of those positive or negative words we have what are called stop words, or neutral words. These are things like "I", "am", "you", "the". These words don't really mean anything, but grammatically they're needed to connect words together.
So looking at our recent tweet from Vegan Boy, we see that he loves food and he hates the government. Our model will score each individual word here. The neutral words we talked about before, like "I", "food" and "the", will have neutral sentiment, a score of zero. The word "love", from our training data, has a huge positive sentiment; it might give a plus-five sentiment score, moving us to the positive side of the scale. The word "hate" has a large negative sentiment, which might give us negative five, bringing us back to a zero score and saying this is a neutral tweet.
Now, as I said before, a quirk of this is that if complaints are frequently associated with a topic, for example if 80% of our tweets about the government happen to be negative, we might train our model to think that the word "government" itself is negative. This is really important for Twitter data, because people are probably going to use it more to complain than to defend something. You're more likely to get polarizing content, and I would imagine that most tweets about the government are negative.
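As a rough sketch of the word-by-word scoring described here, the following Python adds up a score for each word in the tweet. The two lexicons are hypothetical hand-written dictionaries, not a trained model: the plus five and minus five come from the talk's example, while the minus two for "government" is an assumed value used only to illustrate the topic-bias quirk.

```python
import re

# Hypothetical word scores, in the spirit of the talk's example.
# A real model would learn these from a labeled training set.
NEUTRAL_LEXICON = {"love": 5, "hate": -5}

# Same lexicon, but "government" has picked up a negative score (assumed
# value) because most training tweets that mention it were complaints.
BIASED_LEXICON = {**NEUTRAL_LEXICON, "government": -2}


def score_text(text, lexicon):
    """Sum the per-word scores; words missing from the lexicon count as 0."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(word, 0) for word in words)


tweet = "I love food and I hate the government"
print(score_text(tweet, NEUTRAL_LEXICON))  # "love" and "hate" cancel out
print(score_text(tweet, BIASED_LEXICON))   # "government" tips the total negative
```

With the first lexicon the tweet comes out neutral; with the second, the same tweet is pushed negative purely because of the topic word.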
I'm not sure about that claim on government tweets; there's nothing backing it up, but let's see. So in this use case perhaps "government" should be a neutral term, but in our training data maybe it isn't, because we trained it on tweets or something like that. This could tip the scales and classify this entire tweet as a negative tweet. So while we know there are two clear, different sentiments here, the one tweet itself is basically neutral or negative, depending on how we trained the model.
Traditionally, tweets and free text were quite difficult to analyze, but we do have quite advanced natural language processing tooling these days that makes it much easier than it used to be. With this tooling, any free text you can find is open to analysis. For example, you can export your WhatsApp chat, read it into a data frame and apply this sentiment analysis to calculate which of your friends send the most positive or negative messages. Here we can see that, between my friends Sam and Harry and me, we have about the same number of negative-sentiment messages, but generally I send a larger percentage of positive messages, therefore bringing up the mood of our WhatsApp chat and making me the best friend.
Another example was a funded digital art piece where viewers could tweet mean things to a rose, releasing poisoned water. Alternatively, people could send nice messages to water the rose, and if nobody tweeted at all it would just dry up and die, very similar to many users on social media.
A useful thing for testing this is just targeting American politics on Twitter. If you search for mentions of particular politicians or political parties, you will very reliably get a new tweet every couple of seconds.
So that can be quite a nice way to test things out. An extension to this that I've not really explored: you could run some topic analysis. This extracts exactly what people are talking about in their tweets, and with this you could classify tweets which cover veganism, or veganism and pricing, ethics, politics and more. We can simplify our text and use simpler synonyms to group topics, so we could seek out things like "vegan", "veganism", "vegetarian"; these are all similar plant-based diets, and "plant-based diet" is a good one as well. There's lots of stuff we could look for as well as "veganism" to improve the size of our data set. We could also look at terms that are opposite to veganism, maybe things like "barbecue" or "steak". I don't know if anyone would use "carnivore", but it might show up. And you can see how those different opposites would react to similar events or to policy changes.
A final interesting thing we could do is make a word cloud out of all of our positive or negative words. We could get all the positive words about veganism and mask them over a piece of broccoli or something like that, and use some colors. These make quite cool visualizations, and you'll notice, if you look into it, that particular words pop out as well. So it can be quite a useful visualization from a media perspective, but generally not too useful if you're actually trying to do any academic research.
Okay, so that's a lot of what we can do and why it's cool to do it, but let's actually start making something happen. To start with, I'm going to assume nobody here wants to code. There's no point me going into a Python example if you don't know Python, so let's see what we can do just by being a bit tech savvy.
We start with Twitter's built-in analytics. Twitter, again, as a modern social media platform, provides an analytics service for its business users.
You might have to opt in to turn this on for the first time. There isn't really a trivial way to export any of this data other than writing down numbers, but it can be useful to inspire us: what data is collected by Twitter, and hence how could we use that in research projects or otherwise? If you've not done this before, you do need a Twitter account. Log into that Twitter account, click on the settings cog on the left-hand side, and click the Analytics tab. You might be presented here with the option to turn on analytics, to capture from this point on.
On this page, when it has collected some data, we get a summary of the last 28 days of Twitter activity. Here we can see that in the last 28 days I've tweeted 68 times, and that is almost three times the amount I tweeted the month before. We have tweet impressions. While 23,000 tweet impressions looks impressive, this is just the number of times you've shown up in anybody's feed or search results. It doesn't mean anyone even looked at your tweet; it could be at the bottom of somebody's page and they didn't scroll to it. It doesn't mean they engaged with it. That's different.
That's measured as what we call engagement. This is where somebody likes or retweets or does anything with intention with that tweet itself. We also get profile visits, which is the number of people who view my personal Twitter page. We get the number of mentions, which is when another account tweets involving me as a Twitter user. And then finally the number of followers, the number of people who follow me and see my content.
Beyond this, we also get a monthly breakdown for every month since you started tracking that data. The format of these does change over time as they make new data available. Recently we can see our top tweet, top mention and top media tweet. Again, these all report which ones received the most impressions, which might not be the most useful metric. You also see your top follower, who is your follower that month with the highest number of followers. It could be useful; I would be thinking about what that can mean if you're doing some sort of network analysis or something like that. But all we're really looking for here is: is there something in this analytics that interests us? This is a great top-down view of the kind of data we can get about not just ourselves, but pretty much anyone on Twitter with a public account.
Next we have TwArχiv. I don't know if that's the correct pronunciation, so apologies to the creators. If that Twitter analytics wasn't good enough for you, there's a slightly more technical step of requesting your Twitter archive. This website is maintained by Bastian Greshake Tzovaras, who handily helped me with this presentation when parts of it weren't working. He's the director of research at a larger project called Open Humans, and they also offer a grant of five thousand US dollars for anyone who can do any sort of interesting visualizations with newer forms of data. So it's worth looking into. Feel free to check that out, and I'll try and put it in the comments.
Well, the description below this video. In order to request your Twitter archive, again, click that settings cog, click on Settings and privacy, go into your account and download an archive of your data. This process will take a couple of days, but you should receive a notification from Twitter when it's done. I didn't notice a notification from Twitter, so check in in about a week and there should be something there. You can then upload this document to that Open Humans website and it'll generate some analysis of your data.
The first thing we can see is where we have been tweeting. For me, it doesn't look like I've been tweeting anywhere. Every tweet has quite a large amount of metadata, which we'll see a bit later, including geolocation, but it doesn't have to: you can turn off certain parts of that Twitter metadata, which I have done. So there are no geolocations on any of my tweets; I don't have this data. Luckily, on this website we have the ability to make our data public for other users to enjoy, so I can share where somebody else tweets from. This isn't my data here, but we can look at anybody else's data if they chose to make it public on this website, which is quite nice. Looking at this, for example, we can see that this person tweets a lot from what looks like New York, California and Tanzania, and maybe we can even see the places they've been on holiday or the places they've been for work. So it can be a useful visualization, but really, what can we do with that? It would be more useful to plot periods of time, for example where this person seemed to live, judging from their tweets, if they use Twitter a lot. But from the geographic visualization alone there's not too much to go off.
We also have tweets per day.
This is shown as a rolling 180-day average, which is quite a lot of dampening, quite a lot of smoothing. We can't really control that using this third-party self-service; if we want to do that, we need to dive into the data ourselves. We can see here that when I first started I must have tweeted quite a lot. It makes sense: I was originally using Twitter for some events management stuff, so it's not too surprising that we see a huge amount of tweets at the start there. I also started running a meetup around 2018, and you can see from that point onwards tweets have slowly increased, but not by very much. We're talking about four tweets a day; that sounds like a lot, but it's not really. And you can also see that just at the end of 2020 I got a job that required me to use a lot more Twitter data again, so my usage seems to have spiked again.
So what do we need to talk about here? We've got replies. Let's define this: replies are when you tweet something in response to somebody else's tweet. We have retweets, which is where you reshare somebody's tweet and basically resurface their content to your followers. And then we have a regular tweet, which is where you write your own original content for your followers. This graph shows the ratios of these over time. They're kind of erratic. It's quite difficult to understand what's going on without understanding the things that were happening to me personally over this period of time, and even with me knowing that, we don't really see a lot of change. Maybe the most interesting part is this area over the COVID period. There was probably a lot more collaboration, maybe more working with other venues to run meetups or events, but fewer retweets at the same time, so it does seem a bit strange. But you can see how looking at one person's tweets like this isn't really telling us much.
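For anyone who does want to control the smoothing on the tweets-per-day chart themselves, here is a minimal Python sketch of a rolling average over daily tweet counts. The function names and the sample dates are made up for illustration, and averaging over fewer days at the very start of the series is an assumption; how TwArχiv handles that edge is not stated in the talk.

```python
from collections import Counter
from datetime import date, timedelta


def daily_counts(tweet_dates):
    """Count tweets per calendar day, filling days with no tweets with 0."""
    counts = Counter(tweet_dates)
    day, last = min(counts), max(counts)
    series = []
    while day <= last:
        series.append((day, counts.get(day, 0)))
        day += timedelta(days=1)
    return series


def rolling_mean(series, window):
    """Trailing mean over up to `window` days ending at each day."""
    values = [count for _, count in series]
    result = []
    for i, (day, _) in enumerate(series):
        chunk = values[max(0, i - window + 1): i + 1]
        result.append((day, sum(chunk) / len(chunk)))
    return result


# Made-up tweet dates, purely for illustration.
dates = [date(2021, 1, 1), date(2021, 1, 1), date(2021, 1, 2), date(2021, 1, 4)]
for day, avg in rolling_mean(daily_counts(dates), window=3):
    print(day, round(avg, 2))
```

Changing `window` from 3 to 180 would give the heavy smoothing described for the chart, while a smaller window keeps short spikes visible.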
We need to really understand a lot of the context behind this Twitter usage.
We also have tweets by hour. I didn't have enough data to generate this, but there's an amazing split: it shows you weekday versus weekend usage. For the average nine-to-five salaried worker you see some really interesting spikes here: people waking up at regular times, checking social media before work, even social media peaking in the morning seemingly just after they get into work. So it might be part of that ritual of the morning: getting a cup of tea, checking social media, checking your emails and catching up on the day. Then we see a little post-lunchtime spike, again I think a little lunchtime ritual of checking social media. Finally we have a big spike after dinner time, it seems, and then again a spike just before bedtime. Completely juxtaposed, on the weekend it looks like our data spikes just after lunchtime, and that seems to be about it for the weekend for this user. Again, that can be quite interesting to see, and it would be really cool to see with some geolocation data as well. Maybe you could classify tweets at work compared to at home and see how that changes. But again, we're looking at these things to try and inspire a project idea and just see what's possible with this data.
There's some media coverage about using this website to assess whether you are a sexist on Twitter. For the sake of transparency, here's mine. Luckily, it doesn't seem too bad.
You know, there's some trading around, but generally I'm replying and reacting to the Manchester data community, and luckily in the Manchester data community there is a huge number of meetups for women, by women. We've got things like Her+Data, PyLadies, R-Ladies, all sorts of stuff like that. So it does seem that most of the content I get is from females. I didn't have enough data for retweets by gender for some reason, but this one seems like a better indicator of whose content we're consuming and surfacing. You can see this person is generally retweeting more content by men than by women.
Which brings up a much larger question: how do we know if somebody on Twitter is a man or a woman? Can we trust it? On this third-party site they use a Python package called gender-guesser, which uses the first name of your account to make a prediction of your gender. In 2017 we had this great tagline on gender: if you haven't added one, we basically assumed one, the one that is most strongly associated with your account. That's problematic from many angles, and we have to remember that Twitter is a marketing space, so to them it's obviously very useful to have an understanding of whether you're male or female or anything else. Also, interestingly, does that affect the type of content we see? Do brands have a gender in Twitter's eyes? Probably. It does seem this data isn't exposed over the APIs, which is a good thing, because it's not confidently generated. Neither are age and other protected fields, but you can infer these things from somebody's data sometimes.
As a brief aside, I tried this gender-guesser package to see if I could rely on it.
I gave my name, Joe, and it gave "male". Perhaps "Jo" would be the female version of "Joe". But yeah, lots of assumptions to make. Sam, interestingly, gives a "mostly male" tag, and that's because this package gives a range from male, mostly male, androgynous, mostly female, to female. Then we also have the "unknown" tag. So female names will return "female", and big brands like UK Data Service, my employer, or McDonald's show up as "unknown", I'm assuming because they're not in the training data. So that's quite interesting as well.
Another interesting thing: as I use Twitter quite a lot, I have over 10 different Twitter accounts. I thought I'd look through them all and see if Twitter had correctly assumed that I was male in all these cases. My personal Twitter, which you might have seen, is male, matching the guess from my first name; makes sense. PyData Manchester, a coding event I run, was also gendered male. Not surprisingly; stereotypically it's a male activity. A reasonable guess, but still kind of icky. Out of all of them, the only one that was misgendered was my Japanese language learning Twitter. So something about engaging with Japanese content on Twitter, or language learning, or anything like that, is female in Twitter's eyes. Otherwise I have a charity, a web agency, a data company; they are all gendered male. The charity is mostly run by females, so that's quite interesting as well.
So we can see these assumptions aren't entirely accurate. Even anecdotally, from just these six datasets, a third of them are wrong. So be careful basing all of your analysis off this. And who's to say how well this is trained, and how well it works on non-English names and all sorts of stuff like that? So just be aware of the danger of using something like that.
Okay, so we've covered about as far as we can get without getting a bit more technical. But just because we're working with a modern web API doesn't mean we need a programming language just yet. A fantastic automation tool I've been using recently is called Pipedream. There's another great one called Zapier, but Pipedream seems a little bit better for some technical things like sentiment analysis. Pipedream is a tool for automation. It's got over 300 integrated apps, such as Slack, Twitter and Google Drive. We can listen for triggers, which could be anything from "run this once a day" to "run this every time I make a tweet or upload a YouTube video". After these triggers we can perform a series of actions, such as modifying our code, performing sentiment analysis of a tweet, or writing that tweet to a Google Sheet.
This is quite useful because, for me, the hardest part of scraping social media or doing anything like that is handling your authentication tokens and all that weirdness. I think it's such a huge step up from the quite basic process that a lot of people want when they're trying to scrape some Twitter data. This kind of gets around that, because you can use your personal token to scrape Twitter as you would if you were just a normal user on the website. So it's an ideal solution for recording some live tweets, but not really good for any historic tweets. If you want anything older than about a week's data, it's not going to be what you need.
Okay, so I'm going to do a demo.
Hopefully this works, because it didn't work last time. Still recording? Okay. Let me just get my notes up, but let's see if we actually need them. Okay, Pipedream demo.
I've already made an account here on Pipedream; check that out at pipedream.com. We're going to click New and make a new workflow; name it "scrape Twitter" or something like that. We need a trigger. As I said, this can be any app you're interested in, but I'm going to look at Twitter, and I'm going to pick Search Mentions: emit new tweets that match a search criteria. You'll be asked to connect a Twitter account; this uses my personal account, josephalen1234. Then we can input a search term we're interested in, so I'm going to search for "vegan". You can do all sorts of stuff here: you can restrict to languages, you can make sure there are no retweets or replies, we can pick a geocode and a radius around that geocode, and we can pick how often we want to scrape it. And even if there's a particular person... oh no, this is a name for this step. Okay. Then we're going to click Create Source, and that will grab a recent tweet that we can use as a test case.
So we'll see here we get one tweet. We can see the full text: "You are making the assumption. You are suggesting animal genocide, you know, exactly the argument who's trying to make. A vegan lifestyle would reduce exposure to pandemics, which you admitted via factory farms. Your responses in such bad faith, it's surprising you're a doctor." So, again putting on that human hat: negative sentiment, it sounds like, but let's see what happens.
We also get all sorts of other data here. We would get the user mentions as an array, so we can crawl down into those if we want to. If there were any hashtags, we could grab them; if there were any URLs, we could grab them. And really interestingly, we have this user object here with 39 keys. There is loads of stuff in here: the description of that person, the username of that person, where they are, all sorts of crazy stuff. It's really nice to be able to see that here, because we kind of forget how much of that we actually have access to. So we could send a test event, and it'll grab all that data. It's all good. So that's grabbing a tweet.
Next we're going to add a Node.js code action. This just lets us write any JavaScript we want. Obviously we're getting a little bit into programming here, but all we're doing this for is to run some sentiment analysis in real time. We don't have to do that if we're just trying to scrape some tweets, but I'm trying to see: is this good sentiment, negative sentiment, is it even possible, can I rely on it for future research? We can include any npm package in this JavaScript. I know I'm talking about a lot of JavaScript-specific stuff here, so don't worry about that. All I'm going to do is grab this usage example from here and paste it in there.
This first line is going to say "please import the sentiment package from npm". That is all Pipedream needs to see in order to handle that package management for us, which is really nice. We're going to create a new sentiment object that will allow us to do some analysis. Then we're going to ask it to calculate the sentiment of the sentence "Cats are stupid", and then it's going to log out that result. That's all we're doing for now. I'm going to deploy that so I can test it.
I'm going to send a test event, and you can see that triggers the first step and now it triggers this code as well. All it's doing for now is logging. We've got a negative score: sentiment minus two. That just means that generally the sentence is negative, which makes sense. And we can also see how it's calculated this. The only word out of those tokens that has any sentiment associated with it is the word "stupid". And it's saying there are no positive words, there is a negative word. It makes sense. We've got a negative score there.

So that's all well and good, but we don't want to analyze "Cats are stupid", and we also want to return this result. So the next step is, instead of "Cats are stupid", we need to access that event data from before. How do I do that? Let me check my notes. Okay, so test event, all good, sentiment... okay. This object here, event, that's handed in will collect the data from the previous step. Basically, in here we can type event, dot, and it should give us access to all of that stuff that was returned from the previous step. So all we want is event.full_text to analyze that Twitter data, but we could be doing that with any of these keys that we returned from before. And instead of logging those results, I'm going to return those results. This just means that, in the same way that we could access the data from the Twitter step, we can now access the data from our Node.js step as well.

So I'm going to redeploy that and I'll send a test event, and this time that test event will get passed into that sentiment object. We see here we have a calculation. We have a score of minus three and negative sentiment, which makes sense. It's still the same text we looked at before. We've got two negative words, "bad" and "admitted", and, interestingly, one positive word, "faith". Again, interesting. Is there anything we would add there? You'd think "pandemic" might be... I mean, "genocide" is certainly a negative sentiment, right?
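To make the scoring step concrete, here is a minimal sketch in Python of how a word-list sentiment scorer of this kind can work. It is not the npm sentiment package itself, only an illustration of the idea: the tiny lexicon and the `analyze` function are hypothetical, and the word values are assumptions chosen to reproduce the scores seen in the demo.

```python
import re

# Hypothetical mini-lexicon for illustration only. The values are assumptions
# picked to match the demo output; a real word list is far larger.
LEXICON = {"stupid": -2, "bad": -3, "admitted": -1, "faith": 1}


def analyze(text, lexicon=LEXICON):
    """Score text by summing the value of every word found in the lexicon."""
    tokens = re.findall(r"[a-z']+", text.lower())
    scored = [(token, lexicon[token]) for token in tokens if token in lexicon]
    score = sum(value for _, value in scored)
    return {
        "score": score,
        # Average score per token, so long and short texts can be compared.
        "comparative": score / len(tokens) if tokens else 0.0,
        "tokens": tokens,
        "positive": [token for token, value in scored if value > 0],
        "negative": [token for token, value in scored if value < 0],
    }


first = analyze("Cats are stupid")
print(first["score"], first["negative"])  # prints: -2 ['stupid']

second = analyze("which you admitted via factory farms, your responses in such bad faith")
print(second["score"], second["positive"], second["negative"])
```

Words that are not in the list, such as "genocide" here, contribute nothing to the score, which is one reason a simple word-list model can miss sentiment a human reader would spot.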
So, symptoms of maybe a weaker model than we would like, but that's okay. We got negative sentiment.

Next up, we want to add this to a Google Sheet. It's great that we can just see those tweets in real time, but really we'd like to store a sample of them. If we could get a couple of thousand, maybe we could do some analysis to see if it works. So I'm going to add a single row to a Google Sheet. I've already sorted this out, so it's connected with my Gmail account. I've got this demo sheet here I'm going to use. I've got an empty one. Let's use this, and I'm just going to delete all this data here. You can see I've already been scraping loads of stuff. Oh, maybe let's not do that. Okay. So I've got loads of historic tweets there, but I'll do a new one. In Sheet2 I'm going to collect the timestamp of the tweet, the username of the user, the full text, and the sentiment. And let me copy that as well before I forget it.

So what we need to do is find these in our object. Now, that would take a lot of guesswork, but it actually gives us this handy tool to help us find them. The first one we need is timestamp, so I'm going to grab this created_at. Then we click this plus and we can get the next one, which needs to be username.
So again, in event, it's probably in the user object if we're looking for the username, and I think we want screen_name here. This is your Twitter handle itself, as opposed to your actual name that shows on Twitter. So we've got a Twitter handle, which could be useful if we want to follow a particular user's journey through veganism as they tweet about it. Next we need the full text of their tweet, which is right here, so we click Select Path on the side. And then finally we need our sentiment calculations. So instead of event, we're going to look in the steps object, and there you'll see our Node.js step, or whatever you rename this to if you want to, will show up there. We get all those return values, but for now I'm just going to grab the score.

Next we need a spreadsheet ID. If I look at this demo spreadsheet, your spreadsheet ID is this weird number and string combination in the URL. Not too intuitive to find, but that's that. And then your sheet, which in my case is Sheet2. By default it'll be Sheet1, but Sheet2 is fine. I'm just going to rerun this Twitter demo for later, because it does take a little while sometimes. Not found... oh, okay. Okay, so that should all be good now.

So if I run this test event, we should see something show up in here. I'll send the test event, it'll run through that, and it should add a tweet here with a timestamp, username, full text and sentiment. There we go. I mean, it's our test event again. We can see the same text from before and our sentiment. We can do all sorts of stuff with that. Very good. We got one tweet, we got our test tweet. All we need to do now is enable this live trigger, and that means that every 15 minutes it'll run again. There's somewhere you can go to see that. So here: steps, trigger. Here we go.
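The column mapping just described (created_at, the screen_name inside the user object, full_text, and the score returned by the sentiment step) can be sketched in Python as a small function that flattens one tweet event into one spreadsheet row. The `build_row` helper and the sample values are hypothetical; only the field names come from the demo.

```python
def build_row(event, sentiment):
    """Flatten one tweet event plus its sentiment result into a sheet row."""
    user = event.get("user", {})
    return [
        event.get("created_at", ""),    # timestamp of the tweet
        user.get("screen_name", ""),    # the handle, not the display name
        event.get("full_text", ""),     # untruncated tweet text
        sentiment.get("score", 0),      # score returned by the sentiment step
    ]


# Made-up sample event, shaped like the tweet payload seen in the demo.
sample_event = {
    "created_at": "Mon Jan 01 12:00:00 +0000 2021",
    "full_text": "I bought vegan oat milk",
    "user": {"screen_name": "example_user", "name": "Example User"},
}
sample_sentiment = {"score": 0}

row = build_row(sample_event, sample_sentiment)
print(row)
```

Using `.get` with defaults means a tweet that is missing a field still produces a row with four columns, so the columns in the sheet stay aligned.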
Okay, so we can see the next time it's going to run is in about seven minutes. It's going to run every 15 minutes. You can see here it would have scraped all of these historic tweets, and we can just click Run Now and it'll do it again. Now, when we click Run Now, it'll try and get 100 tweets and it'll try and write them all here. So you can see it's writing over itself. It's getting 100 tweets, but it's not successfully writing them one at a time. All we need to do there is limit concurrency. It's firing off 100 events and they just happen to be coming in at the same time. So all we need to do to change this is go back to that workflow, go into the settings here, and click Limit Concurrency, and that solves that problem for us.

So save that, go back to where we just were, and we can force it to rerun prematurely if we like, in this code and configuration view. So we click Run Now. There's 200 tweets. And I don't believe it gets duplicates here, so if we run it again in a short period of time, we shouldn't see another 100 tweets come in; we should see only the recent ones. I guess there's no... maybe I'm doing it too often. But it's not erroring. It should store some of those tweets for us at this point, basically. So at that point we will basically collect up to 100 tweets every 15 minutes. We get a sample. With one of them running you should be all right, but you can hit a limit. There are a few limits on how many times you can hit this API.

We've got some neutral tweets. We've got a positive sentiment one: "I bought vegan oat milk, so excited to try it with my recipe." "Excited" sounds like very positive sentiment to me, so that looks good. As you collect more, you'll get some hugely positive and hugely negative tweets here. See if I can find a good one. Yeah, the first very positive one and very negative one were both kind of rude there, so I'm just looking for a better one. Positive: mama made a vegan curry and fresh patty.
I'm happy. Positive, makes sense. "Cowards and inadequates always target the weak and vulnerable": negative sentiment. So in terms of general English language, it's doing pretty well. Let's just run this again and see if I can get any more. It doesn't look like it's getting me any more. No, still 200. Okay. Well, if I leave that on, it'll collect some more stuff, but I'm not going to leave it on because I do want to use my Pipedream for something else. So, yeah, that's everything we need there. And if you don't want to work in here, you can obviously export this as a CSV, so you can import it into pandas or whatever other data analytics stuff you want to do. But yeah, that's everything for the Pipedream demo.

So let's get back into this. Get my speaker notes up. Okay, and finally, let's look at the Twitter API itself. Oops, there we go. The Twitter API is made up of various endpoints which expose data and methods. It's very likely that the version of Twitter we use as public Twitter users uses some version of this API as well, without so many restrictions. With this API we can look up individual tweets or users. We can search recent tweets, and with the premium tier we can scrape more historic data. The version of the API you will have access to as a developer will only expose the last seven days of Twitter data, but if you have the enterprise version you can access the last 30 days. And beyond that, you can use some web scraping tools, or the academic tier we'll talk about in a little bit, to access basically an archival search of all tweets forever. We can also stream tweets in real time and visualize that, we can filter them, and we can also explore the relationships between followers and influencers and sort of crawl through that network if we want to. So yeah, using the free tier, as I said, we only get the last seven days of Twitter data, but using the premium or enterprise tiers allows you to access the last 30
days. Obviously Twitter makes more than 30 days publicly available on the site, so you could lean into some web scraping technology. There's also a Python package called GetOldTweets3, or something like that, that does something quite powerful as well, but obviously that requires you to be competent in Python to use.

There is also a newly introduced academic tier. This gives you full archive searching, so a different endpoint. It's a fantastic resource and will save you the usual workarounds of trying to scrape that data. Many social media companies are pretty strict on who has access to this high tier of data, so you have to be very careful. Make sure that you're using this for non-commercial use, and they'll ask you to defend why you think you need it and things like that. I've gone through this process just today. It seems okay. It doesn't seem too bad. You could probably do it, and, you know, you can paste in the abstract of whatever paper you're writing or something like that, and that'll work fine.

They've also created a new and easier-to-use developer dashboard that lets you create new projects, add limitations to those projects, and all sorts of stuff, and this will change depending on whether you're on the standard track or the academic track. So my suggestion would be to apply for a developer account and choose the standard track. This will let you build projects for fun, or for a good cause, or for education reasons. It might take a couple of days to get this account, but they're pretty quick with it. And I would use this to explore the API and try and figure out what you could do with seven days of data, and ideally you can translate that to what you would do with 10 years of data.

Next, you can create a new project. Every couple of months this process seems to get a little bit more restrictive. They're trying to protect the user data, obviously. Make it clear that this is for fun or education and you'll be fine.
The creation of these projects is pretty much instant. Then you'll get some API keys, which you can copy. Never share these keys publicly. They can easily be scraped by others on GitHub, and you don't want to get them removed if your projects rely on them. But I will share mine for the demo, and I'll delete them immediately after, so don't try and use them and don't get me in trouble.

Yeah, let's have a look at that. So this is the Twitter developer dashboard. Once you've got any track, basically (you can see I'm on standard at the moment, but if you had the academic track it would say academic here), you can go into Projects & Apps. This is my digital art rose demo from before, for example. You can just add an app here. You can... oh no, I don't want to create an app, I want to create a project. I clicked inside that one. Sorry, scroll to the bottom of that and click Create an App, and you can make a new app. It needs a unique name. Test... I'm going to try and come up with something. Here we go. And then you'll get an API key, a secret key and a bearer token. These are the things that you don't want to share with anyone. You definitely don't want to share them on a talk like this, but I'll delete them immediately afterwards, so it'll be all right.

So these are the things you're going to use in the Twitter demo, which I've got in the description below. You're going to paste that key as your consumer key and paste your secret key as your consumer secret, and this should allow us to access the Twitter API through Python. I should be able to run these in Binder, and so should you, so you don't need to worry about notebooks or package management or anything like that. That's all done for you. Bring in these variables, and then we're going to create an auth handler and we're going to create an object that allows us to access that API.
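The setup just described can be sketched as follows. This is a minimal sketch, not the notebook from the talk: the environment variable names and both helper functions are hypothetical, chosen so the keys never have to be pasted into shared code. The Tweepy calls assume the long-standing `tweepy.OAuthHandler` and `tweepy.API` names, which newer Tweepy releases may flag as deprecated in favour of other handler classes.

```python
import os


def load_credentials(env=os.environ):
    """Read the consumer key and secret from environment variables.

    The variable names are hypothetical; any names work as long as the
    keys stay out of notebooks and repositories.
    """
    key = env.get("TWITTER_CONSUMER_KEY")
    secret = env.get("TWITTER_CONSUMER_SECRET")
    if not key or not secret:
        raise RuntimeError(
            "Set TWITTER_CONSUMER_KEY and TWITTER_CONSUMER_SECRET first"
        )
    return key, secret


def make_api(key, secret):
    """Create the auth handler and the API object (needs `pip install tweepy`)."""
    import tweepy  # imported lazily so the helper above works without Tweepy

    auth = tweepy.OAuthHandler(key, secret)
    return tweepy.API(auth)
```

With valid keys set in the environment, `api = make_api(*load_credentials())` gives the object that the rest of the demo calls methods on.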
So it's not complaining. As a demonstration, I will get the user Twitter themselves, and I will print their screen name, which we see here: "Twitter". And then this line of code will print their followers count, which is the number of followers they have. So you can see I'm accessing this friendly user object here that has a bunch of methods for getting followers, counting followers, all sorts of stuff like that. I can also iterate over each friend they have and print it out. If we wanted to create a tree network of their followers, this might be how we do that. Interestingly, there are only a few people printed there, but that's because Twitter only actually follows about 10 people. So luckily that hasn't given us the problems that trying to print out half a million followers would.

If we want to paginate through pages of content, which is what we'd have to do if we were scraping, that can normally be quite tricky. But luckily this package, Tweepy, has a Cursor object that does that for you. So you can easily call thousands of items from here and it will basically handle paginating through those results, 10 at a time, for you in quite an easy way. So here we'll just list five recent vegan tweets. It doesn't do that sentiment analysis, but we could add that in here if we needed to. And really, it's not useful to just regurgitate them into our notebook.
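The pagination step, together with storing the results instead of only printing them, can be sketched like this. It is a sketch under stated assumptions rather than the notebook's own code: `collect_tweets` and `write_csv` are hypothetical helpers, the search method is looked up under both names because Tweepy has called it `api.search` in older releases and `api.search_tweets` in newer ones, and the standard library `csv` module stands in for the pandas DataFrame used in the talk so the example has no extra dependencies.

```python
import csv
import io

FIELDS = ["created_at", "screen_name", "full_text"]


def collect_tweets(api, query="vegan", limit=5):
    """Page through search results with tweepy.Cursor and keep a few fields."""
    import tweepy  # needs `pip install tweepy` and an authenticated API object

    # Assumption: the method name depends on the installed Tweepy version.
    search = getattr(api, "search_tweets", None) or api.search
    cursor = tweepy.Cursor(search, q=query, tweet_mode="extended")
    return [
        {
            "created_at": str(tweet.created_at),
            "screen_name": tweet.user.screen_name,
            "full_text": tweet.full_text,
        }
        for tweet in cursor.items(limit)  # Cursor fetches further pages as needed
    ]


def write_csv(rows, handle):
    """Write the collected rows to any file-like object as CSV."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


# Made-up rows so the storage step can be tried without API access.
demo_rows = [
    {"created_at": "2021-01-01 12:00:00", "screen_name": "user_a", "full_text": "vegan curry"},
    {"created_at": "2021-01-01 12:05:00", "screen_name": "user_b", "full_text": "oat milk"},
]
buffer = io.StringIO()
write_csv(demo_rows, buffer)
print(buffer.getvalue())
```

To write a real file, pass `open("tweets.csv", "w", newline="")` as the handle and the output of `collect_tweets(api)` as the rows; the same list of dictionaries can also be handed to `pandas.DataFrame` if you prefer to work in a data frame.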
We need to store them somewhere. So there's a little snippet here that will search for vegan tweets, print the text out, and then append those to a text series, which we can then read into a data frame. Once you've got them read into a data frame, we can put them into a CSV, and this can be what we use as the start of our research process, basically.

So, next steps for you as the viewer. I would suggest exploring the user and tweet objects here. There's loads of cool stuff in here, loads of functions in Tweepy that return their friends and their followers, let you search users, get direct messages, all sorts of crazy stuff. So it's good to look at that. And I think the goal here is, you know, see what's possible. Have an explore through the standard product track, and if you think it'll be useful for an academic project, go through the academic tier and get access to all the data you could possibly want.

That is everything. The next slide is questions, which obviously you can't ask me right now. As I said, my Twitter is josephalen1234, so if you do have any questions, feel free to tweet them at me there. I'll put the slides and everything else up there, Binder links and all sorts, so it should be good. And after this there are just some sources, some extra tutorials you can look at and things like that. But yeah, thank you very much for watching. I appreciate that's a very long video. Thank you if you have seen this. I will be giving it live occasionally, so feel free to reach out if you do actually want to see a version of this talk. But thank you very much for watching. I've been josephalen and you've been a pleasure.